Last week, we continued on color spaces, and we showed (although handwavingly) that we’re not very good at seeing color differences in hue and saturation, and that we’re only good at seeing differences in brightness. In this last part, we’ll have a look at how JPEG exploits that for image compression.
The first thing JPEG does when compressing an image is usually (but not necessarily) to transform the display-compatible RGB image into the YCrCb colorspace. That, we have done last week. So basically, we have three planes: one containing brightness (basically the black-and-white version of the image), one containing the “red difference”, and another containing the “blue difference”, the latter two corresponding, more or less, to “color” (not unlike hue and saturation).
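To make this concrete, here’s a minimal sketch of the forward transform (using the full-range BT.601 coefficients specified by JFIF; the function name is mine):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an (H, W, 3) uint8 RGB image to full-range YCrCb,
    using the BT.601 coefficients specified by JFIF."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299    * r + 0.587    * g + 0.114    * b          # brightness
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0  # "blue difference"
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0  # "red difference"
    return np.clip(np.round(np.stack([y, cb, cr], axis=-1)), 0, 255).astype(np.uint8)
```

Note that a gray pixel ends up with Cb = Cr = 128: all its information sits in the Y plane.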
JPEG then allows you to specify how you reduce spatial resolution in the Cr and Cb planes. Typically, we use 4:2:0 subsampling which, for each group of 4 pixels in Y, takes only one pixel in Cr and one in Cb. This is summarized in this figure (click to embiggen):
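In code, the naive version of this decimation is a one-liner; a sketch (the function name is mine):

```python
import numpy as np

def subsample_420(plane):
    """4:2:0-subsample a chroma (Cr or Cb) plane: keep a single
    sample per 2x2 block of pixels (here, simply the top-left one)."""
    return plane[::2, ::2]
```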
Note that JPEG doesn’t do any fancy image reconstruction when it decompresses the image. It only up-samples the decimated pixels by doubling them. That doesn’t mean you can’t down-sample them in a smarter way than merely keeping one pixel (in Cr and Cb) out of every four. You can average them, sample them using a better filter (maybe one of the Blackman filters), or better yet, apply psychovisual filtering to produce the best-looking down-sampled images. As we showed last week, though, it’s not all that worthwhile to do so, because we simply wouldn’t notice.
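Here’s what both sides could look like, as a sketch (assuming even dimensions; the function names are mine): averaging on the encoder side, and JPEG-style pixel doubling on the decoder side:

```python
import numpy as np

def subsample_average(plane):
    """Down-sample a chroma plane by averaging each 2x2 block
    instead of just picking one of its four pixels."""
    h, w = plane.shape
    return plane.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample_nearest(plane):
    """JPEG-style reconstruction: duplicate each chroma sample
    into a 2x2 block (no fancy interpolation)."""
    return np.repeat(np.repeat(plane, 2, axis=0), 2, axis=1)
```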
Even without further compression, this subsampling scheme gives you a 2:1 compression ratio. Indeed, you go from three color planes (RGB, then YCrCb) to one full Y plane and two quarter-size Cr and Cb planes, for a total of 1.5 planes (in terms of the original number of pixels times color components). This compression ratio comes cheap computationally, because crude subsampling isn’t very hard to compute, and it’s also cheap in terms of image degradation.
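The arithmetic behind that ratio:

```python
# Plane sizes as fractions of the original pixel count.
y_plane  = 1.0   # full-resolution brightness
cr_plane = 0.25  # half the width times half the height
cb_plane = 0.25
total = y_plane + cr_plane + cb_plane  # 1.5 planes
ratio = 3.0 / total                    # vs. the 3 full planes of RGB: 2.0
```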
After this colorspace transformation and subsampling comes the actual data compression. In JPEG, the data compression is lossy and based on the DCT: the image is transformed into a “frequency” domain where it’s easy to figure out what information we can further destroy without affecting the perceived quality of the image. Basically, the DCT allows us to isolate the low-amplitude, high-frequency components of the image (weak dithering effects) which are hypothesized to be unnoticeable, and simply destroy them. The reconstructed image has lost this information, but the loss shouldn’t be visible. Of course, if you crank up compression, you destroy more and more information, which means you eventually not only erase the low-amplitude high-frequency components but also start to damage the high-amplitude low-frequency components of the image. Then the effects of compression become quite visible:
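The mechanism can be sketched with an orthonormal 8×8 DCT-II plus a toy uniform quantizer (real JPEG uses per-frequency quantization tables; the function names are mine):

```python
import numpy as np

def dct2_8x8(block):
    """2-D DCT-II of an 8x8 block, via the orthonormal DCT matrix."""
    n = 8
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    c = np.sqrt(2.0 / n) * np.cos((2 * i + 1) * k * np.pi / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row gets the smaller normalization
    return c @ block @ c.T

def quantize(coeffs, step):
    """Toy quantizer: divide by a step size and round. Low-amplitude
    (mostly high-frequency) coefficients collapse to zero, and runs
    of zeros cost almost nothing to entropy-code."""
    return np.round(coeffs / step)
```

A flat 8×8 block produces a single nonzero (DC) coefficient; the larger the quantization step, the more of the remaining coefficients round away to zero, and the more visible the damage.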
This scheme (transforming the colorspace from RGB to one more aligned with visual perception, subsampling the “color” components in the space domain, then compressing further by destroying “invisible” information) is pretty much the dominant paradigm in image and video compression. The only difference between standards (JPEG, MPEG, H.264, etc.) is how the details are managed. Some standards use a colorspace other than YCrCb, others use different types of subsampling or different space-to-frequency transforms, others still use a different structure for the entropy coding. The bottom line is, all standards try to take advantage of the fact that we’re not very good at seeing hue/saturation differences, only at seeing brightness variations.