## Kodak 1 (Colorspaces I)

Images (and, by extension, video) compression depends strongly on the color space used. A color space is a (hopefully convenient) representation of colors. We’re all familiar with the RGB color space used by video hardware. The RGB color space is based on three standard primary colors: red, green, and blue. Combining these three colors, each with varying intensity, yields a wide gamut of perceived colors.

However, if RGB is intuitive, it is not a very good color space for compression because it doesn’t exploit any of our perceptual quirks. We’re very good at distinguishing small variation in brightness, but not so much in tint or saturation. Compression schemes need to exploit this in order to destroy information (and obtaining better compression). This is why almost all image and video compression algorithm out there use a different color space, one that represents color as brightness, and two (or more!) components that are more or less related to tint and saturation—or some other measure of difference of color.

The colors spaces come in two families:

• Linear color spaces. As their name implies, they are linear spaces, or more precisely vector spaces. To convert from, and to, RGB, all we need is a linear transformation which usually materializes as a matrix-vector product.
• Non linear color spaces. These color spaces uses arbitrary transforms such as mapping onto a circle—we’ll see some of those later. They are usually more complicated and costly to compute, but they may be closer to our own perception of color, or, more accurately, our perception of the differences in color.

For performance purposes—real or perceived—compression schemes prefer linear transforms.

*
* *

Usually, for image and video compression, we’re interested in lossy compression and we do not really care if some bits are lost in the color space transformation. JPEG, for example, transforms the image from RGB to another color space, YCrCb, where the Cr and Cb components are downsampled quite a bit. There is no point in having a perfectly reversible transform. (We’ll discuss YCrCb later in this series).

But if, for some reason, you need lossless compression, the color space transform must also be lossless, that is, reversible. That is, RGB to this space and back gives the exact same color.

Kodak developed a quite a while ago (in the 1990s) a series of file formats for digital photography. (They all went the way of the dodo, but that’s another story). These formats were proprietary and very badly documented. They were fortunately reverse-engineered, and you can find implementations in ImageMagick, for example. Some of the supported colorspaces were designed for negative transfer, and other analog photography stuff. One of the color, “Kodak 1” was design for lossless compression (the other interesting color space, “Kodak YCC”, will be the topic of the next post in this series).

We transform RGB into Kodak 1 using the transform

$\begin{bmatrix} 1 & 1 & 1\\ -1 & -1 & 1\\ 1 & -1 & -1\\ \end{bmatrix} \begin{bmatrix} r\\ g\\ b\\ \end{bmatrix} = \begin{bmatrix} k_1\\ k_2\\ k_3\\ \end{bmatrix}$

and back to RGB with

$\begin{bmatrix} \frac{1}{2} & 0 & \frac{1}{2}\\ 0 & -\frac{1}{2} & -\frac{1}{2}\\ \frac{1}{2} & \frac{1}{2} & 0 \\ \end{bmatrix} \begin{bmatrix} k_1\\ k_2\\ k_3\\ \end{bmatrix} = \begin{bmatrix} r\\ g\\ b\\ \end{bmatrix}$

This colorspaces encodes something that looks like the brightness in the first component, and “yellow vs blue” and “red vs cyan” in the two others. The choice seems weird, but is motivated by studies of the human visual system. While we have four types of retina cells (b/w, red, green and blue), the signals that travels to the brain isn’t RGB, but b/w, red vs green and blue vs yellow. Kodak 1 approximates this.

*
* *

We can verify that the Kodak 1 colorspace is indeed reversible. Since

$\begin{bmatrix} 1 & 1 & 1\\ -1 & -1 & 1\\ 1 & -1 & -1\\ \end{bmatrix} \begin{bmatrix} r\\ g\\ b\\ \end{bmatrix} = \begin{bmatrix} r+g+b\\ -r-g+b\\ r-g-b\\ \end{bmatrix}$,

then

$\begin{bmatrix} \frac{1}{2} & 0 & \frac{1}{2}\\ 0 & -\frac{1}{2} & -\frac{1}{2}\\ \frac{1}{2} & \frac{1}{2} & 0 \\ \end{bmatrix} \begin{bmatrix} r+g+b\\ -r-g+b\\ r-g-b\\ \end{bmatrix} = \begin{bmatrix} \frac{1}{2}(r+g+b)+\frac{1}(r-g-b)\\ -\frac{1}{2}(-r-g+b)-\frac{1}{2}(r-g-b)\\ \frac{1}{2}(r+g+b)+\frac{1}{2}(-r-g+b)\\ \end{bmatrix} = \begin{bmatrix} r\\ g\\ b\\ \end{bmatrix}$.

Terms exactly cancel out, and there’s no possible rounding error.

*
* *

Lastly, we may be interested in actually using Kodak 1 coded colors in a compression scheme which means we must actually code them. Usually, the source RGB pixels will be on 24 bits, but we notice that Kodak 1 colors will need more than 8 bits per components. Quite so: the first component is r+g+b and can be between 0 and 765 (3×255); the second and third component vary between -510 and 255. To avoid coding negative numbers, we may add a bias of 510, transforming the components to 0 to 765. If we use an integer number of bits, we need 10 bits to encode the individual components. If we use “arithmetic” coding, we may use approximately 28.73 bits. Let’s see how we do that.

#include <cstdint>

////////////////////////////////////////
struct rgb
{
uint8_t r,g,b;

rgb(uint8_t r_,
uint8_t g_,
uint8_t b_) : r(r_),g(g_),b(b_) {}
};

////////////////////////////////////////
struct kodak1
{
int16_t k1,k2,k3;

kodak1(int16_t k1_,
int16_t k2_,
int16_t k3_) : k1(k1_),k2(k2_),k3(k3_) {};
};

////////////////////////////////////////
kodak1 to_kodak1(const rgb & x)
{
return kodak1( x.r+x.g+x.b,
-x.r-x.g+x.b,
x.r-x.g-x.b);
};

////////////////////////////////////////
rgb to_rgb(const kodak1 & k)
{
return rgb( ( k.k1+k.k3)/2,
(-k.k2-k.k3)/2,
( k.k1+k.k2)/2 );
};

////////////////////////////////////////
uint32_t kodak1_to_code(const kodak1 & k)
{
return
k.k1*766*766
+ (k.k2+510)*766
+ (k.k3+510);
}

////////////////////////////////////////
kodak1 code_to_kodak1(uint32_t x)
{
return kodak1(x/(766*766),
(x/766) % 766 - 510,
x % 766 - 510 );
}


(Read the series on fractional bits to understand how this works: part 1, part 2, and part 3.). The above encoding uses 29 bits, and is perfectly reversible.