But that’s not a surprise. All the colorspaces we’ve seen so far do this, and (are believed to) have the same properties, grosso modo.

JPEG uses the YCbCr colorspace:

    Y  =  0.299 R + 0.587 G + 0.114 B
    Cb = -0.1687 R - 0.3313 G + 0.5 B
    Cr =  0.5 R - 0.4187 G - 0.0813 B

with inverse

    R = Y + 1.402 Cr
    G = Y - 0.3441 Cb - 0.7141 Cr
    B = Y + 1.772 Cb
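Sketched in code (a float implementation with the standard JFIF coefficients; the type and function names are mine, and real codecs add rounding and a +128 chroma bias):

```cpp
#include <cmath>
#include <cassert>

// Floating-point sketch of the JFIF (BT.601-derived) YCbCr transform;
// the names are mine, not from any particular codec.
struct ycbcr { float y, cb, cr; };

ycbcr rgb_to_ycbcr(float r, float g, float b)
 {
  return {  0.299f*r    + 0.587f*g    + 0.114f*b,
           -0.168736f*r - 0.331264f*g + 0.5f*b,
            0.5f*r      - 0.418688f*g - 0.081312f*b };
 }

void ycbcr_to_rgb(const ycbcr &c, float &r, float &g, float &b)
 {
  r = c.y                  + 1.402f*c.cr;
  g = c.y - 0.344136f*c.cb - 0.714136f*c.cr;
  b = c.y + 1.772f*c.cb;
 }
```

For pure red (255, 0, 0), Y is 0.299×255 ≈ 76.2, and the round trip recovers the original within float precision.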

*

* *

I proposed a variant of this colorspace for DjVu for fast decoding. Instead of starting from the “forward” transform, I started from the inverse:

The rationale is that multiplications by numbers such as 0.75 are easily rewritten as additions and shifts: 0.75 can be written as 3/4, or as 1 − 1/4, and so 0.75x can be computed as `x-(x>>2)`. The inverse can be computed entirely in integers. The forward transform is given by the inverse’s inverse:
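The `x-(x>>2)` trick can be sketched as a one-liner (a toy illustration, not DjVu’s actual code); note that for integers it rounds up, where `(3*x)>>2` would round down:

```cpp
#include <cstdint>
#include <cassert>

// 0.75*x as a subtract and a shift: x - x/4.
// For integers, x - (x>>2) computes ceil(3x/4),
// while (3*x)>>2 computes floor(3x/4).
uint32_t mul_3_4(uint32_t x)
 {
  return x - (x >> 2);
 }
```

For example, `mul_3_4(100)` is 75, and `mul_3_4(5)` is 4 (the ceiling of 3.75).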

*

* *

The values in the brightness line, 0.299, 0.587, and 0.114, were chosen a long time ago for various reasons: the standard response of displays—then phosphor-based—and industry-imposed corrections at capture time—the standard gamma corrections. Over time, TV went from analog to digital, and the whole process changed. CRTs were replaced by flat screens that used completely different technologies with different colors. To address these changes, new colorspaces were introduced:

YPbPr:

    Y  =  0.2126 R + 0.7152 G + 0.0722 B
    Pb = -0.1146 R - 0.3854 G + 0.5 B
    Pr =  0.5 R - 0.4542 G - 0.0458 B

with inverse

    R = Y + 1.5748 Pr
    G = Y - 0.1873 Pb - 0.4681 Pr
    B = Y + 1.8556 Pb

There are other versions of this colorspace: defined in BT.709, YPbPr was eventually modified by BT.2020 for Ultra HDTV, and by BT.2100 for HDR video.

SÉCAM uses the YDbDr colorspace:

    Y  =  0.299 R + 0.587 G + 0.114 B
    Db = -0.450 R - 0.883 G + 1.333 B
    Dr = -1.333 R + 1.116 G + 0.217 B

with inverse

    R = Y - 0.5257 Dr
    G = Y - 0.1291 Db + 0.2678 Dr
    B = Y + 0.6645 Db

What’s unusual with this one is how the chrominance components are scaled. Unlike the other colorspaces we’ve seen so far, this one stretches the two chrominance components quite a bit, giving it a squished appearance. Why? It may be that it buys the scheme some tolerance to the frequency modulation used to encode the chrominance components—but I haven’t found anything on this. In fact, just finding documentation on SÉCAM is hard.


The YUV colorspace, like all the others we’ve seen so far, encodes the brightness in the first component and uses two difference components, U and V, that somewhat correspond to the yellow-blue and the red-cyan differences:

    Y =  0.299 R + 0.587 G + 0.114 B
    U = -0.147 R - 0.289 G + 0.436 B
    V =  0.615 R - 0.515 G - 0.100 B

with inverse

    R = Y + 1.140 V
    G = Y - 0.395 U - 0.581 V
    B = Y + 2.032 U

In PAL, YUV is sent over the air using a special frequency modulation. The Y component eats the whole band and carries the black-and-white image. The U and V components are quadrature-amplitude modulated over the Y signal. Decoded for Y only, the signal exhibits only high-frequency, low-amplitude noise, probably unnoticeable in normal viewing conditions.

*

* *

YIQ is the color space used in NTSC for color encoding. The transform is given by:

    Y = 0.299 R + 0.587 G + 0.114 B
    I = 0.596 R - 0.275 G - 0.321 B
    Q = 0.212 R - 0.523 G + 0.311 B

with inverse

    R = Y + 0.956 I + 0.621 Q
    G = Y - 0.272 I - 0.647 Q
    B = Y - 1.106 I + 1.703 Q

The I and Q components are the red-cyan and yellow-blue differences, but rotated -33° around the Y axis, and scaled by what essentially looks like magic constants. Why? I am not sure, because I and Q proceed to another encoding stage, the quadrature amplitude modulation (QAM), just as in YUV/PAL. Maybe it was meant to be especially simple to decode in (analog) hardware? Of that, I’m not very convinced, but I never built a decoder out of analog parts.
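The rotation claim can at least be checked numerically: starting from PAL-style U and V (scale factors 0.492 and 0.877, the usual published values, assumed here) and rotating by 33°, we recover the familiar YIQ coefficients. A sketch:

```cpp
#include <cmath>
#include <cassert>

// Sketch: I and Q obtained by rotating the (U,V) chrominance plane
// by 33 degrees. The U and V scale factors (0.492, 0.877) and the
// angle are the usual published values, assumed here.
void rgb_to_yiq(double r, double g, double b,
                double &y, double &i, double &q)
 {
  y = 0.299*r + 0.587*g + 0.114*b;
  double u = 0.492*(b - y);                     // PAL-style U
  double v = 0.877*(r - y);                     // PAL-style V
  const double a = 33.0*std::acos(-1.0)/180.0;  // 33 degrees in radians
  i = -u*std::sin(a) + v*std::cos(a);           // rotate the (U,V) plane
  q =  u*std::cos(a) + v*std::sin(a);
 }
```

For pure red, this gives I ≈ 0.596 and Q ≈ 0.211, matching the usual YIQ matrix coefficients.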

*

* *

The coefficients used in the various colorspace bases (or transforms) seem random, except for some regularities such as the first row, which encodes brightness, sometimes with the same coefficients (0.299, 0.587, 0.114). But in fact, there’s a lot more order behind all this. I will close this series of posts with a bird’s-eye view of (linear) colorspaces that will explain a lot. Yeah, I’m a tease.

The first line, yielding Y, gives something close to the brightness, but it differs from most other colorspaces, which use 0.299, 0.587, and 0.114 to compute brightness. Are Xerox YES’s values gamma-corrected? Or relative to some other space than RGB?

Can’t say. There are very few references to this colorspace, and all point to a 1989 memo I couldn’t find [1]. The inverse

isn’t especially friendly to integer computation.

*

* *

I think a colorspace must have at least one of these characteristics to be useful:

- match the human visual system’s sensitivity to brightness, color, and saturation,
- be perfectly reversible for lossless compression,
- be smooth in its components.

Matching the human visual system’s sensitivity is useful for quantization because we’re very sensitive to brightness, but much less so to color (hue, tint) and saturation. JPEG and TV make heavy use of that—more on this later.

Reversibility is of course useful for lossless compression because you want to retrieve the original image bit by bit. While lossless compression isn’t always useful, you may want it in special applications (medical imaging, for example) or for special types of images.

The last criterion is also important for compression: you want each component to be smooth—neighboring values strongly correlated—because, for pretty much every encoding scheme, that means fewer bits. If you use simple differential encoding, the differences are smaller and distributed around zero… and that’s good for compression. If you use something like the DCT or wavelets, smoothness also means most coefficients are small, and that’s also good for compression.
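The differential-encoding argument can be made concrete with a toy sketch (the names are mine): on a smooth signal, the deltas are small and centered on zero:

```cpp
#include <vector>
#include <cstdlib>
#include <algorithm>
#include <cassert>

// Differential encoding: replace each value by its difference with the
// previous one. On smooth data the differences are small and centered
// on zero, which is cheap to entropy-code.
std::vector<int> deltas(const std::vector<int> &v)
 {
  std::vector<int> d;
  int prev=0;
  for (int x : v) { d.push_back(x-prev); prev=x; }
  return d;
 }

// largest magnitude in a sequence: a crude proxy for coding cost
int max_abs(const std::vector<int> &v)
 {
  int m=0;
  for (int x : v) m=std::max(m,std::abs(x));
  return m;
 }
```

A slowly varying ramp like {10, 12, 11, 13} yields deltas {10, 2, −1, 2}: after the first value, everything is tiny.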

If your colorspace doesn’t have any of these properties, then it may not be that good.

[1] Xerox Color Encoding Standard, (tech rep?) XNSS 289005 (1989?)

Ohta’s concern wasn’t image coding but region separation. He supposed (without much evidence) that a colorspace with a basis close to the principal components of the colors in the image should be maximally discriminant. He then proposed the colorspace

    I1 = (R + G + B)/3
    I2 = (R - B)/2
    I3 = (2G - R - B)/4

He didn’t propose an inverse, and wasn’t concerned about reversibility. Of course it has an inverse, but not one that is computable using only integer arithmetic. The inverse is:

    R = I1 + I2 - (2/3) I3
    G = I1 + (4/3) I3
    B = I1 - I2 - (2/3) I3

Somewhere else (still in his Ph.D. thesis) he proposes a different normalization, but the idea is the same:

With inverse

*

* *

To make the Ohta colorspace(s) reversible, we must use coefficients that will not involve truncation—unless, of course, you’re planning to use floating point numbers. With

and

…we get reversibility, despite the appearances.

[1] Yu-Ichi Ohta — *A Region-Oriented Image-Analysis System by Computer* — Ph.D. thesis, Dept. of Computer Science, Kyoto University (1980).

Kodak YCC tries to do just that. The first component is the perceived brightness, a combination of the relative brightnesses of the primary colors. The two other components, as in Kodak 1, are the “yellow vs blue” and “red vs cyan” differences.

The only odd thing is the two coefficients 0.886 and 0.701… why not 0.114 and 0.299? They are an attempt to keep the two chrominance components close to zero and balanced (remember, Kodak 1 had these components vary between -510 and 255): since 0.886 = 1 − 0.114 and 0.701 = 1 − 0.299, the two components are exactly B − Y and R − Y. The inverse is

*

* *

Reversibility is more complicated now. First, the two matrices aren’t exactly each other’s inverses, and we’re using floats. The first problem may be solved using POCS, but then neither matrix remains quite the original. The second problem, floats, we do not have much control over: their behavior is compiler- and CPU-dependent. x86/AMD64 CPUs have 80-bit internal floating-point registers, and the compiler may (or may not) opt to use them as long as possible before casting the results to float or int: results may vary.

Worse: Kodak YCC was supposed to be used gamma-compensated:

If r, g, b are greater than 0.018:

    r' = 1.099 r^0.45 - 0.099

(and similarly for g and b). If not,

    r' = 4.5 r

I would think that these adjustments make sense with high dynamic range and non-linear responses in images. They make, however, losslessness very hard.
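Assuming the compensation is the BT.709-style transfer curve (an assumption, but the 0.018 threshold strongly suggests it), the forward correction sketches as:

```cpp
#include <cmath>
#include <cassert>

// BT.709-style opto-electronic transfer (an assumption: the exact
// Kodak formulas aren't given here, but the 0.018 threshold matches
// this standard curve). v is a linear r, g or b value in [0,1].
double gamma_encode(double v)
 {
  return (v > 0.018) ? 1.099*std::pow(v, 0.45) - 0.099
                     : 4.5*v;
 }
```

The two branches meet almost exactly at v = 0.018, where both give about 0.081, so the curve is continuous.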

However, if RGB is intuitive, it is not a very good color space for compression because it doesn’t exploit any of our perceptual quirks. We’re very good at distinguishing small variations in brightness, but not so much in tint or saturation. Compression schemes need to exploit this in order to destroy information (and obtain better compression). This is why almost all image and video compression algorithms out there use a different color space, one that represents color as brightness, plus two (or more!) components that are more or less related to tint and saturation—or some other measure of difference of color.

Color spaces come in two families:

- **Linear color spaces**. As their name implies, they are linear spaces, or more precisely vector spaces. To convert from, and to, RGB, all we need is a linear transformation, which usually materializes as a matrix-vector product.
- **Non-linear color spaces**. These color spaces use arbitrary transforms such as mapping onto a circle—we’ll see some of those later. They are usually more complicated and costly to compute, but they may be closer to our own perception of color, or, more accurately, our perception of the differences in color.

For performance purposes—real or perceived—compression schemes prefer linear transforms.

*

* *

Usually, for image and video compression, we’re interested in lossy compression, and we do not really care if some bits are lost in the color space transformation. JPEG, for example, transforms the image from RGB to another color space, YCbCr, where the Cb and Cr components are downsampled quite a bit. There is no point in having a perfectly reversible transform. (We’ll discuss YCbCr later in this series.)

But if, for some reason, you need lossless compression, the color space transform must also be lossless, that is, reversible: RGB to this space and back must give the exact same color.

Kodak developed, quite a while ago (in the 1990s), a series of file formats for digital photography. (They all went the way of the dodo, but that’s another story.) These formats were proprietary and very badly documented. Fortunately, they were reverse-engineered, and you can find implementations in ImageMagick, for example. Some of the supported colorspaces were designed for negative transfer and other analog photography stuff. One of them, “Kodak 1”, was designed for lossless compression (the other interesting color space, “Kodak YCC”, will be the topic of the next post in this series).

We transform RGB into Kodak 1 using the transform

    k1 =  R + G + B
    k2 = -R - G + B
    k3 =  R - G - B

and back to RGB with

    R = ( k1 + k3)/2
    G = (-k2 - k3)/2
    B = ( k1 + k2)/2

This colorspace encodes something that looks like the brightness in the first component, and “yellow vs blue” and “red vs cyan” in the two others. The choice seems weird, but is motivated by studies of the human visual system. While we have four types of retinal cells (b/w, red, green and blue), the signals that travel to the brain aren’t RGB, but b/w, red vs green, and blue vs yellow. Kodak 1 approximates this.

*

* *

We can verify that the Kodak 1 colorspace is indeed reversible. Since

    k1 + k3 = (R + G + B) + ( R - G - B) = 2R,
   -k2 - k3 = (R + G - B) + (-R + G + B) = 2G,
    k1 + k2 = (R + G + B) + (-R - G + B) = 2B,

then

    R = ( k1 + k3)/2,
    G = (-k2 - k3)/2,
    B = ( k1 + k2)/2.

Terms exactly cancel out: the sums are always even, so there’s no possible rounding error.

*

* *

Lastly, we may be interested in actually using Kodak 1 colors in a compression scheme, which means we must actually code them. Usually, the source RGB pixels will be on 24 bits, but Kodak 1 colors need more than 8 bits per component. Quite so: the first component is r+g+b and can be between 0 and 765 (3×255); the second and third components vary between -510 and 255. To avoid coding negative numbers, we may add a bias of 510, transforming these components to 0 to 765. If we use an integer number of bits, we need 10 bits to encode the individual components. If we use “arithmetic” coding, we may use approximately 28.74 bits (3 log₂ 766). Let’s see how we do that.

```cpp
#include <cstdint>

////////////////////////////////////////
struct rgb
 {
  uint8_t r,g,b;
  rgb(uint8_t r_, uint8_t g_, uint8_t b_) : r(r_),g(g_),b(b_) {}
 };

////////////////////////////////////////
struct kodak1
 {
  int16_t k1,k2,k3;
  kodak1(int16_t k1_, int16_t k2_, int16_t k3_) : k1(k1_),k2(k2_),k3(k3_) {}
 };

////////////////////////////////////////
kodak1 to_kodak1(const rgb & x)
 {
  return kodak1( x.r+x.g+x.b,
                -x.r-x.g+x.b,
                 x.r-x.g-x.b);
 }

////////////////////////////////////////
rgb to_rgb(const kodak1 & k)
 {
  return rgb( ( k.k1+k.k3)/2,
              (-k.k2-k.k3)/2,
              ( k.k1+k.k2)/2 );
 }

////////////////////////////////////////
uint32_t kodak1_to_code(const kodak1 & k)
 {
  return k.k1*766*766 + (k.k2+510)*766 + (k.k3+510);
 }

////////////////////////////////////////
kodak1 code_to_kodak1(uint32_t x)
 {
  return kodak1(x/(766*766),
                (x/766) % 766 - 510,
                x % 766 - 510 );
 }
```

(Read the series on fractional bits to understand how this works: part 1, part 2, and part 3.). The above encoding uses 29 bits, and is perfectly reversible.
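As a sanity check, here is a small self-contained round-trip test of the Kodak 1 transform (the conversions are restated as plain integer expressions, the helper name is mine):

```cpp
#include <cassert>

// Round-trip check of the Kodak 1 transform: forward to (k1,k2,k3),
// then back to (r,g,b), all in integer arithmetic, over a sampled
// sweep of the RGB cube.
bool kodak1_roundtrip_ok()
 {
  for (int r=0;r<256;r+=5)
   for (int g=0;g<256;g+=5)
    for (int b=0;b<256;b+=5)
     {
      int k1 =  r+g+b,
          k2 = -r-g+b,
          k3 =  r-g-b;
      // the sums below are always even: no rounding can occur
      if (( k1+k3)/2 != r) return false;
      if ((-k2-k3)/2 != g) return false;
      if (( k1+k2)/2 != b) return false;
     }
  return true;
 }
```

The check passes for every sampled triple, as the cancellation argument predicts.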

So in the best of cases, you could end up with less code, or better yet, *no* code at all!

So let’s consider a toy problem: managing endian conversions. For a lot of things, integers are represented in “network order” (big-endian), which may, or may not, be the internal representation of the machine manipulating the data. Intel CPUs, for example, are little-endian machines.

Of course, there are already standard C implementations of those (in `<netinet/in.h>`, or maybe in some other header depending on your compiler/OS): `hton*` (host-to-network) and `ntoh*` (network-to-host). However, these are C-style functions that may be opaque to deep compiler optimizations. Granted, they’re not very compute-intensive, but we may be able to do better for constant arguments and allow the compiler to optimize in depth, maybe to the point of not generating code at all.

For this, `constexpr` comes to the rescue! As I said earlier, it will evaluate as many things as it can at compile time. But, unlike the preprocessor (those evil `#define` statements), it honors *all* of C++’s semantics. `#define`s are hardly anything more than find-and-replace macros, which may lead to a number of surprises.

*

* *

How do we detect endianness at compile-time? We can create a `constexpr` integer constant initialized to 1 and test if its first byte is 0 (which would mean the CPU is big endian) or 1. The rest is just swapping bytes around, if we need to.

```cpp
#ifndef __MODULE_SWAPPITY__
#define __MODULE_SWAPPITY__

#include <cstdint>

//#undef __GNUG__ // see what happens!

class swappity
 {
  private:

   constexpr static int x=1;

  public:

   constexpr static bool will_swap() { return (*(char*)&x); }

   constexpr static uint8_t swap(uint8_t x) { return x; }

   constexpr static uint16_t swap(uint16_t x)
    {
     return will_swap() ?
#ifdef __GNUG__
            __builtin_bswap16(x)
#else
            ((x>>8)|(x<<8))
#endif
            : x;
    }

   constexpr static uint32_t swap(uint32_t x)
    {
     return will_swap() ?
#ifdef __GNUG__
            __builtin_bswap32(x)
#else
            (swap((uint16_t)(x>>16)) | (swap((uint16_t)x)<<16))
#endif
            : x;
    }

   static uint64_t swap(uint64_t x)
    {
     return will_swap() ?
#ifdef __GNUG__
            __builtin_bswap64(x)
#else
            (swap((uint32_t)(x>>32)) | ((uint64_t)swap((uint32_t)x)<<32))
#endif
            : x;
    }
 };

#endif // __MODULE_SWAPPITY__
```

The only really fancy thing about this implementation is the use of GCC-specific intrinsics. If the code is compiled by G++, then `__GNUG__` is defined, and the intrinsics are used. Otherwise, “ordinary” bit-twiddling is used. The funny thing is, there doesn’t seem to be a significant difference in speed either way:

```cpp
#include <sys/time.h>
#include <iostream>
#include <iomanip>
#include <swappity.hpp>

uint64_t now()
 {
  struct timeval t;
  gettimeofday(&t,0);
  return t.tv_sec*1000000+t.tv_usec;
 }

int main()
 {
  int s=0; // to fool optimizations

  uint64_t start=now();
  for (uint32_t l=0,z=0;l<=z;l=z,z++)
    s+=swappity::swap(z);
  uint64_t stop=now();

  std::cout << (stop-start)/1000000.0 << 's' << std::endl
            << s << std::endl;

  return 0;
 }
```

This program runs in 3.55±0.02s with `__GNUG__` defined, and in …3.57±0.01s without. It basically makes no difference—except that intrinsics are cool.

A specialization would be interesting because we can do much better than using an integer to store three different values. We can do much, much better.

In fact, to represent three choices, we need log₂ 3 ≈ 1.585 bits. If we were to do this using shifts, just as we use bit shifts to pack bits, we would need to shift by log₂ 3 bits. But shifting by k bits is the same as multiplying by 2^k… and shifting by log₂ 3 bits is the same as multiplying by 2^(log₂ 3) = 3! Masking becomes the modulo operator. That shouldn’t be too hard!

Since we can pack any number of `trool`s using

    t = t₀·3⁰ + t₁·3¹ + t₂·3² + …

with tᵢ being the ith trool, we need to choose an integer size of n bits to hold k `trool`s. We should pick some size that’s a good trade-off between the complexity of arithmetic and the number of bits wasted. Ideally, we should pick n and k such that

    3^k = 2^n,

but that equation has no solutions other than k = n = 0, which isn’t very interesting. So we should solve it so that 3^k is as close to 2^n as possible without going over it. Also, n could take any value, but 8, 16, 32 and 64 are the most convenient for a lot of reasons.

The largest k is given by k = ⌊n log₃ 2⌋. The efficiency of choosing k for n is

    e(n) = 3^k / 2^n,

and the closer to one, the better. For n = 8, 16, 32, 64, we find:

| n | k | e(n) |
|----|----|------|
| 8 | 5 | 0.94 |
| 16 | 10 | 0.90 |
| 32 | 20 | 0.81 |
| 64 | 40 | 0.65 |

So the efficiency is better using byte-size integers. Also, at worst, we will do 5 integer divisions/mod, which isn’t too bad. Let’s see:

```cpp
// in class __trool:
   explicit operator int() const { return state; }
   //...
 };

////////////////////////////////////////
int pow3(int x)
 {
  int p=1;
  for (int i=0;i<x;i++,p*=3);
  return p;
 }

////////////////////////////////////////
trool trits_get(uint8_t trits, int n)
 {
  return (trits/pow3(n))%3;
 }

////////////////////////////////////////
uint8_t trits_set(uint8_t trits, int n, trool val)
 {
  int p=pow3(n);
  return ((trits/(p*3))*3+(int)val)*p+(trits % p);
 }
```

So basically, we replaced shifts by divisions or multiplications by (powers of) 3, and masks by modulos. The extraction, `trits_get`, is simple to understand: we shift down by n positions by dividing by 3^n (and conveniently enough, 3⁰ = 1), and we take the result modulo 3 to extract the value. The insertion/rewrite, `trits_set`, is a bit more hirsute… Again, we rely on the properties of integer division and on the fact that 3⁰ = 1, and that x mod 1 is 0.
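A self-contained sketch of the same packing, with plain ints (0, 1, 2) standing in for `trool` values:

```cpp
#include <cstdint>
#include <cassert>

// 3^x, computed the boring way
int pow3(int x)
 {
  int p=1;
  for (int i=0;i<x;i++) p*=3;
  return p;
 }

// extract the nth trit: divide by 3^n (the "shift"), then mod 3 (the "mask")
int trits_get(uint8_t trits, int n)
 {
  return (trits/pow3(n))%3;
 }

// rewrite the nth trit: keep the trits above, insert val, keep the trits below
uint8_t trits_set(uint8_t trits, int n, int val)
 {
  int p=pow3(n);
  return ((trits/(p*3))*3+val)*p+(trits%p);
 }
```

Packing 2 at position 0 and 1 at position 3 yields 2·3⁰ + 1·3³ = 29, and both trits read back correctly while the untouched positions stay 0.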

*

* *

What if we just gave up and used two bits per `trool`? At two bits per `trool`, the efficiency is

    e = 3/4 = 0.75,

while with the byte-aligned encoding, we have

    e = 3⁵/2⁸ ≈ 0.95,

a difference of some 20 percentage points.
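These efficiencies are easy to recompute; here, efficiency is taken as 3^k/2^n, the fraction of the n-bit code space actually used by k trits (the function name is mine):

```cpp
#include <cmath>
#include <cassert>

// Efficiency of packing k three-valued items in n bits, measured as
// the fraction of the 2^n codes actually used: 3^k / 2^n.
double efficiency(int n, int k)
 {
  return std::pow(3.0,k)/std::pow(2.0,n);
 }
```

`efficiency(8,5)` gives 243/256 ≈ 0.95, against `efficiency(2,1)` = 0.75 for the two-bits-per-`trool` encoding.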

Bool is trickier. In C++, a bool is either false or true, and weak typing makes everything nonzero true. However, if you assign 3 to a bool, it will be “normalized” to true, that is, exactly 1. Therefore, and not that surprisingly, you can’t have true, false, and *maybe*. Well, let’s fix that.

The obvious choice is to define an `enum class` that (re)defines `false`, `true`, and `undefined` (the not-a-bool). But we can’t redefine `true` and `false` directly because they are keywords (at least in C++… not so sure in C). So we’ll need a (normal) class around that enum to deal with states and operators.

```cpp
#include <typeinfo> // std::bad_cast

template <typename storage=int>
class _trool
 {
  public:

   // anonymous enum class with storage?
   using __trool=enum : storage { undefined=0, _false=1, _true=2 };

   __trool state;

   _trool operator=(bool x) { state=x?_true:_false; return *this; }

   operator char() const { return "uft"[state]; } // because I can.

   operator bool() const
    {
     if (state==undefined)
       throw std::bad_cast();
     else
       return (state==_true);
    }

   _trool(): state(undefined) {}
   _trool(int s): state(s==0?undefined:(s==1?_false:_true)) {}
   _trool(bool s): state(s?_true:_false) {}
   ~_trool()=default; // nothin' much to do
 };

using trool=_trool<>; // defaults
```

Since C++11, we can not only declare strongly typed `enum class`es, but also specify their storage—what kind of integer to use to store the enumerated values. What surprised me is that you can specify storage on an anonymous `enum`. For all kinds of performance-related reasons, `enum` storage seems to default to `int`, which is what I’ve implemented here. Well, to be more precise, I implemented a class that accepts a storage type, but also an alias to its default, `int`.

*

* *

The problem with this implementation is that it won’t play very well with `std::vector`. As you know, there’s an explicit specialization, `std::vector<bool>`, that uses one bit per bool rather than whatever the default storage is. But we can still use very efficient storage for this tri-state bool thingie.

*To be continued…*