New Block Order

March 21, 2017

The idea of reorganizing data before compression isn’t new. Almost twenty five years ago, Burrows and Wheeler proposed a block sorting transform to reorder data to help compression. The idea here is to try to create repetitions that can be exploited by a second compression engine.

But the Burrows-Wheeler transform isn’t the only possible one. There are a few other techniques to generate (reversible) permutations of the input.

Read the rest of this entry »


8-bit Audio Companding (part II)

February 28, 2017

A few weeks back, I presented an heuristic for audio companding, making the vague assumption that the distribution of values—sound samples—is somewhat exponentially-distributed. But is it the case?

ssound-blocks

Let’s then find out the distribution of the samples. As before, I will use the Toréador file and a new one, Jean Michel Jarre’s Electronica 1: Time Machine (the whole CD). The two are very different. One is classical music, the other electronic music. One is adjusted in loudness so that we can here the very quiet notes as well as the very loud one, the other is adjusted for mostly for loudness, to maximum effect.

So I ran both through a sampler. For display as well as histogram smoothing purposes, I down-sampled both channels from 16 to 8 bits (therefore from 0 to 255). In the following plots, green is the left channel and (dashed) red the right. Toréador shows the following distribution:

toreador

or, in log-plot,

toreador-log

Turns out, the samples are Laplace distributed. Indeed, fitting a mean \mu=127 and a parameter \beta\approx{7.4} agrees with the plot (the ideal Laplacian is drawn in solid blue):

toreador-log-laplace

Now, what about the other file? Let’s see the plots:

time-machine

and in log-plot,

time-machine-log

and with the best-fit Laplacian superimposed:

time-machine-log-laplace

Now, to fit a Laplacian, the best parameters seem to be \mu=127 and \beta\approx{27}. While the fit is pretty good on most of the values, it kind of sucks at the edge. That’s the effect of dynamic range compression, a technique used to limit a signal’s dynamic range, often in a non-uniform way (the signal values near or beyond the maximum value target get more squished). This explains the “ears” seen in the log-plot, also seen in the (not log-)plot.

*
* *

Making the hypothesis that the samples are Laplace-distributed will allow us to devise an efficient quantization scheme for both the limits of the bins and the reconstruction value. In S-law, if we remember, the reconstructed value used is the value in the center of the interval. But, if the distribution is not uniform in this interval, the most representative value isn’t in its center. It’s the value that minimizes the squared expected error. Even if the expression for the moments of a Laplace-distributed random variable isn’t unwieldy, we should arrive at a very good, and parametric, quantization scheme for the signal.


8-bit Audio Companding

February 7, 2017

Computationally inexpensive sound compression is always difficult, at least if you want some quality. One could think, for example, that taking the 8 most significant bits of 16 bits will give us 2:1 (lossy) compression but without too much loss. However, cutting the 8 least significant bits leads to noticeable hissing. However, we do not have to compress linearly, we can apply some transformation, say, vaguely exponential to reconstruct the sound.

ssound-blocks

That’s the idea behind μ-law encoding, or “logarithmic companding”. Instead of quantizing uniformly, we have large (original) values widely spaced but small (original) value, the assumption being that the signal variation is small when the amplitude is small and large when the amplitude is great. ITU standard G.711 proposes the following table:

Read the rest of this entry »


Fractional Bits (Part III)

October 18, 2016

A long time ago, we discussed the problem of using fractions of bits to encode, jointly, values without wasting much bits. But we considered then only special cases, but what if we need to shift precisely by, say, half a bit?

But what does shifting by half a bit even means?

Read the rest of this entry »


Of Colorspaces and Image Compression (Part III)

November 11, 2014

Last week, we continued on color spaces, and we showed—although handwavingly—that we’re not very good at seeing color differences in hue and saturation, and that we’re only good at seeing difference in brightness. In this last part, we’ll have a look at how JPEG exploits that for image compression.

P1131283

Read the rest of this entry »


Of Colorspaces and Image Compression (Part II)

November 4, 2014

Last week we discussed briefly how we can represent color using either RGB or a colorspace that’s more easily amenable to decimation by a psychovisual model.

P1131283

This week, we’ll have a look at just how tolerant the human visual system is to color (hue/saturation) variations.

Read the rest of this entry »


Of Colorspaces and Image Compression (Part I)

October 28, 2014

The art of lossy compression consists almost entirely in choosing a representation for your data in which you can easily figure out what data to destroy. That is, if you’re compressing sound, you must transform the waveform into a domain where you can apply a psychoacoustic model that will guide decimation, that is, the deletion of part of the information, part of the information that won’t be heard missing. If you’re rather compressing images—that will be our topic today—you must have a good psychovisual model that will help you choose what information to destroy in your image

P1131283

The model can be very sophisticated—and therefore computationally expensive—or merely a good approximation of what’s going on in the observers’ eyes (and mind).

But let’s start with the beginning: representing color

Read the rest of this entry »