8-bit Audio Companding

Computationally inexpensive sound compression is always difficult, at least if you want some quality. One could think, for example, that keeping only the 8 most significant bits of a 16-bit sample would give us 2:1 (lossy) compression without too much loss. However, dropping the 8 least significant bits leads to noticeable hissing. But we do not have to quantize linearly: we can apply some transformation, say, vaguely logarithmic when compressing, and use its vaguely exponential inverse to reconstruct the sound.
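To make the truncation concrete, here is a minimal sketch in C. The function names are made up for illustration, and the right shift assumes that the compiler shifts negative values arithmetically, which is the usual behavior but is not guaranteed by the C standard.

```c
#include <stdint.h>

/* Keep only the 8 most significant bits of a 16-bit sample.
   Assumption: >> on a negative value is an arithmetic shift. */
int8_t truncate_to_8(int16_t s)
{
  return (int8_t)(s >> 8);
}

/* Reconstruct at the center of the 256-wide cell. */
int16_t expand_from_8(int8_t c)
{
  return (int16_t)(c * 256 + 128);
}
```

With this scheme every sample from 0 to 255 comes back as 128 and every sample from -256 to -1 comes back as -128, so in quiet passages the quantization error is as large as the signal itself.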


That’s the idea behind μ-law encoding, or “logarithmic companding”. Instead of quantizing uniformly, we have large (original) values widely spaced and small (original) values closely spaced, the assumption being that the signal variation is small when the amplitude is small and large when the amplitude is great. ITU standard G.711 proposes the following table:

Range             Value spacing  Code
4063 to 8158      256            1000xxxx
2015 to 4062      128            1001xxxx
991 to 2014       64             1010xxxx
479 to 990        32             1011xxxx
223 to 478        16             1100xxxx
95 to 222         8              1101xxxx
31 to 94          4              1110xxxx
1 to 30           2              1111xxxx
0                                11111111
-1                               01111111
-31 to -2         2              0111xxxx
-95 to -32        4              0110xxxx
-223 to -96       8              0101xxxx
-479 to -224      16             0100xxxx
-991 to -480      32             0011xxxx
-2015 to -992     64             0010xxxx
-4063 to -2016    128            0001xxxx
-8159 to -4064    256            0000xxxx

This table gives a code for samples of about 14 bits, which was deemed enough for toll-quality sound. But what if we want a code for 16 bits? The above table doesn’t quite cover the range. We could reduce the precision of the sample from 16 to 14 bits (with techniques like those we saw last week) then use G.711 μ-law companding. Or we could redesign the code.
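For reference, the G.711 table is a piecewise-linear approximation of a smooth logarithmic curve. The usual textbook form of the continuous μ-law is given here as background rather than as something the table spells out. It assumes the sample x is normalized to [-1, 1] and that μ = 255:

```latex
F(x) = \operatorname{sgn}(x)\,\frac{\ln(1+\mu|x|)}{\ln(1+\mu)},
\qquad
F^{-1}(y) = \operatorname{sgn}(y)\,\frac{(1+\mu)^{|y|}-1}{\mu},
\qquad -1 \le x, y \le 1 .
```

Small values of |x| are stretched by F before uniform quantization and large values are squeezed, which is the widely-spaced/closely-spaced behavior the table implements with eight linear segments per sign.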

Let’s say we still want a similar code structure with sign, range and value within range, that is:

s rrr xxxx

Let’s also remove the special cases for 0 and -1. Let’s also call it the S-law, because of reasons. We must now adjust the “stretching”. Knowing that the lowest number in a range is 1 more than the largest number in the previous range, we find that we must solve

$$\sum_{r=0}^{7} 16a^r = 16\sum_{r=0}^{7} a^r = 32767$$

for a. The exact value of a, 2.789469…, is rather cumbersome for our needs. Such a value spacing would lead to non-integer values and we kind of want to avoid that. If we round down to 2, we get much less than the desired range (16 × 255 = 4080 values per sign), and 3 gives us more (16 × 3280 = 52480). Using 3, we get the following code:

Range               Value spacing  Offset  Code (srrrxxxx)
17488 to 52479      2187           1093    0111xxxx
5824 to 17487       729            364     0110xxxx
1936 to 5823        243            121     0101xxxx
640 to 1935         81             40      0100xxxx
208 to 639          27             13      0011xxxx
64 to 207           9              4       0010xxxx
16 to 63            3              1       0001xxxx
0 to 15             1              0       0000xxxx
-16 to -1           1              0       1000xxxx
-64 to -17          3              -1      1001xxxx
-208 to -65         9              -4      1010xxxx
-640 to -209        27             -13     1011xxxx
-1936 to -641       81             -40     1100xxxx
-5824 to -1937      243            -121    1101xxxx
-17488 to -5825     729            -364    1110xxxx
-52480 to -17489    2187           -1093   1111xxxx

This leaves us codes 0x78 to 0x7f and 0xf8 to 0xff for other uses, as they represent values above 32767 and below -32768 (strictly speaking, 0x77 and 0xf7 are free as well, since their cells start at 32797 and -32798). The offset is needed for reconstruction: it gives the center of the cell, measured from the cell’s first value. A cell that is 3 units wide has its center at 1 (starting at zero, of course), a cell that is 9 units wide has its center at 4, etc.
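The numbers in the table follow a simple pattern, and a small sketch can regenerate them. The helper names below are hypothetical, not part of any standard; they only restate the arithmetic: the spacing in range r is 3^r, the first value of range r is the total width of all earlier ranges, and the offset is the middle of a cell.

```c
#include <stdint.h>

/* Cell width in range r (0..7): 3^r. */
int32_t s_law_spacing(int r)
{
  int32_t w = 1;
  while (r-- > 0) w *= 3;
  return w;
}

/* First (positive) value of range r:
   16 * (3^0 + ... + 3^(r-1)) = 8 * (3^r - 1). */
int32_t s_law_base(int r)
{
  return 8 * (s_law_spacing(r) - 1);
}

/* Distance from the start of a cell to its center: (3^r - 1) / 2. */
int32_t s_law_offset(int r)
{
  return (s_law_spacing(r) - 1) / 2;
}
```

For example, range 7 starts at 8 × (2187 - 1) = 17488 with offset (2187 - 1) / 2 = 1093, matching the top row of the table, and s_law_base(8) gives 52480, one past the last value the code can represent.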

Let’s see what the code should look like:

#include <stdint.h>

////////////////////////////////////////
uint8_t s_law_compress(int16_t s)
 {
  if (s<=-17489) return 0x80 | 0x70 | -(s+17489)/2187;
  if (s<=-5825)  return 0x80 | 0x60 | -(s+5825)/729;
  if (s<=-1937)  return 0x80 | 0x50 | -(s+1937)/243;
  if (s<=-641)   return 0x80 | 0x40 | -(s+641)/81;
  if (s<=-209)   return 0x80 | 0x30 | -(s+209)/27;
  if (s<=-65)    return 0x80 | 0x20 | -(s+65)/9;
  if (s<=-17)    return 0x80 | 0x10 | -(s+17)/3;
  if (s<=-1)     return 0x80 | 0x00 | -(s+1);

  if (s<16)      return 0x00 | 0x00 | s;
  if (s<64)      return 0x00 | 0x10 | (s-16)/3;
  if (s<208)     return 0x00 | 0x20 | (s-64)/9;
  if (s<640)     return 0x00 | 0x30 | (s-208)/27;
  if (s<1936)    return 0x00 | 0x40 | (s-640)/81;
  if (s<5824)    return 0x00 | 0x50 | (s-1936)/243;
  if (s<17488)   return 0x00 | 0x60 | (s-5824)/729;
  /*else*/       return 0x00 | 0x70 | (s-17488)/2187;  
 }

////////////////////////////////////////
int16_t s_law_decompress(uint8_t c)
 {
  uint8_t range=c & 0xf0; // srrr
  uint8_t value=c & 0x0f; // xxxx

  switch (range)
   {
   case 0xf0: return -17489-value*2187-1093; // 0xf7--0xff overflow 16 bits: undefined!
   case 0xe0: return -5825 -value*729 -364;
   case 0xd0: return -1937 -value*243 -121;
   case 0xc0: return -641  -value*81  -40;
   case 0xb0: return -209  -value*27  -13;
   case 0xa0: return -65   -value*9   -4;
   case 0x90: return -17   -value*3   -1;
   case 0x80: return -1    -value     -0;

   case 0x00: return 0    +value      +0;
   case 0x10: return 16   +value*3    +1;
   case 0x20: return 64   +value*9    +4;
   case 0x30: return 208  +value*27   +13;
   case 0x40: return 640  +value*81   +40;
   case 0x50: return 1936 +value*243  +121;
   case 0x60: return 5824 +value*729  +364;
   case 0x70: return 17488+value*2187 +1093; // 0x77--0x7f overflow 16 bits: undefined!
   }

  return 0; // unreachable, but needed to suppress warning
 }
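The two functions above spell out every range by hand. As a cross-check, here is a table-driven sketch of the same mapping. The names S_LAW_BASE, S_LAW_SPACING, s_law_encode, s_law_decode and s_law_roundtrip_error are made up for this example, and the code is meant as an illustration of the structure rather than as a tuned implementation.

```c
#include <stdint.h>

/* First value and cell width of each positive range, from the table. */
static const int32_t S_LAW_BASE[8]    = { 0, 16, 64, 208, 640, 1936, 5824, 17488 };
static const int32_t S_LAW_SPACING[8] = { 1, 3, 9, 27, 81, 243, 729, 2187 };

uint8_t s_law_encode(int16_t s)
{
  /* Negative samples are mirrored: -1 has magnitude 0, -2 has 1, ... */
  int32_t sign = 0;
  int32_t m = s;
  if (s < 0) { sign = 0x80; m = -(int32_t)s - 1; }

  int r = 7;
  while (m < S_LAW_BASE[r]) r--;   /* stops at r=0 since m >= 0 */

  return (uint8_t)(sign | (r << 4) | ((m - S_LAW_BASE[r]) / S_LAW_SPACING[r]));
}

/* Codes 0x77..0x7f and 0xf7..0xff are never produced by s_law_encode;
   decoding them would not fit in 16 bits. */
int16_t s_law_decode(uint8_t c)
{
  int r = (c >> 4) & 7;
  int32_t x = c & 0x0f;
  /* Start of the cell plus the offset to its center. */
  int32_t m = S_LAW_BASE[r] + x * S_LAW_SPACING[r] + (S_LAW_SPACING[r] - 1) / 2;
  return (int16_t)((c & 0x80) ? -m - 1 : m);
}

/* Absolute reconstruction error for one sample. */
int32_t s_law_roundtrip_error(int16_t s)
{
  int32_t d = (int32_t)s_law_decode(s_law_encode(s)) - s;
  return d < 0 ? -d : d;
}
```

Samples between -16 and 15 survive the round trip exactly, while the worst case in the top range is an error of 1093, for instance at s = 30610, which decodes to 31703.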

*
* *

I have no real way of measuring the subjective quality of the code. The only thing I can say about it is that hiss is hardly noticeable, much less than with a simple truncation to 8 bits. Well, that’s not that surprising: μ-law and A-law (essentially the same idea but using different constants) have been standardized and used for protocols where the target bit rate was 64 kbit/s (at 8000 samples/s, and 8 bits per sample).

Still, let’s have a look at the effect this companding has on sound. I took Bizet’s “Les toréadors” (mostly because I needed something I could stand listening to 20 times in a row). The original piece, fresh off the CD, looks like:

[Figure: spectrogram of the original CD track]

As one can see, the original has been very carefully mastered to remove any low-amplitude high frequencies, that is, noise. A random low-amplitude high-frequency component would manifest itself as a high-pitched hiss, and can be noticeable if the music doesn’t mask it.

With a reduction from 16 to 8 bits then back again, the hiss is conspicuous:

[Figure: spectrogram after reduction to 8 bits and back]

The blue/purple/gray texture shown in the spectrogram is akin to white noise: low amplitude at all frequencies. You can hear a distinct hiss. With S-law, we see that there is still a lot of noise:

[Figure: spectrogram after S-law companding]

But: 1) it’s much better at keeping very low-amplitude sound free of noise, as is shown at the extreme left: the leading “silence” remains (mostly) silent, while with the plain reduction to 8 bits, there’s a fair amount of noise in that area. 2) The total quantity of noise is also much reduced, as is shown by the prevalence of pale blue areas.

*
* *

Of course, 8-bit single-sample companding can’t work miracles. That’s why this type of approach is more or less forgotten and more complex coding techniques are favored. One big difference is that modern waveform-coding techniques will not code individual samples but a bunch of samples together, exploiting all kinds of dependencies.
