Computationally inexpensive sound compression is always difficult, at least if you want some quality. One could think, for example, that keeping only the 8 most significant bits of each 16-bit sample would give us 2:1 (lossy) compression without too much loss. However, cutting the 8 least significant bits leads to noticeable hissing. But we do not have to quantize linearly: we can store the sample on a nonlinear scale and apply some transformation, say, vaguely exponential, to reconstruct the sound.
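To make the naive scheme concrete, here is a minimal sketch in C. The function names `truncate_to_8` and `expand_to_16` are mine, and filling the dropped bits with zeros on reconstruction is an assumption (adding 128 to land mid-cell would be another reasonable choice).

```c
#include <stdint.h>

/* Keep only the 8 most significant bits of a 16-bit sample (2:1, lossy). */
static int8_t truncate_to_8(int16_t s)
{
    /* Shift to 0..65535 first so that >> acts on a non-negative value. */
    int32_t high = ((int32_t)s + 32768) >> 8;   /* 0..255 */
    return (int8_t)(high - 128);
}

/* Rebuild a 16-bit sample; the 8 dropped bits are replaced by zeros. */
static int16_t expand_to_16(int8_t q)
{
    return (int16_t)(q * 256);
}
```

Every reconstructed value is a multiple of 256, so a quiet passage that wanders between, say, -100 and 100 collapses onto just two levels, 0 and -256. The cell width is the same whatever the amplitude, and a nonlinear code spends its 256 codes differently.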

That’s the idea behind μ-law encoding, or “logarithmic companding”. Instead of quantizing uniformly, we space the large (original) values widely and the small (original) values closely, the assumption being that the signal variation is small when the amplitude is small and large when the amplitude is great. ITU standard G.711 proposes the following table:

| Range | Value spacing | Code |
|---|---|---|
| 4063 to 8158 | 256 | 1000xxxx |
| 2015 to 4062 | 128 | 1001xxxx |
| 991 to 2014 | 64 | 1010xxxx |
| 479 to 990 | 32 | 1011xxxx |
| 223 to 478 | 16 | 1100xxxx |
| 95 to 222 | 8 | 1101xxxx |
| 31 to 94 | 4 | 1110xxxx |
| 1 to 30 | 2 | 1111xxxx |
| 0 | n/a | 11111111 |
| -1 | n/a | 01111111 |
| -31 to -2 | 2 | 0111xxxx |
| -95 to -32 | 4 | 0110xxxx |
| -223 to -96 | 8 | 0101xxxx |
| -479 to -224 | 16 | 0100xxxx |
| -991 to -480 | 32 | 0011xxxx |
| -2015 to -992 | 64 | 0010xxxx |
| -4063 to -2016 | 128 | 0001xxxx |
| -8159 to -4064 | 256 | 0000xxxx |
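Read as a procedure, the table amounts to finding a range, then a cell within that range. The sketch below is my own reading of the table, not the reference G.711 code: the name `mu_law_encode`, the clipping of out-of-range input, and the use of a bias of 33 (which makes every range start at a power of two) are assumptions chosen so that the output matches the codes listed above.

```c
#include <stdint.h>

/* Encode a 14-bit sample (-8159..8158) following the G.711 table. */
static uint8_t mu_law_encode(int16_t s)
{
    uint8_t sign = 0;
    int32_t mag = s;
    if (s < 0) {
        sign = 0x80;
        mag = -(int32_t)s - 1;    /* -1 maps to magnitude 0, as in the table */
    }
    if (mag > 8158) mag = 8158;   /* clip to the table's range (assumption) */
    mag += 33;                    /* bias: ranges now start at 32, 64, 128... */

    /* Range number: position of the leading 1 bit, counted from bit 5. */
    int range = 0;
    while ((mag >> (range + 6)) != 0) range++;

    /* The 4 bits that follow the leading 1 select the cell. */
    uint8_t value = (uint8_t)((mag >> (range + 1)) & 0x0f);

    /* The table lists codes with all bits inverted. */
    return (uint8_t)~(sign | (range << 4) | value);
}
```

For instance, `mu_law_encode(0)` gives 0xff and `mu_law_encode(-1)` gives 0x7f, the two special rows of the table, while 8158 and -8159 give 0x80 and 0x00.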

This table gives a code for about 14 bits per sample, which was deemed enough for toll-quality sound. But what if we want a code for 16 bits? The above table doesn’t quite cover the range. We could reduce the precision of the sample from 16 to 14 (with techniques like those we saw last week) then use G.711 μ-law companding. Or we could redesign the code.

Let’s say we still want a similar code structure with sign, range and value within range, that is:

s | rrr | xxxx |

Let’s also remove the special cases for 0 and -1. Let’s also call it the S-law, because of reasons. We must now adjust the “stretching”. Knowing that the lowest number in a range is 1 more than the largest number in the previous range, and that each of the 8 ranges holds 16 cells, we find that we must solve

$$16 \sum_{k=0}^{7} b^k = 16\,\frac{b^8 - 1}{b - 1} = 2^{15}$$

for $b$, the ratio between the value spacings of consecutive ranges. The exact value of $b$, 2.789469…, is rather cumbersome for our needs. Such a value spacing would lead to non-integer values and we kind of want to avoid that. If we round down to 2, we get much less than the desired range (16 × 255 = 4080 values per sign), and 3 gives us more (16 × 3280 = 52480). Using 3, we get the following code:

| Range | Value spacing | Offset | Code (srrrxxxx) |
|---|---|---|---|
| 17488 to 52479 | 2187 | 1093 | 0111xxxx |
| 5824 to 17487 | 729 | 364 | 0110xxxx |
| 1936 to 5823 | 243 | 121 | 0101xxxx |
| 640 to 1935 | 81 | 40 | 0100xxxx |
| 208 to 639 | 27 | 13 | 0011xxxx |
| 64 to 207 | 9 | 4 | 0010xxxx |
| 16 to 63 | 3 | 1 | 0001xxxx |
| 0 to 15 | 1 | 0 | 0000xxxx |
| -16 to -1 | 1 | 0 | 1000xxxx |
| -64 to -17 | 3 | -1 | 1001xxxx |
| -208 to -65 | 9 | -4 | 1010xxxx |
| -640 to -209 | 27 | -13 | 1011xxxx |
| -1936 to -641 | 81 | -40 | 1100xxxx |
| -5824 to -1937 | 243 | -121 | 1101xxxx |
| -17488 to -5825 | 729 | -364 | 1110xxxx |
| -52480 to -17489 | 2187 | -1093 | 1111xxxx |

This leaves us codes 0x78 to 0x7f and 0xf8 to 0xff for other uses, as they represent values above 32767 and below -32768. (Codes 0x77 and 0xf7 are never produced from 16-bit input either: their cells start at 32797 and -32798.) The offset is needed for reconstruction: it gives the center value for that cell. A cell that is 3 units wide has its center at 1 (starting at zero, of course), a cell that is 9 units wide has its center at 4, etc.
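The arithmetic behind the choice of 3 can be checked with a few lines of C. This is a sketch under my reading of the constraint: 8 ranges of 16 cells each, with cells 1, b, b², and so on up to b⁷ wide, must cover the 32768 values on each side of zero. The names `coverage`, `solve_base`, `s_law_range` and `s_law_ranges` are hypothetical.

```c
#include <stdint.h>

/* Values covered per sign: 8 ranges of 16 cells, cells b^k wide in range k. */
static double coverage(double b)
{
    double total = 0.0, width = 1.0;
    for (int k = 0; k < 8; k++) {
        total += 16.0 * width;
        width *= b;
    }
    return total;
}

/* Bisection on [2, 3] for the base b with coverage(b) = 32768. */
static double solve_base(void)
{
    double lo = 2.0, hi = 3.0;
    for (int i = 0; i < 60; i++) {
        double mid = 0.5 * (lo + hi);
        if (coverage(mid) < 32768.0) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

typedef struct {
    int32_t start;    /* smallest magnitude in the range */
    int32_t spacing;  /* cell width, 3^k */
    int32_t offset;   /* distance from a cell's first value to its center */
} s_law_range;

/* Rebuild the non-negative half of the base-3 table. */
static void s_law_ranges(s_law_range out[8])
{
    int32_t start = 0, spacing = 1;
    for (int k = 0; k < 8; k++) {
        out[k].start = start;
        out[k].spacing = spacing;
        out[k].offset = (spacing - 1) / 2;
        start += 16 * spacing;   /* next range begins right after 16 cells */
        spacing *= 3;
    }
}
```

`coverage(2.0)` is 4080 and `coverage(3.0)` is 52480, which brackets the 32768 we need, and `solve_base()` lands near 2.7895. With the base fixed at 3, `s_law_ranges` reproduces the start, spacing and offset columns for the non-negative ranges.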

Let’s see what the code should look like:

////////////////////////////////////////
#include <stdint.h>

////////////////////////////////////////
// Compresses a 16-bit sample into an 8-bit
// srrrxxxx code, testing the ranges from the
// most negative to the most positive.
uint8_t s_law_compress(int16_t s)
{
  if (s<=-17489) return 0x80 | 0x70 | -(s+17489)/2187;
  if (s<=-5825)  return 0x80 | 0x60 | -(s+5825)/729;
  if (s<=-1937)  return 0x80 | 0x50 | -(s+1937)/243;
  if (s<=-641)   return 0x80 | 0x40 | -(s+641)/81;
  if (s<=-209)   return 0x80 | 0x30 | -(s+209)/27;
  if (s<=-65)    return 0x80 | 0x20 | -(s+65)/9;
  if (s<=-17)    return 0x80 | 0x10 | -(s+17)/3;
  if (s<=-1)     return 0x80 | 0x00 | -(s+1);
  if (s<16)      return 0x00 | 0x00 | s;
  if (s<64)      return 0x00 | 0x10 | (s-16)/3;
  if (s<208)     return 0x00 | 0x20 | (s-64)/9;
  if (s<640)     return 0x00 | 0x30 | (s-208)/27;
  if (s<1936)    return 0x00 | 0x40 | (s-640)/81;
  if (s<5824)    return 0x00 | 0x50 | (s-1936)/243;
  if (s<17488)   return 0x00 | 0x60 | (s-5824)/729;
  /*else*/       return 0x00 | 0x70 | (s-17488)/2187;
}

////////////////////////////////////////
// Expands an srrrxxxx code to the center
// value of its cell (a signed sample).
int16_t s_law_decompress(uint8_t c)
{
  uint8_t range=c & 0xf0; // srrr
  uint8_t value=c & 0x0f; // xxxx
  switch (range)
  {
    case 0xf0: return -17489-value*2187-1093; // 0xf8--0xff undefined!
    case 0xe0: return -5825 -value*729 -364;
    case 0xd0: return -1937 -value*243 -121;
    case 0xc0: return -641  -value*81  -40;
    case 0xb0: return -209  -value*27  -13;
    case 0xa0: return -65   -value*9   -4;
    case 0x90: return -17   -value*3   -1;
    case 0x80: return -1    -value     -0;
    case 0x00: return 0     +value     +0;
    case 0x10: return 16    +value*3   +1;
    case 0x20: return 64    +value*9   +4;
    case 0x30: return 208   +value*27  +13;
    case 0x40: return 640   +value*81  +40;
    case 0x50: return 1936  +value*243 +121;
    case 0x60: return 5824  +value*729 +364;
    case 0x70: return 17488 +value*2187+1093; // 0x78--0x7f undefined!
  }
  return 0; // unreachable, but needed to suppress warning
}
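One way to gain some confidence in the two functions above is an exhaustive round trip over all 65536 samples. So that it stands alone, the sketch below restates the code in table-driven form. The names `s_law_encode`, `s_law_decode` and `max_roundtrip_error` are mine, and the equivalence with the functions above is my reading of them.

```c
#include <stdint.h>

static const int32_t START[8]   = {0, 16, 64, 208, 640, 1936, 5824, 17488};
static const int32_t SPACING[8] = {1, 3, 9, 27, 81, 243, 729, 2187};

/* Same srrrxxxx layout: sign bit, 3-bit range, 4-bit cell. */
static uint8_t s_law_encode(int16_t s)
{
    uint8_t sign = 0;
    int32_t mag = s;
    if (s < 0) { sign = 0x80; mag = -(int32_t)s - 1; }  /* -1 -> 0, -16 -> 15 */
    int r = 7;
    while (mag < START[r]) r--;          /* START[0] is 0, so this stops */
    return (uint8_t)(sign | (r << 4) | ((mag - START[r]) / SPACING[r]));
}

/* Returns the center of the cell named by the code. */
static int16_t s_law_decode(uint8_t c)
{
    int r = (c >> 4) & 7, v = c & 0x0f;
    int32_t mag = START[r] + v * SPACING[r] + (SPACING[r] - 1) / 2;
    return (int16_t)((c & 0x80) ? -mag - 1 : mag);
}

/* Largest |s - decode(encode(s))| over all samples in [lo, hi]. */
static int32_t max_roundtrip_error(int32_t lo, int32_t hi)
{
    int32_t worst = 0;
    for (int32_t s = lo; s <= hi; s++) {
        int32_t e = s - s_law_decode(s_law_encode((int16_t)s));
        if (e < 0) e = -e;
        if (e > worst) worst = e;
    }
    return worst;
}
```

If the tables are right, `max_roundtrip_error(-32768, 32767)` should be 1093, half of the widest cell, and `max_roundtrip_error(-16, 15)` should be 0 since the innermost ranges are stored exactly.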

* * *

I cannot really measure how good the code is subjectively. The only thing I can say about it is that the hiss is hardly noticeable, much less than with a simple truncation to 8 bits. Well, that’s not that surprising: μ-law and A-law (essentially the same idea but using different constants) have been standardized and used for protocols where the target bit rate was 64 kbits/s (8000 samples/s at 8 bits per sample).

Still, let’s have a look at the effect this companding has on sound. I took Bizet’s “Les toréadors” (mostly because I needed something I could stand listening to 20 times in a row). The original piece, fresh off the CD, looks like:

As one can see, the original has been very carefully mastered to remove any low-amplitude high frequencies, that is, *noise*. Random low-amplitude high-frequency components would manifest themselves as a high-pitched hiss, which can be noticeable if the music doesn’t mask it.

With a reduction from 16 to 8 bits then back again, the hiss is conspicuous:

The blue/purple/gray texture shown in the spectrogram is akin to white noise, low amplitude in all frequencies. You can hear a distinct hiss. With S-law, we see that there is also a lot of noise:

But: 1) it’s much better at keeping noise out of very low-amplitude sound, as is shown at the extreme left: the leading “silence” remains (mostly) silent, while with the downgrade to 8 bits, there’s a fair amount of noise in that area. 2) The total quantity of noise is also much reduced, as is shown by the prevalence of pale blue areas.
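The spectrograms can be complemented with a crude numerical comparison. The sketch below skips the byte code and computes the reconstructed value directly, once for truncation to 8 bits and once for the base-3 code. The function names are hypothetical, and reconstructing truncated samples by zero-filling the dropped bits is an assumption about how the 8-bit version was produced.

```c
#include <stdint.h>

/* Reconstruction after keeping only the top 8 bits (dropped bits set to 0). */
static int32_t truncate_roundtrip(int32_t s)
{
    return ((s + 32768) >> 8) * 256 - 32768;
}

/* Reconstruction after the base-3 code: snap to the center of the cell. */
static int32_t s_law_roundtrip(int32_t s)
{
    int32_t mag = s < 0 ? -s - 1 : s;
    int32_t start = 0, spacing = 1;
    while (mag >= start + 16 * spacing) {   /* walk up to the right range */
        start += 16 * spacing;
        spacing *= 3;
    }
    int32_t cell = (mag - start) / spacing;
    int32_t rec = start + cell * spacing + (spacing - 1) / 2;
    return s < 0 ? -rec - 1 : rec;
}

/* Sum of |error| over every 16-bit sample value with lo <= s <= hi. */
static int64_t total_abs_error(int32_t (*q)(int32_t), int32_t lo, int32_t hi)
{
    int64_t total = 0;
    for (int32_t s = lo; s <= hi; s++) {
        int32_t e = s - q(s);
        total += e < 0 ? -e : e;
    }
    return total;
}
```

Over the quietest samples, -16 to 15, the base-3 code makes no error at all while truncation accumulates a total error of 4080. The price is paid at the other end: for samples between 20000 and 30000 the cells are 2187 wide instead of 256, so the S-law error is larger there, which is the trade-off companding makes.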

* * *

Of course, 8-bit single-sample companding can’t do miracles. That’s why this type of approach is more or less forgotten and more complex coding techniques are favored. One big difference is that modern waveform-coding techniques do not code individual samples but a bunch of samples together, exploiting all kinds of dependencies.
