## Lossless Coding of CD Audio

Once upon a time, I discussed how to pick bit-rate for MP3, while considering re-ripping all my CDs. But if I’m to re-rip everything, I might as well do it one last time and use lossless compression.

In this post, we’ll discuss the simple script I cooked up to do just that, and a bit on how Flac works, and how it compares to MP3.

First, to run the script, you will need cdparanoia and flac, both packages readily available from Debian/Ubuntu repositories (and, for the MP3 version of the script, found at the end of this post, you will need lame, also found in the default repositories).

The first thing to do is to probe the CD for its Table Of Contents (TOC hereafter). Fortunately, cdparanoia is a fine piece of software and it detects the default/loaded CD/DVD drive. It suffice to invoke it with the probe option like this

echo probing...

if cdparanoia -Q 2> cd-rip-out.tmp~
then
...


To find the drive and fetch the TOC, here written to a temporary file. Redirection from stderr is needed because cdparanoia outputs to stderr (for some reason). This temporary file is parsed to show the tracks and get their numbers:

grep -e '^[[:blank:]]*[0-9]*\.' cd-rip-out.tmp~ > cd-rip-toc.tmp~
tracks=$( cut -d. -f 1 cd-rip-toc.tmp~ )  (The exact poutine is cd-paranoia-specific), but in short, it gets the track number and forms a list such as 1 2 3 4. The user is also, later on, able to either chose the default list or specify his own, including cdparanoia-specific syntax such as 12-15 to grab tracks 12 to 15, inclusive, as a single track. We also add a couple of cosmetic details such as promoting track numbers from 1 to 001 (so that they all line up all pretty), ask for the track title, and invoke flac:  # main grab loop for track in${track_list[@]}
do
((t++))
# sed hack from Vincent "gnuvince" Foley
beautiful_track=$(echo$track \
| sed 's/$$[0-9]*$$/000\1/g;s/[0-9]*$$[0-9]\{3\}$$/\1/g')

read -p "tracks $beautiful_track ($t/$t_l) title: " this_track_title this_track_name="$artist -- $album -$beautiful_track - $this_track_title" if cdparanoia --never-skip=2$track -w "$this_track_name.wav" then # grab successful! flac --best \ "$this_track_name.wav" \
--tag=artist="$artist" \ --tag=album="$album" \
--tag=track="$beautiful_track" \ --tag=title="$beautiful_track $this_track_title" \ --tag=year="$year"
else
echo An error occurred while grabbing \$track
fi
done


(The ugly sed hack to convert tracks number from 1 to 001 is due to gnuvince).

Flac provides many options, but it seems that --best does indeed provide very good parameters, choosing the best compression settings regardless of CPU time (and, let us be serious, it’s rather unimportant that it takes “more” CPU time when it takes about 30s to compress an entire CD on a modern CPU—once the sound is ripped from the CD).

*
* *

Flac encodes sound losslessly, that is, if you decompress a .flac file, you get, bit for bit, the .wav you input. MP3, on the other hand, is a conspicuously lossy encoder, that is, it destroys information in order to achieve much higher compression ratios. One essential piece of a modern MP3 encoder is its psycho-acoustic model, an engine that analyzes sound and determines, with quantization enabled, what will be the perceived loss incurring from destroying this or that feature in the sound. Needless to say, the better the model is, the better sound quality can be because the encoder will be able to take the right decisions about what information to destroy—destroy sound information that you should hear the least.

But at very low bit-rates, it’s the quantization/decimation engine that dominates. To meet the very low bit-rates, the encoder will destroy plenty of information, and even if the psycho-acoustic model helps it take the right decisions while coding, the result will still be a sound with lots of artifacts, and it will not sound that good. Effects will be noticeable or even irritating. At high bit-rates, however, it will be the psycho-acoustic model that will dominate. At high bit-rate, there is no need to fiercely quantize and decimate, but the psycho-acoustic model can still force the encoder to remove features in the sound that are deemed inaudible (according to the model engine) despite having enough bits available to code them. In this regime, cranking the bit-rate will result only in marginal file size increase, because what extra bits you throw at the encoding are lost to the model that removes features from the sound.

We have seen this before, haven’t we?

Relative file sizes for --vbr-new

In the graph, we kind of see that some files do not grow as bit-rate increase, others grow only somewhat; at least, quite sub-linearly.

*
* *

With lossless coding, we keep everything, so it is crucial that we have a good prediction model, that is, an engine that can give a good prediction on the next sample $x_t$ from the previous $w$ samples, $x_{t-1}, x_{t-2}, \ldots, x_{t-w}$. If the error $\hat{x}_t-x_t$ is (expectedly) small (well, to be exact, if it has a small entropy), then we can devise an efficient code for it and get good compression.

Flac does not use an elaborate sound model. It uses linear prediction to predict values. Linear prediction can be written as:

$\hat{x}_t = \sum_{i=1}^{w}a_i x_{t-i}$

(or, in vector notation, $\hat{x}_t=a^Tx_{t-1}^{t-w}$—it is only a dot-product, after all.) At each time $t$, a new set of values $a_i$ are computed in order to minimize the expected error $(\hat{x}_{t+1}-x_{t+1})^2$ (there are possible variants) for the next time step. Turns out it’s not too intensive if the window is small-ish (especially for somewhat recent CPUs), because if you need to invert a $w\times{w}$ matrix, it is of a special kind, it is a Toeplitz Matrix. Because of the structure of the matrix, it as $O(w)$ degrees of freedom rather than $O(w^2)$, and the inversion of a Toeplitz matrix (or solving a Toeplitz system; same difference) can be performed in much better than $O(w^3)$: it can be done in $O(w^2)$!

Linear Prediction Coding isn’t something new, it’s been in use in various speech coding schemes (such as the GSM Speech Codecs). What makes it attractive is that, well, it seems to work fairly well on speech and sound, and it has efficient algorithms to solve for the parameters of the linear prediction.

*
* *

Once the prediction $\hat{x}_t$ is done, the difference $x_t-\hat{x}_t$ is coded. In Flac, it seems that this difference is considered to be geometrically distributed, and accordingly encoded using a Golomb Code. It is the exact encoding the residual $e_t=x_t-\hat{x}_t$ that makes the recovery of the original signal: indeed, $x_t=\hat{x}_t+e_t$.

*
* *

You can download the full scripts here for Flac and here for MP3 high bit-rate VBR. Feel free to tinker with them; that’s what they’re for.

*
* *