I am currently contemplating re-encoding all my CDs into a newer compressed format. I currently use MP3, but since I started my collection at a time when a huge hard drive was 4 GB, a lot of those files are encoded at 96 and 112 kilobits/s… which doesn’t sound all that great. Although MP3 isn’t the greatest format around, it does have the advantage of being compatible with nearly every player, something neither FLAC nor Ogg Vorbis can claim.
Some of the most recent devices support adding codecs, but not all. My Sansa player doesn’t, although I could probably upgrade the firmware or something, but my car’s built-in player is another story. So for the time being, I am considering re-encoding everything in MP3 at a higher bit-rate, or maybe using FLAC and writing some kind of script that transcodes the files to MP3 for export to a player. I haven’t decided yet. But let’s say I decide to stick with MP3 all the way. What bit-rate should I use?
I know that the 96 and 112 Kb/s files produced by the older MP3 codecs do not sound all that great, so re-encoding them at the same bit-rates, even with a modern codec, is out of the question. How high a bit-rate do I need to be pleased with the sound? How large would the files get as I increase the bit-rate? Does the size really grow linearly with the bit-rate or does it, due to the filtering, psycho-acoustics, and quantization, eventually converge to a file-specific maximum?
Let us find out. First, I ripped a few pieces from the original CDs, saved in WAV format at the original CD quality of 44100 samples per second, 2×16 bits each (stereo, 16 bits per channel). I picked 27 (why not 30? not sure!) songs to cover as broad a range of musical styles as possible, so as to minimize possible genre-specific effects. Amongst all the possible encoders, I picked LAME, which on my system is the 64-bit 3.98.2 version. I chose LAME because apparently gpsycho does a great job at psycho-acoustic optimization. It is also libre and conveniently part of my current Ubuntu distribution’s repositories.
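To put the bit-rates in perspective, here is a small sketch of the arithmetic behind CD quality. It assumes plain 16-bit stereo PCM at 44100 Hz, ignores the WAV header, and takes 1 Kb/s to mean 1000 bits per second (as MP3 bit-rates usually do). The function names are mine, chosen only for illustration.

```python
# CD audio parameters (assumed: 44100 Hz, stereo, 16 bits per sample).
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2
BITS_PER_SAMPLE = 16


def pcm_bitrate_kbps():
    """Bit-rate of the uncompressed CD stream, in kilobits per second."""
    return SAMPLE_RATE_HZ * CHANNELS * BITS_PER_SAMPLE / 1000


def cbr_relative_size(bitrate_kbps):
    """Size of a constant bit-rate file relative to the uncompressed audio.

    With CBR the size grows exactly linearly with the bit-rate, so this is
    the straight line that VBR results can be compared against.
    """
    return bitrate_kbps / pcm_bitrate_kbps()


if __name__ == "__main__":
    print(f"CD audio: {pcm_bitrate_kbps():.1f} Kb/s")
    for rate in (112, 128, 160, 192, 256, 320):
        print(f"{rate:3d} Kb/s CBR -> {cbr_relative_size(rate):.3f} of the original")
```

So uncompressed CD audio runs at about 1411 Kb/s, and a 320 Kb/s constant bit-rate file would weigh a bit under a quarter of the original.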
Using constant bit-rate (CBR) doesn’t seem like such a good idea. Songs with little information will still use as many bits as songs with lots of information; the former waste bits, the latter are possibly short on quality. So variable bit-rate (VBR) is the mode I’m interested in. First, we should figure out what exactly variable bit-rate does in the specific codec we will be using, LAME.
LAME uses two variable bit-rate modes, --abr and --vbr. From the documentation, we learn that the ABR mode estimates the needed bit-rate from a perceptual measure and/or from quantization tables, and does not use the actual quantization error and coding efficiency to adjust the bit-rate. VBR, on the other hand, adjusts the bit-rate according to the “measured quantization error relative to the estimated allowed masking” and other factors to maximize quality while staying very close to the required average bit-rate. One can also specify a quality factor, from 0 to 9, from best to worst, to guesstimate the wanted bit-rate. I prefer to specify the maximum bit-rate explicitly.
So, for each file, let us run the compression using -q 0 (max quality), --vbr-new (the gpsycho-based optimizer), and -B rate (where rate is the maximum permissible bit-rate: 112, 128, 160, 192, 256 or 320; there are other settings but let’s ignore them). For each file and each parameter setting, let us note the resulting absolute file size. We get an array such as (the raw data is in this gnumeric file):
File size varies mainly with the duration of the original song, so we need to factor this out to figure out what’s really going on here. We get:
So, despite minor variations (about 3% from the smallest to the largest file at max bit-rate), all curves look rather the same. There’s a sharp increase in file size from 112 to 160 Kb/s, but after that, the growth slows down quite a bit. Some files do not even grow significantly after 192 Kb/s. Some still grow up to 320 Kb/s, but by a minute amount. So those are the effects of -q 0 --vbr-new -B rate.
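The encoding runs described above could be scripted along these lines. This is only a sketch, not the script I actually used: it assumes the lame binary is on the PATH, that the ripped tracks sit in a single directory as .wav files, and the function names and output naming scheme are made up for illustration. The flags are the ones quoted in the text.

```python
import subprocess
from pathlib import Path

# Maximum bit-rates tried in the experiment, in Kb/s.
MAX_BITRATES = (112, 128, 160, 192, 256, 320)


def lame_command(wav_path, mp3_path, max_bitrate):
    """Build the lame command line: -q 0 --vbr-new -B <max bit-rate>."""
    return [
        "lame", "-q", "0", "--vbr-new", "-B", str(max_bitrate),
        str(wav_path), str(mp3_path),
    ]


def encode_all(wav_dir, out_dir):
    """Encode every .wav file at every maximum bit-rate.

    Returns a dict mapping (wav file name, max bit-rate) to the size in
    bytes of the resulting MP3.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    sizes = {}
    for wav in sorted(Path(wav_dir).glob("*.wav")):
        for rate in MAX_BITRATES:
            mp3 = out_dir / f"{wav.stem}-{rate}.mp3"
            # check=True stops the run if lame reports an error.
            subprocess.run(lame_command(wav, mp3, rate), check=True)
            sizes[(wav.name, rate)] = mp3.stat().st_size
    return sizes
```

Calling encode_all("rips", "encoded") (both directory names hypothetical) would then give the raw table of absolute file sizes.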
What about other settings? Like the “automatic” settings using -V? Let us slip -V 0 just after --vbr-new to maximize the quality of the resulting files. Let us rerun the experiment. We get the following:
which, once compensated for duration (original file sizes), gives:
What we see is that it produces, on average, much bigger files than the --vbr setting. Although file size is more predictable at lower bit-rates (there’s almost no variance), the file size at higher bit-rates shows higher variability: a 0.055 difference between the smallest and largest relative file sizes, while we had only a 0.028 difference with the --vbr method.
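The normalization and the smallest-to-largest differences quoted above could be computed along these lines. Again, this is a sketch under assumptions: the function names and the dictionary layout are mine, and the numbers in the usage example are made up, not taken from the experiment.

```python
def relative_sizes(mp3_sizes, wav_sizes):
    """Divide each MP3 size by the size of the WAV file it came from.

    mp3_sizes maps (file name, max bit-rate) to a size in bytes;
    wav_sizes maps file name to the original size in bytes.
    """
    return {
        (name, rate): size / wav_sizes[name]
        for (name, rate), size in mp3_sizes.items()
    }


def spread_at(relative, rate):
    """Difference between the largest and smallest relative size at a bit-rate."""
    values = [value for (_, r), value in relative.items() if r == rate]
    return max(values) - min(values)


if __name__ == "__main__":
    # Made-up sizes, for illustration only.
    wav_sizes = {"a.wav": 40_000_000, "b.wav": 30_000_000}
    mp3_sizes = {("a.wav", 320): 6_000_000, ("b.wav", 320): 4_200_000}
    relative = relative_sizes(mp3_sizes, wav_sizes)
    print(f"spread at 320 Kb/s: {spread_at(relative, 320):.3f}")
```

Since the WAV size is proportional to the duration, dividing by it removes the effect of song length and leaves only the encoder’s behaviour.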
Although the VBR method seems to produce smaller files on average than the ABR method, there’s no simple way of validating the resulting perceptual audio quality. While I can clearly hear the difference between the 112 and the 320 Kb/s streams, like most people, I can’t really tell 256 from 320 Kb/s, or VBR from ABR. Since I can’t tell VBR and ABR apart, I’ll go for VBR, since it produces smaller files. And since the resulting file sizes from 256 to 320 Kb/s max bit-rate vary rather little, I’ll go for the 320 Kb/s VBR.
What’s your take on this?