In a previous post, I told you about a short script to rip and encode CDs using Flac, and I discussed a bit how LPC works. In this post, let us have a look at how efficient Flac is.
Let us take a quantitative approach. Since I have a great number of songs, we can use statistics to get a good idea of what kind of compression we can expect.
I have 4461 Flac-encoded songs, so this should cover the “more than two data points” part of the experiment. Further, they are of wildly different styles, ranging from opera to rave, with pretty much everything in between, so this covers the “varied enough” part of the experiment.
So let’s have a look at compression. We will observe the compression factor, which is what the flac command reports upon compressing a file. A factor of 0.15 means that the compressed file is 15% of the size of the original, raw audio file.
(Raw CD audio is 16 bits per sample, two channels, and 44100 samples per second per channel. One minute of CD audio therefore represents 5292000 samples (both channels), for 84672000 bits, which is roughly 10MB.)
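The arithmetic above is easy to check with bash's own integer arithmetic. This is a small sketch; the helper name `raw_cd_bytes` is hypothetical, written only for this illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: raw size, in bytes, of a number of seconds of CD audio
# (44100 samples per second per channel, 2 channels, 16 bits per sample).
raw_cd_bytes() {
    local seconds=$1
    local rate=44100 channels=2 bits=16
    echo $(( seconds * rate * channels * bits / 8 ))
}

# One minute of CD audio, step by step
samples=$(( 60 * 44100 * 2 ))   # samples, counting both channels
bits=$(( samples * 16 ))        # 16 bits per sample
echo "$samples samples, $bits bits, $(raw_cd_bytes 60) bytes"
```

Running it prints `5292000 samples, 84672000 bits, 10584000 bytes`, that is about 10.6 million bytes (about 10.1 MiB) per minute.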
Examining the 4461 songs, we find that:
| statistic | compression factor |
|---|---|
| minimum | 0.143 |
| average | 0.546 |
| median | 0.568 |
| mode | 0.630 to 0.654 |
| maximum | 0.825 |
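Statistics like these can be recomputed from the output of the scanning script given at the end of this post, whose fourth column is the compression ratio. Below is a minimal sketch, assuming that output was saved to a file; the function name `ratio_stats` and the file name `ratios.txt` are hypothetical. The mode is left out, since it depends on how the values are grouped into bins.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize the 4th column (compression ratio) of the
# scanning script's output, read from standard input.
ratio_stats() {
    awk '{ print $4 }' | LC_ALL=C sort -g | awk '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR == 0) exit 1
            # values arrive sorted, so the minimum and maximum are the ends
            if (NR % 2) median = v[(NR + 1) / 2]
            else        median = (v[NR / 2] + v[NR / 2 + 1]) / 2
            printf "minimum %.3f\naverage %.3f\nmedian %.3f\nmaximum %.3f\n",
                   v[1], sum / NR, median, v[NR]
        }'
}
```

Usage, assuming the scan output was redirected to `ratios.txt`: `ratio_stats < ratios.txt`.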
So we have a best case where files are compressed roughly 7:1 (at 0.143), a worst case of about 82% (0.825), and most of the files lying somewhere in the 0.5 to 0.7 range. Let us have a look using a box plot:
Hmm. Box plots are built from quartiles, so the bar in the middle is the median, not the mode or the average. A histogram may give us more information:
Or maybe we should line up both:
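To make the difference concrete, here is a sketch of what each plot is computed from, assuming one ratio per line on standard input (for instance `awk '{ print $4 }' ratios.txt`, where `ratios.txt` is a hypothetical saved output of the scanning script). The function names `five_numbers` and `histogram`, the quartile convention, and the bin width are assumptions; plotting tools may compute quartiles slightly differently.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, reading one compression ratio per line on stdin.

# Five-number summary drawn by a box plot: min, Q1, median, Q3, max.
# Quartiles use the "median of each half" convention (the median itself is
# left out of both halves when the count is odd); other conventions exist.
five_numbers() {
    LC_ALL=C sort -g | awk '
        # median of the sorted values v[lo..hi], inclusive
        function med(lo, hi,    n, m) {
            n = hi - lo + 1
            m = lo + int((n - 1) / 2)
            return (n % 2) ? v[m] : (v[m] + v[m + 1]) / 2
        }
        { v[NR] = $1 }
        END {
            if (NR < 2) exit 1
            h = int(NR / 2)
            printf "%.3f %.3f %.3f %.3f %.3f\n",
                   v[1], med(1, h), med(1, NR), med(NR - h + 1, NR), v[NR]
        }'
}

# Histogram with bins of width $1 (at least 0.00001); prints "low-high count".
# Values are scaled to integers (5 decimals, as bc prints with scale=5) so
# that bin boundaries are not disturbed by floating-point rounding.
histogram() {
    awk -v w="$1" '
        BEGIN { wi = int(w * 100000 + 0.5) }
        {
            b = int(int($1 * 100000 + 0.5) / wi)
            count[b]++
            if (NR == 1 || b < lo) lo = b
            if (NR == 1 || b > hi) hi = b
        }
        END {
            for (b = lo; b <= hi; b++)
                printf "%.3f-%.3f %d\n", b * wi / 100000, (b + 1) * wi / 100000, count[b] + 0
        }'
}
```

For example, `awk '{ print $4 }' ratios.txt | histogram 0.025`. The tallest bin gives a modal range, which is how a mode can be reported as an interval rather than a single value, while the box plot only ever shows the five numbers above.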
(and now you should take the time to look at the correspondences between the histogram and the box plot.)
So, in summary: while the average is about 0.55, we see that a lot of files compress much better, but also that a good number lie in the 0.6 to 0.7 interval.
*
* *
It would have been interesting to compare with the MP3s, but it turns out that the original collection was disparate, and I have yet to finish my script to convert from Flac to MP3 (while keeping the tags, evidently) for export to devices that can't play Flac. I suspect that a very high bit-rate MP3 would be only 2 to 3 times smaller than the corresponding Flac file. That may be the topic of another blog post.
*
* *
The script to scan the files is not very complicated. It turns out that the metaflac command lets us read the metadata from a .flac file, from which we can compute the compression ratio. The only tricky bit of the script (if it can be called that) is the use of bc, the “arbitrary precision calculator.”
So it goes:
```bash
#!/usr/bin/env bash
find /home/steven/data/musique/ -name '*.flac' |
(
    c=1
    # IFS= and -r keep file names with leading spaces or backslashes intact
    while IFS= read -r filename
    do
        # get meta-data from flac file: bits per sample, channels and
        # total samples, one value per line
        meta_data=$(metaflac --show-bps --show-channels --show-total-samples "$filename")

        # get actual file size, in bytes
        file_size=$(stat -c '%s' "$filename")

        # compute the raw size (channels*bits_per_sample*total_samples)/8
        # (+7 is for rounding up, should it ever be needed)
        # $meta_data is unquoted on purpose: word splitting turns the three
        # lines into three words, which tr then joins with '*'
        original_size=$(echo \($meta_data+7\)/8 | tr ' ' '*' | bc)

        # show data and compression ratio (with leading zero, which bc omits)
        echo $c $original_size $file_size 0$(echo "scale=5; $file_size/$original_size" | bc)
        ((c++))
    done
)
```