Scanning documents or books without expensive hardware and commercial software can be tricky. This week, I give you the script I use to clean up a scanned image (and eventually assemble many of them into a single PDF document).
This week, something short. To run tests, I needed a selection of WAV files. Fortunately for me, I’ve got literally thousands of FLAC files lying around on my computer—yes, I listen to music when I code. So I wrote a simple script that randomly chooses a number of file from a directory tree (and not a single directory) and transcode them from FLAC to WAV. Also very fortunately for me, Bash and the various GNU/Linux utilities make writing a script for this rather easy.
Computationally inexpensive sound compression is always difficult, at least if you want some quality. One could think, for example, that taking the 8 most significant bits of 16 bits will give us 2:1 (lossy) compression but without too much loss. However, cutting the 8 least significant bits leads to noticeable hissing. However, we do not have to compress linearly, we can apply some transformation, say, vaguely exponential to reconstruct the sound.
That’s the idea behind μ-law encoding, or “logarithmic companding”. Instead of quantizing uniformly, we have large (original) values widely spaced but small (original) value, the assumption being that the signal variation is small when the amplitude is small and large when the amplitude is great. ITU standard G.711 proposes the following table: