But the Burrows-Wheeler transform isn’t the only possible one. There are a few other techniques to generate (reversible) permutations of the input.

If we don’t really care about CPU time, we *could* try each and every one of the possible distinct permutations^{1} and pick the one with the shortest encoding. Storing the permutation index (the number of the permutation) will require ⌈log₂ m⌉ bits, where m is the number of distinct permutations. Although this guarantees that we find the best possible ordering, it is computationally infeasible, except for very small block lengths.

So we must find something else, something much simpler. Something that shuffles the contents in a parametric (and reversible) way, something that displays each symbol in a block exactly once. One possible way to do that is to use a probe, like we have in hash tables, with linear or quadratic probing. Quadratic probing is cumbersome because you need all kinds of conditions on the table size, and linear probing is too simple.

But if we extend linear probing from a simple +1 step to something like

p(i+1) = ( p(i) + k ) mod n,

where k is relatively prime to the table size n, we are sure that all positions in the table will be visited exactly once as i varies from 0 to n−1.

For our reordering, we do not know beforehand which step k will be the best. Fortunately for us, we only need it to be relatively prime to the block length n to be acceptable. Then, we merely try them one by one, something expensive, but not all that much, since there will be fewer than n candidates. To pick the best one, we merely need a proxy for the compression ratio, say, a function that counts repetitions or something like that; you would need something more complex if your next encoding step is more sophisticated.

The search loop would therefore be something like:

```cpp
////////////////////////////////////////
solution select(const std::string & src)
 {
  const size_t l=src.size();
  size_t best_step=0,best_score=0;
  std::string best_remix;

  for (size_t step=1; step<l; step++)
   if (gcd(step,l)==1)
    {
     std::string this_remix=remix(src,step);
     size_t this_score=score(this_remix);
     if (this_score>best_score)
      {
       best_step=step;
       best_score=this_score;
       best_remix=this_remix;
      }
    }

  // should output best step also
  return {best_remix,{best_step,best_score}};
 }
```

where `gcd` computes the greatest common divisor, `remix` shuffles the buffer (here a string for display purposes) and `score` computes a proxy to the compression ratio.

*

* *

So let’s try this with actual text:

```cpp
// Shakespeare: Julius Caesar, Cassius, Act 1, scene 2
const std::string cassius="Men at some time are masters of their fates: The fault, dear Brutus, is not in our stars, But in ourselves, that we are underlings.";

// Shakespeare: Twelfth Night, Malvolio, Act 2, scene 5
const std::string malvolio="Some are born great, some achieve greatness, and some have greatness thrust upon 'em.";

// https://en.wikiquote.org/wiki/Spinosaurus
const std::string horner="If we base the ferocious factor on the length of the animal, there was nothing that ever lived on this planet that could match this creature.";
```

The score function,

```cpp
size_t score(const std::string & src)
 {
  size_t score=0;
  char last=0;
  for (char c: src)
   {
    score+=(last==c);
    last=c;
   }
  return score;
 }
```

merely counts repetitions. The number of repetitions in the original texts is, in order, 0, 2, and 0. So, not that many useful repetitions to begin with. Now, after “optimization”, the program finds

```
step = 18
score= 22 0.167939
Mrr,nBtii f,avnor ruua mae ts els,,r oeuse ts eee ouhnmta redmsTurrraataii .ait ltf stluse:Boo n fdttagehuissee ht setsernnw

Men at some time are masters of their fates: The fault, dear Brutus, is not in our stars, But in ourselves, that we are underlings.

step = 36
score= 12 0.141176
Seumgone t ve,nesa euctgooaserrdmghse vsthasphnrmmtt en .rro ba'e ,arsoieeeen aa s

Some are born great, some achieve greatness, and some have greatness thrust upon 'em.

step = 14
score= 18 0.12766
I tgm eta .ecnis hhehaenatntcrtflawao tu h tateseeetdemasuhhr en eaottegvadrbi hnillc cnftilpu eooo h oswr ,trsci erhloei hffotanvhtt

If we base the ferocious factor on the length of the animal, there was nothing that ever lived on this planet that could match this creature.
```

It found a reordering with 22 repetitions for the first, 12 for the second, and 18 for the last. While that doesn’t seem like much, it might still give the next compression stage a chance at a few percent more compression. Who knows.

*

* *

It is unclear how much extra compression we can get from such a scheme. On the plus side, “decompression” (as shown in the `demix` function in the full source below) is extremely simple, and the coding overhead of storing the step in the compressed block is proportional to log₂ n bits, which is not necessarily negligible but isn’t excessive either. Maybe there’s something exploitable in the distribution of the steps, beyond their being drawn from the numbers relatively prime to n? Questions for later.

*

* *

The full source code:

```cpp
#include <string>
#include <vector>
#include <iostream>

// Shakespeare: Julius Caesar, Cassius, Act 1, scene 2
const std::string cassius="Men at some time are masters of their fates: The fault, dear Brutus, is not in our stars, But in ourselves, that we are underlings.";

// Shakespeare: Twelfth Night, Malvolio, Act 2, scene 5
const std::string malvolio="Some are born great, some achieve greatness, and some have greatness thrust upon 'em.";

// https://en.wikiquote.org/wiki/Spinosaurus
const std::string horner="If we base the ferocious factor on the length of the animal, there was nothing that ever lived on this planet that could match this creature.";

typedef std::pair<size_t,size_t> step_score;
typedef std::pair<std::string, step_score> solution;

////////////////////////////////////////
size_t gcd(size_t a, size_t b)
 {
  while (b)
   {
    size_t t=b;
    b=a % b;
    a=t;
   }
  return a;
 }

////////////////////////////////////////
size_t score(const std::string & src)
 {
  size_t score=0;
  char last=0;
  for (char c: src)
   {
    score+=(last==c);
    last=c;
   }
  return score;
 }

////////////////////////////////////////
std::string remix(const std::string & src, size_t step)
 {
  const size_t l=src.size();
  std::string temp(l,0); // reserve space
  for (size_t s=0,d=0;d<l;d++,s=(s+step)%l)
   temp[d]=src[s];
  return temp;
 }

////////////////////////////////////////
std::string demix(const std::string & src, size_t step)
 {
  const size_t l=src.size();
  std::string temp(l,0); // reserve space
  for (size_t s=0,d=0;s<l;s++,d=(d+step)%l)
   temp[d]=src[s];
  return temp;
 }

////////////////////////////////////////
solution select(const std::string & src)
 {
  const size_t l=src.size();
  size_t best_step=0,best_score=0;
  std::string best_remix;

  for (size_t step=1; step<l; step++)
   if (gcd(step,l)==1)
    {
     std::string this_remix=remix(src,step);
     size_t this_score=score(this_remix);
     if (this_score>best_score)
      {
       best_step=step;
       best_score=this_score;
       best_remix=this_remix;
      }
    }

  // should output best step also
  return {best_remix,{best_step,best_score}};
 }

////////////////////////////////////////
int main()
 {
  std::vector<solution> solutions
   {
    select(cassius),
    select(malvolio),
    select(horner)
   };

  for (const std::string & s: std::vector<std::string>{cassius,malvolio,horner})
   std::cout << "original score=" << score(s) << std::endl;

  for (const auto & s: solutions)
   std::cout
    << "step = " << s.second.first << std::endl
    << "score= " << s.second.second << " "
    << s.second.second/(float)s.first.size() << std::endl
    << s.first << std::endl
    << std::endl
    << demix(s.first,s.second.first) << std::endl
    << std::endl
    << std::endl;

  return 0;
 }
```

^{1}In fact, the number of distinct permutations is

n! / (n₁! n₂! ⋯ n_k!),

where n_i is the number of occurrences of the i-th of the k distinct symbols, as given by the multinomial coefficients.

Filed under: algorithms, C-plus-plus, data compression, hacks Tagged: Burrows, Burrows Wheeler Transform, compression, hash table, Linear probing, Quadratic probing, Transform, Wheeler

Well, we know it shouldn’t be too hard, since there’s `/dev/urandom` readily available to generate strong random bits. We only need to extract them, and somehow convert them to typable characters—meaning typable without resorting to that alt-number or shift-ctrl-u nonsense.

The `/dev/urandom` pseudo-device just spews out random bits as long as you read it, so we must find a way of reading just a certain number of bytes, a number that corresponds to the desired length of the random password. This seems simple enough until you realize you have to filter untypable characters out, and that calls for a rejection-type approach. That is, we read bytes until the wanted number of characters has made its way through. OK, so let’s see how we can filter the characters.

`tr`, one of the commands I love to hate, allows us to delete any unwanted characters. But instead of specifying which characters to reject, we will define the set of acceptable characters, then complement it:

... | tr --delete --complement 0-9a-z

will filter whatever comes through stdin and will let pass only the letters from a to z and digits 0 to 9. We can use a better set later on. Now to have, say, 8 characters, we can use `head`, which we normally use for line-based operations, but which also has a character-oriented mode.

... | tr --delete --complement 0-9a-z | head --bytes 8

will output the 8 first bytes of the stream. Or, more exactly, will read the stream until 8 bytes have been output. We just need to pipe in the random stream:

tr --delete --complement 0-9a-z < /dev/urandom| head --bytes 8

and we’re done. Typical output looks like `p7081txs`.

*

* *

OK, that’s not much of a script, it’s merely a command-line piped command. Let’s wrap that in something more usable. Let’s write a script that takes as optional arguments both length and charset for the password. If no arguments are provided, default charset and length will be used. If one argument is provided, it’s either an integer interpreted as a length (and implied default charset) or a charset name. If two arguments are provided, they must be an integer for the length of the password and a charset name.

```bash
#!/usr/bin/env bash

usage()
 {
  echo usage:
  echo
  echo $0 [length] [style]
  echo
  echo where
  echo $'\t'length \(optional, defaults to 8\) is length of password
  echo $'\t'style \(optional, defaults to hackerish\) is one of simple, hackerish, full
 }

# if argument 1 is provided, check if
# it is a number; if not, assume default
# length and take it as a style
case $# in
 0)
  # defaults everything
  len=8
  style=hackerish
  ;;
 1)
  if [ "$1" == "-h" ]
  then
   usage
   exit 0
  elif [[ $1 =~ ^[0-9]+$ ]] # is $1 number-like?
  then
   len=$1
   style=hackerish
  else
   len=8
   style=$1
  fi
  ;;
 2)
  if [[ $1 =~ ^[0-9]+$ ]] # is $1 number-like?
  then
   len=$1
   style=$2
  else
   echo first argument must be integer.
   exit 1
  fi
  ;;
esac

case "$style" in
 simple)
  s=0-9a-z
  ;;
 hackerish)
  s='0-9$!a-zA-Z'
  ;;
 full)
  s='~!@#$%^&*()\+_\-0-9a-zA-Z'
  ;;
 *)
  echo unknown style \"$style\"
  exit 1
  ;;
esac

# spew random password from /dev/urandom
tr --delete --complement $s </dev/urandom | head --bytes $len
echo
```

Filed under: Bash (Shell), hacks Tagged: /dev/urandom, Cat, password, random, script, tr

But this time, I needed something a bit different: I only wanted the sign-extended part. Could I do much better than last time? Turns out, the compiler has a mind of its own.

Long ago, I proposed:

```cpp
////////////////////////////////////////
int32_t sex(int32_t x)
 {
  // this relies on automagic promotion
  // (which is about the same as
  // (int32_t)((int16_t)x))
  union
   {
    int64_t w;
    struct { int32_t lo, hi; }; // should be hi,lo on a big endian machine
   } z = { .w=x };
  return z.hi;
 }
```

This should rely on some variant of `movsx` (move with sign extend), `cdqe` (convert doubleword to quadword in accumulator, i.e., RAX), and a second move instruction to get the high part of the register into another register’s lower part. Well, nope. Disassembly (`g++ -O3`) reveals

```
400ed0: 48 63 c7       movsxd rax,edi
400ed3: 48 c1 f8 20    sar    rax,0x20
400ed7: c3             ret
```

The compiler does a shift right of 32 positions to get to the higher part. So that’s basically the same as

```cpp
int32_t sex_shift(int32_t x)
 {
  return (x>>31);
 }
```

neglecting the fact that it will overwrite/discard the argument. This disassembles to

```
400ef0: 89 f8          mov    eax,edi
400ef2: c1 f8 1f       sar    eax,0x1f
400ef5: c3             ret
```

This variant just propagates the sign bit, using the expected signed-shift behavior. On some CPUs, that’s fine because the execution time of a shift doesn’t depend on the number of positions shifted, but on some other architectures, it might be one cycle per position shifted. That’d be pretty inefficient. So that got me thinking that there ought to be some other way to propagate the sign bit than using shifts.

What about using a bunch o’ shifts, like this:

```cpp
int32_t sex_shift_3(int32_t x)
 {
  x&=0x80000000;
  x|=(x>>1);
  x|=(x>>2);
  x|=(x>>4);
  x|=(x>>8);  // maybe some low-level
  x|=(x>>16); // byte-copy can help?
  return x;
 }
```

This uses a bunch of instructions, most of which are short shifts, some of which could be replaced by register-level `mov` instructions. This time, the compiler doesn’t seem to know what to do with it:

```
400f00: 89 f8          mov    eax,edi
400f02: 25 00 00 00 80 and    eax,0x80000000
400f07: 89 c2          mov    edx,eax
400f09: d1 fa          sar    edx,1
400f0b: 09 d0          or     eax,edx
400f0d: 89 c2          mov    edx,eax
400f0f: c1 fa 02       sar    edx,0x2
400f12: 09 d0          or     eax,edx
400f14: 89 c2          mov    edx,eax
400f16: c1 fa 04       sar    edx,0x4
400f19: 09 d0          or     eax,edx
400f1b: 89 c2          mov    edx,eax
400f1d: c1 fa 08       sar    edx,0x8
400f20: 09 d0          or     eax,edx
400f22: 89 c2          mov    edx,eax
400f24: c1 fa 10       sar    edx,0x10
400f27: 09 d0          or     eax,edx
400f29: c3             ret
```

So that’s not that good, is it? We’re not heading in the right direction at all. Let’s see what else we can do:

```cpp
int32_t sex_shift_4(int32_t x)
 {
  return ~((x<0)-1);
 }
```

This sets a value: exactly 0 if `x<0` is false, and exactly 1 if it is true. If it is true, ~(1-1)=~0=0xff…ff; if it is false, ~(0-1)=~(-1)=~0xff…ff=0. That’s what we want. However, this disassembles to…

```
400f30: 89 f8          mov    eax,edi
400f32: c1 f8 1f       sar    eax,0x1f
400f35: c3             ret
```

Oh g*d d*ammit!

*

* *

This is the perfect example of how “optimizations” are quite relative: relative to the underlying machine and to the compiler. While `~((x<0)-1)` is branchless, and should rely on cute instructions like `test` and `setcc`, the compiler sees through it and replaces it with a shift. On my machine, that’s probably indeed much faster than the alternative, naïve implementation of the same function. Oh well. Time well wasted, I guess.

Filed under: algorithms, bit twiddling, C, C-plus-plus, C99, CPU Architecture, hacks Tagged: abs, g++, grumpy cat, max, min, sex, sign extension

Let’s then find out the distribution of the samples. As before, I will use the Toréador file and a new one, Jean Michel Jarre’s Electronica 1: Time Machine (the whole CD). The two are very different. One is classical music, the other electronic music. One is adjusted in loudness so that we can hear the very quiet notes as well as the very loud ones, the other is adjusted mostly for loudness, to maximum effect.

So I ran both through a sampler. For display as well as histogram smoothing purposes, I down-sampled both channels from 16 to 8 bits (therefore from 0 to 255). In the following plots, green is the left channel and (dashed) red the right. Toréador shows the following distribution:

or, in log-plot,

Turns out, the samples are Laplace-distributed. Indeed, fitting a mean and a scale parameter agrees with the plot (the ideal Laplacian is drawn in solid blue):

Now, what about the other file? Let’s see the plots:

and in log-plot,

and with the best-fit Laplacian superimposed:

Now, to fit a Laplacian, new best-fit parameters must be found for this file. While the fit is pretty good on most of the values, it kind of sucks at the edges. That’s the effect of dynamic range compression, a technique used to limit a signal’s dynamic range, often in a non-uniform way (the signal values near or beyond the target maximum value get squished more). This explains the “ears” seen in the log-plot, also visible in the (not log-)plot.

*

* *

Making the hypothesis that the samples are Laplace-distributed will allow us to devise an efficient quantization scheme for both the limits of the bins *and* the reconstruction value. In S-law, if we remember, the reconstructed value used is the value at the center of the interval. But, if the distribution is not uniform in this interval, the most representative value isn’t its center. It’s the value that minimizes the expected squared error. Even if the expressions for the moments of a Laplace-distributed random variable are somewhat unwieldy, we should arrive at a very good, and parametric, quantization scheme for the signal.

Filed under: algorithms, data compression Tagged: 16 bits, 8 bits, Audio Companding, CD Audio, companding, Jean Michel Jarre, Laplace, Loudness, Sample, Time Machine, Toréador

The first step is to scan the material with your favorite scanner and try to already make a good job of it. Typically, even if you did your best, it’ll end up looking something like:

Which isn’t too bad, but 1) it is somewhat skewed and 2) it shows uninteresting things such as the book’s binding and other background elements. We want to crop that automagically and deskew the image. To deskew the image, I use Fred Weinhaus’s textdeskew script, built upon ImageMagick and Awk. Unfortunately, cropping isn’t quite automated yet. The script looks like:

```bash
#!/usr/bin/env bash

for z in *.png
do
 convert $z \
  -crop 1625x2600+50+50 \
  -gravity center \
  -background white \
  -extent 1800x2700 \
  +repage tmp.png
 ./text-deskew.sh tmp.png crop-rot-$z
done
```

This produces this image:

Now, if we’re going to bundle the images into a DjVu file, filling the border with white might not be the best course. However, if you rather intend to assemble the images into a somewhat lightweight PDF file, you might want to threshold them into b/w images. To do so, I use ImageMagick. But ImageMagick is full of idiosyncrasies and is especially finicky.

Here’s the thresholding script:

```bash
#!/usr/bin/env bash

base=$(dirname $0)
files=($( ( for f in "$@"
            do
             echo $f
            done ) | sort))

for z in ${files[@]}
do
 # threshold
 convert -verbose \
  "$z" \
  -unsharp 5x5 \
  -colorspace gray \
  +dither \
  -remap ${base}/bw.gif \
  -normalize \
  -threshold 70% \
  png:bw-"${z//.*}.png"
done
```

The script applies an unsharp mask to somewhat enhance details, then uses another image’s palette (a 1×1 pixel GIF containing a 2-color palette) to remap the colors of the original image after normalization and thresholding.

You can download the 1×1 pixels bw.gif here.

At last, we get the final image:

*

* *

These scripts are somewhat brittle, principally because ImageMagick is *really* finicky. The order of the options passed to the `convert` command has to be exactly right, otherwise it won’t work, or it’ll do some other random thing. So, use at your own risk, and all that.

Filed under: Bash (Shell), hacks Tagged: Document, ImageMagick, Scanning, thresholding, Unsharp, Unsharp Mask

There are two main components to the script: 1) finding the files, which is readily done using the dreadfully capricious command `find` (which, methinks, is one of the four horsemen of the scriptocalypse, alongside `sed`, `awk` and `bc`) and 2) `sort`, which has the oxymoronic option `--random-sort`, which, as it implies, randomizes the input lines.

In this particular occasion, `find`’s usage is minimal: a root directory under which the files are to be found, and a mask, in this case `*flac`. To deal with spaces in file-names, `find`’s output is passed, via a pipe, to `sort --random-sort`, whose output is piped to `head` (with an argument) to extract the first *n* files. This output is in turn piped to a subshell that reads file-names line by line and applies deflaculation.

```bash
#!/usr/bin/env bash

location=$1
if [ "$location" != "" ]
then
 nb=${2:-20}
 pattern=${3:-flac}
 find $location -iname '*'$pattern \
  | sort --random-sort \
  | head -n $nb \
  | \
  ( while read filename
    do
     bn=$(basename "$filename")
     nw=${bn%.*}.wav
     echo -\> $nw
     flac --silent --decode "$filename" -o "$nw"
    done )
else
 echo must provide location/directory 1>&2
 exit 1
fi
```

Here, the only real trick is to pipe, line by line, file-names to a subshell. This is, maybe, the simplest way to deal with a filename such as “DJ Champion — N⁰1 – 013 – Grand Prix.flac”.

There’s also a Bash substitution pattern, `${bn%.*}` that strips the file’s basename of its extension.

You also might have noticed the unusual patterns in `nb=${2:-20}` and `pattern=${3:-flac}`, which provides default values if the second and third script arguments (`$2` and `$3`) aren’t provided.

Filed under: Bash (Shell), hacks Tagged: bash, DJ Champion, Find, FLAC, head, pipe, script, sort, subshell, WAV

That’s the idea behind μ-law encoding, or “logarithmic companding”. Instead of quantizing uniformly, large (original) values are widely spaced while small (original) values are finely spaced, the assumption being that the signal variation is small when the amplitude is small and large when the amplitude is great. ITU standard G.711 proposes the following table:

| Range | Value spacing | Code |
|---|---|---|
| 4063 — 8158 | 256 | 1000xxxx |
| 2015 — 4062 | 128 | 1001xxxx |
| 991 — 2014 | 64 | 1010xxxx |
| 479 — 990 | 32 | 1011xxxx |
| 223 — 478 | 16 | 1100xxxx |
| 95 — 222 | 8 | 1101xxxx |
| 31 — 94 | 4 | 1110xxxx |
| 1 — 30 | 2 | 1111xxxx |
| 0 | — | 11111111 |
| -1 | — | 01111111 |
| -31 — -2 | 2 | 0111xxxx |
| -95 — -32 | 4 | 0110xxxx |
| -223 — -96 | 8 | 0101xxxx |
| -479 — -224 | 16 | 0100xxxx |
| -991 — -480 | 32 | 0011xxxx |
| -2015 — -992 | 64 | 0010xxxx |
| -4063 — -2016 | 128 | 0001xxxx |
| -8159 — -4064 | 256 | 0000xxxx |

This table gives a code for about 14 bits per sample, which was deemed enough for toll-quality sound. But what if we want a code for 16 bits? The above table doesn’t quite cover the range. We could reduce the precision of the sample from 16 to 14 (with techniques like those we saw last week) then use G.711 μ-law companding. Or we could redesign the code.

Let’s say we still want a similar code structure with sign, range and value within range, that is, `s rrr xxxx` (1 sign bit, 3 range bits, 4 value bits).

Let’s also remove the special cases for 0 and -1. Let’s also call it the S-law, because of reasons. We must now adjust the “stretching”. With 16 values per range and the spacing growing by a factor r from one range to the next, and knowing that the lowest number in a range is 1 more than the largest number in the previous range, we find that we must solve

16 (1 + r + r² + ⋯ + r⁷) = 16 (r⁸ − 1)/(r − 1) = 32768

for r. The exact value of r, 2.789469…, is rather cumbersome for our needs. Such a value spacing would lead to non-integer values, and we kind of want to avoid that. If we round down to 2, we get much less than the desired range, and 3 gives us more. Using 3, we get the following code:

| Range | Value spacing | Offset | Code srrrxxxx |
|---|---|---|---|
| 17488 — 52479 | 2187 | 1093 | 0111xxxx |
| 5824 — 17487 | 729 | 364 | 0110xxxx |
| 1936 — 5823 | 243 | 121 | 0101xxxx |
| 640 — 1935 | 81 | 40 | 0100xxxx |
| 208 — 639 | 27 | 13 | 0011xxxx |
| 64 — 207 | 9 | 4 | 0010xxxx |
| 16 — 63 | 3 | 1 | 0001xxxx |
| 0 — 15 | 1 | 0 | 0000xxxx |
| -16 — -1 | 1 | 0 | 1000xxxx |
| -64 — -17 | 3 | -1 | 1001xxxx |
| -208 — -65 | 9 | -4 | 1010xxxx |
| -640 — -209 | 27 | -13 | 1011xxxx |
| -1936 — -641 | 81 | -40 | 1100xxxx |
| -5824 — -1937 | 243 | -121 | 1101xxxx |
| -17488 — -5825 | 729 | -364 | 1110xxxx |
| -52480 — -17489 | 2187 | -1093 | 1111xxxx |

This leaves us codes 0x78 to 0x7f and 0xf8 to 0xff for other uses, as they represent values above 32767 and below -32768. The offset is needed for reconstruction: it gives the center value for that cell. A cell that is 3 units wide has its center at 1 (starting at zero, of course), a cell that is 9 units wide has its center at 4, etc.

Let’s see what the code should look like:

```cpp
////////////////////////////////////////
uint8_t s_law_compress(int16_t s)
 {
  if (s<=-17489) return 0x80 | 0x70 | -(s+17489)/2187;
  if (s<=-5825)  return 0x80 | 0x60 | -(s+5826)/729;
  if (s<=-1937)  return 0x80 | 0x50 | -(s+1937)/243;
  if (s<=-641)   return 0x80 | 0x40 | -(s+641)/81;
  if (s<=-209)   return 0x80 | 0x30 | -(s+209)/27;
  if (s<=-65)    return 0x80 | 0x20 | -(s+65)/9;
  if (s<=-17)    return 0x80 | 0x10 | -(s+17)/3;
  if (s<=-1)     return 0x80 | 0x00 | -(s+1);
  if (s<16)      return 0x00 | 0x00 | s;
  if (s<64)      return 0x00 | 0x10 | (s-16)/3;
  if (s<208)     return 0x00 | 0x20 | (s-64)/9;
  if (s<640)     return 0x00 | 0x30 | (s-208)/27;
  if (s<1936)    return 0x00 | 0x40 | (s-640)/81;
  if (s<5824)    return 0x00 | 0x50 | (s-1936)/243;
  if (s<17488)   return 0x00 | 0x60 | (s-5824)/729;
  /*else*/       return 0x00 | 0x70 | (s-17488)/2187;
 }

////////////////////////////////////////
int16_t s_law_decompress(int8_t c)
 {
  uint8_t range=c & 0xf0; // srrr
  uint8_t value=c & 0x0f; // vvvv
  switch (range)
   {
    case 0xf0: return -17489-value*2187-1093; // 0xf8--0xff undefined!
    case 0xe0: return -5825 -value*729 -364;
    case 0xd0: return -1937 -value*243 -121;
    case 0xc0: return -641  -value*81  -40;
    case 0xb0: return -209  -value*27  -13;
    case 0xa0: return -65   -value*9   -4;
    case 0x90: return -17   -value*3   -1;
    case 0x80: return -1    -value     -0;
    case 0x00: return 0     +value     +0;
    case 0x10: return 16    +value*3   +1;
    case 0x20: return 64    +value*9   +4;
    case 0x30: return 208   +value*27  +13;
    case 0x40: return 640   +value*81  +40;
    case 0x50: return 1936  +value*243 +121;
    case 0x60: return 5824  +value*729 +364;
    case 0x70: return 17488 +value*2187+1093; // 0x78--0x7f undefined!
   }
  return 0; // unreachable, but needed to suppress warning
 }
```

*

* *

I cannot really measure how good the code is subjectively. The only thing I can say about it is that hiss is hardly noticeable, much less so than with a simple truncation to 8 bits. Well, that’s not that surprising: μ-law and A-law (essentially the same idea but using different constants) have been standardized and used for protocols where the target bit rate was 64 kbits/s (at 8000 samples/s, and 8 bits per sample).

Still, let’s have a look at the effect this companding has on sound. I took Bizet’s “Les toréadors” (mostly because I needed something I could stand listening to 20 times in a row). The original piece, fresh off the CD, looks like:

As one can see, the original has been very carefully mastered to remove any low-amplitude high frequencies, that is, *noise*. Random low-amplitude high-frequency components would manifest themselves as high-pitched hiss, and can be noticeable if the music doesn’t mask them.

With a reduction from 16 to 8 bits then back again, the hiss is conspicuous:

The blue/purple/gray texture shown in the spectrogram is akin to white noise, low amplitude in all frequencies. You can hear a distinct hiss. With S-Law, we see that there is a lot of noise:

But: 1) it’s much better at reducing noise in very low-amplitude sound, as shown at the extreme left: the leading “silence” remains (mostly) silent, while with the 8-bit downgrade, there’s a fair amount of noise in that area; 2) the total quantity of noise is also much reduced, as shown by the prevalence of pale blue areas.

*

* *

Of course, 8-bit single-sample companding can’t do miracles. That’s why this type of approach is more or less forgotten, and more complex coding techniques are favored. One big difference is that modern waveform-coding techniques do not code individual samples but a bunch of samples together, exploiting all kinds of dependencies.

Filed under: algorithms, bit twiddling, data compression, hacks Tagged: alpha-law, companding, ISDN, mu-law, noise, sound, spectrogram, voice, white noise

So merely shifting the values isn’t sufficient. We must make sure that 0 maps to 0 but 255 maps to 65535. Luckily, we notice that 65535 is evenly divisible by 255, and that 65535/255 = 257. This gives the conversion pair:

```cpp
int16_t _8_to_16 (uint8_t x) { return x*257; } // promotion to int
uint8_t _16_to_8 (int16_t x) { return x/257; }
```

That’s nice, but that’s not always exactly the code we need. A peculiarity of the WAV format is that samples on 8 bits are from 0 to 255, but samples on 16 bits are from -32768 to 32767. Therefore, we need to worry about that bias. And, while we’re at it, let’s not trust the compiler to promote the operands to (a 32 bits) int:

```cpp
int16_t _8_to_16 (uint8_t x) { return ((int32_t)x*257)-(1<<15); }
uint8_t _16_to_8 (int16_t x) { return ((int32_t)x+(1<<15))/257; }
```

Spiffy. Now, what about, say, 16 to 24 bits and back? 65535 doesn’t stretch as well onto 16777215 as 255 did onto 65535: 16777215/65535 = 256.0038…, so it won’t be as easy. However… 16777215/65535 is also 65793/257:

```cpp
int32_t _16_to_24(int16_t x)
 {
  int64_t t=x;
  return ((t+(1u<<15))*65793)/257-(1u<<23);
 }

int16_t _24_to_16(int32_t x)
 {
  int64_t t=x;
  return ((t+(1u<<23))*257+256)/65793-(1u<<15); // with a bit of rounding!
 }
```

The rounding makes the thing perfectly reversible: `_24_to_16(_16_to_24(x))` always gives back `x`, which is nice.

*

* *

You might object—rightfully so—that integer operations such as multiplication and division aren’t free, especially on weak processors, and even more so with weird constants like 65793. OK, let’s do this with additions and shifts, then.

```cpp
int16_t _8_to_16 (uint8_t x) { return ((x<<8)+x)-(1u<<15); }
uint8_t _16_to_8 (int16_t x) { return (x+(1u<<15))>>8; }

int32_t _16_to_24(int16_t x)
 {
  int32_t t=x;
  t+=(1u<<15);
  return ((t<<8)+(t>>8))-(1u<<23);
 }

int16_t _24_to_16(int32_t x) { return (x>>8); }
```

To stretch 0 to 255 onto 0 to 65535, we may notice, if we think in hex, that since 255 is 0xff and 65535 is 0xffff, 254, being 0xfe, should stretch to 0xfefe (which indeed it does: 254×65535/255 = 65278 = … 0xfefe!). To get back the original 8-bit value, we need only the 8 most significant bits, so a shift right by 8 bits does the trick. In the code above, we also take into account the fact that for WAV files, 8-bit samples are from 0 to 255 while 16-bit samples are from -32768 to 32767.

The conversion from 16 to 24 bits (and back) uses the same trick, copying the most significant bits onto the missing lower bits. Again, quite easily reversible. The code for 16 to 24 bits must compensate for sign before adding the least significant bits (that’s why merely setting the least significant bits with an or wouldn’t work). The code for 24 to 16 bits is even simpler, because the bias is also correctly shifted! Wunderbar.

Filed under: algorithms, bit twiddling, hacks Tagged: 16 bits, 24 bits, 65793, 8 bits, bits per samples, Samples, sound, WAV

IEEE 754 floats, the most likely implementation of floats on your computer, aren’t as straightforward as one might think. First, they use a “scientific notation” representation of numbers, with a sign bit, a certain number of bits for the exponent, and some more bits for the mantissa, the “precision bits”. However, since they’re finite-precision numbers, they’re bound to exhibit all kinds of error propagation when used in computations.

Even the simple question of the sum of a series, say a vector, of floats isn’t quite that simple. Indeed, we may be tempted to answer the question (how to compute the sum of a series of floats) with something like this:

```cpp
float simple_sum(const std::vector<float> & v)
 {
  return std::accumulate(v.begin(),v.end(),0.0f);
 }
```

with `std::accumulate` being defined in `<numeric>`. It does what you expect: it sets a temporary to zero, then adds the items of the collection to this temporary one by one, and returns it when it’s done. This code is simple, container- and type-agnostic. It should be fine, except it isn’t.

To understand what’s wrong with this simple code, we must first understand how float addition works. As I mentioned earlier, a float is expressed as

(−1)^s × 1.m × 2^(e−127),

where s is the sign bit, e the exponent (stored as 8 bits), and m the mantissa, which keeps only the 23 most significant bits of the number (well, 24, because the most significant bit, being always 1, needn’t be stored).

When you add two numbers, if the two exponents are about the same, then the two mantissas more or less align and the addition works out correctly. Neglecting exponents, and keeping only the 23 most significant bits:

```
  11000010110000001100000 ...
+  1111000001010000111000 ...
= 11010001110001011100111 ...
```

This isn’t very different of what we do when we compute long sums on paper; the only difference is that the least significant bits that are missing are taken as zeroes.

However, if the exponents differ by more than 23, we find ourselves in this situation:

  11000010110000001100000 ...
+                         1111000001010000111000 ...
= 11000010110000001100000 ...

and the sum is not the sum of the two numbers, but only the greater of the two! Now, imagine that your collection contains one extremely large number while the rest of the series is composed of smallish numbers. The naive sum isn’t the sum at all: it more or less computes the maximum of the series!

How can we work our way out of this? The short answer is “we can’t”. The longer answer is that we must find an order in which to perform the sum so that only numbers of comparable magnitudes are added together. Sorting them (in increasing order) doesn’t seem like a bad start. Let’s see.

First, let’s create a troublesome array:

#include <cstddef> // size_t
#include <vector>
#include <random>

// for some weird reason, generators and
// distributions do not inherit from common
// ancestors; template is pretty much the
// way to do this.
template <typename D,typename G>
void fill(std::vector<float> & v,
          size_t nb,
          D & distribution,
          G & generator)
 {
  v.resize(nb);
  for (auto & f : v )
   f=distribution(generator);
 }

...

int main()
 {
  std::default_random_engine generator;
  std::exponential_distribution<float> exponential(0.125f);

  std::vector<float> v;
  fill(v,1000000,exponential,generator);

  ...
 }

This fills the `std::vector` with exponentially distributed values with λ=0.125, that is, with an average of 1/λ=8, but with some very large values. This should provide a “troublesome” series.

We’ve already presented the code for the naive sum. Let’s show it again:

float simple_sum(const std::vector<float> & v)
 {
  return std::accumulate(v.begin(),v.end(),0.0f);
 }

This, as mentioned earlier, just does the naive sum. But we now know that if the next number to add is too small, it won’t be accounted for. So let’s pursue the idea we had earlier: sort them first, then add them:

float sorting_sum(const std::vector<float> & v)
 {
  std::vector<float> t(v);
  std::sort(t.begin(),t.end());
  return simple_sum(t);
 }

So let’s generate 1000000 values and compare the “as-is” sum and the “sorted sum”. Let’s repeat the experiment five times to see what variation there is. We might get something like:

as-is     | sorted
----------|----------
7995938   | 7996406.5
8009019.5 | 8009517.5
8001316   | 8001631.5
8009463.5 | 8009750.5
7995282   | 7995591

As expected, the sums are close to the average times the number of samples: always in the vicinity of 8000000. But they differ! *All* of them!

*

* *

So merely sorting isn’t enough. Hmm… What else could we do? What about a “reduction”: adding the values pair by pair, iteratively, until only one value remains?

This should, if the values are evenly distributed, end up adding numbers of comparable magnitude and help precision. A straightforward implementation might look something like this:

float pair_sum(const std::vector<float> & v)
 {
  std::vector<float> t(v);
  size_t l=t.size();
  do
   {
    size_t d,s;
    for (d=0,s=0;s+1<l;s+=2,d++)
     t[d]=t[s]+t[s+1];

    // deal with left-overs, if any
    if (s<l) t[d++]=t[s];

    l=d;
   }
  while (l>1);

  return t[0];
 }

And, while we’re at it, let’s make a version that sorts (increasing order) the numbers before adding them pair by pair:

float sorted_pair_sum(const std::vector<float> & v)
 {
  std::vector<float> t(v);
  std::sort(t.begin(),t.end());
  return pair_sum(t);
 }

*

* *

Let’s compare the results:

as-is     | sorted    | pair      | sorted pair
----------|-----------|-----------|------------
7995938   | 7996406.5 | 7996777.5 | 7996777
8009019.5 | 8009517.5 | 8009789   | 8009789
8001316   | 8001631.5 | 8001917   | 8001917
8009463.5 | 8009750.5 | 8010206   | 8010206
7995282   | 7995591   | 7995851   | 7995852

If we examine the results, we see that the pair-wise sums (with or without sorting) are larger than the naive sums. This seems to indicate that more numbers are taken into account. But is the pair-by-pair sum really better? Hard to say. To compare properly, we would need an arbitrary-precision library, but we can surmise it does a better job. With sorting, it comes close to a Huffman-like algorithm where only the two nearest (value-wise) floats are added together. Maybe there are much better strategies. What do you think?

*

* *

The complete code:

#include <cstddef>   // size_t
#include <vector>
#include <algorithm> // std::sort
#include <numeric>   // std::accumulate
#include <iostream>
#include <iomanip>
#include <random>    // for std::distributions...

// for some weird reason, generators and
// distributions do not inherit from common
// ancestors; template is pretty much the
// way to do this.
template <typename D,typename G>
void fill(std::vector<float> & v,
          size_t nb,
          D & distribution,
          G & generator)
 {
  v.resize(nb);
  for (auto & f : v )
   f=distribution(generator);
 }

float simple_sum(const std::vector<float> & v)
 {
  return std::accumulate(v.begin(),v.end(),0.0f);
 }

float sorting_sum(const std::vector<float> & v)
 {
  std::vector<float> t(v);
  std::sort(t.begin(),t.end());
  return simple_sum(t);
 }

float pair_sum(const std::vector<float> & v)
 {
  std::vector<float> t(v);
  size_t l=t.size();
  do
   {
    size_t d,s;
    for (d=0,s=0;s+1<l;s+=2,d++)
     t[d]=t[s]+t[s+1];

    // deal with left-overs, if any
    if (s<l) t[d++]=t[s];

    l=d;
   }
  while (l>1);

  return t[0];
 }

float sorted_pair_sum(const std::vector<float> & v)
 {
  std::vector<float> t(v);
  std::sort(t.begin(),t.end());
  return pair_sum(t);
 }

int main()
 {
  std::default_random_engine generator;
  //std::uniform_real_distribution<float> uniform(0,1.0f);
  std::exponential_distribution<float> exponential(0.125f);

  std::cout << std::setprecision(10);
  for (int i=0;i<20;i++)
   {
    std::vector<float> v;
    fill(v,1000000,exponential,generator);
    std::cout
     << std::setw(10) << simple_sum(v) << " "
     << std::setw(10) << sorting_sum(v) << " "
     << std::setw(10) << pair_sum(v) << " "
     << std::setw(10) << sorted_pair_sum(v)
     << std::endl;
   }
  return 0;
 }

Filed under: algorithms, C, C-plus-plus, C99, hacks Tagged: float, IEEE 754, Interview, std::accumulate, std::sort

There are basically three possible cases for the matrix A in the equation system

Ax = b:

- The matrix A is square and non-singular (its determinant is different from zero). In this case, the equation system is solved by

x = A^{-1} b,

because the inverse of A exists.

- The matrix A is “tall” (more rows than columns) and its columns are linearly independent. In this case, we will need the left pseudoinverse. If A is tall (and its columns linearly independent), then the matrix A^T A is square and non-singular. It can be inverted. Indeed:

Ax = b,

A^T A x = A^T b,

(A^T A)^{-1} (A^T A) x = (A^T A)^{-1} A^T b,

I x = (A^T A)^{-1} A^T b,

x = (A^T A)^{-1} A^T b,

where x is the best solution in the least-squares sense, the one that minimizes ||Ax - b||^2.

- The matrix A is “wide” (more columns than rows) and its rows are linearly independent. This will call for the right pseudoinverse. If A is wide (and its rows linearly independent), then the matrix A A^T is square and non-singular. It can be inverted. Let’s see, writing x = A^T z for some z:

Ax = b,

A A^T z = b,

z = (A A^T)^{-1} b,

x = A^T (A A^T)^{-1} b.

The usual notation for the pseudoinverse of a matrix A is A^+. Here, left and right do not refer to the side of the vector on which we find the pseudoinverse, but on which side of the matrix A we find it. As you know, the matrix product is not commutative, that is, in general we have AB ≠ BA. When the matrix is square and non-singular, the normal inverse and the right and left pseudoinverses coincide: we have A^+ = A^{-1}. Otherwise, depending on whether A is tall or wide, we either have A^+ A = I or A A^+ = I.
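A tiny worked example (mine, for illustration) makes the asymmetry between the two sides visible:

```latex
% Tall case: a single column, two rows
A = \begin{pmatrix}1\\1\end{pmatrix},\quad
A^T A = 2,\quad
A^+ = (A^T A)^{-1} A^T = \tfrac12\begin{pmatrix}1 & 1\end{pmatrix},\quad
A^+ A = 1 = I.
% For b=(b_1,b_2)^T, x = A^+ b = (b_1+b_2)/2: the least-squares
% solution of the overdetermined system is simply the average.

% Wide case: a single row, two columns
A = \begin{pmatrix}1 & 1\end{pmatrix},\quad
A A^T = 2,\quad
A^+ = A^T (A A^T)^{-1} = \tfrac12\begin{pmatrix}1\\1\end{pmatrix},\quad
A A^+ = 1 = I,\ \text{but}\
A^+ A = \tfrac12\begin{pmatrix}1 & 1\\ 1 & 1\end{pmatrix}\neq I.
```

In the tall case the pseudoinverse undoes A from the left; in the wide case only from the right, which is exactly the distinction the names capture.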

*

* *

So what prompted me to revisit the pseudoinverse like this? Well, I was looking into old notes about quadraphonic sound, and realized that since the “compression” matrix (the one that maps four channels onto two) is “wide”, the derivation used for the channel mixing experiment (the otter story) doesn’t quite work. In fact, in this case, it doesn’t work at all! I foolishly relied on Mathematica, which found the correct right pseudoinverse, and used that pseudoinverse. The derivation for the *left* pseudoinverse was correct, but I needed the *right* pseudoinverse. Nemo est perfectus.

Filed under: algorithms, Mathematics Tagged: left pseudoinverse, Pseudoinverse, quadraphonic sound, quadraphony, right pseudoinverse