## Float16

February 12, 2013

The possible strategies for data compression fall into two main categories: lossless and lossy compression. Lossless compression means that you retrieve exactly what went in after compression, while lossy means that some information was destroyed to get better compression, meaning that you do not retrieve the original data, but only a reasonable reconstruction (for various definitions of “reasonable”).

Destroying information is usually performed using transforms and quantization. Transforms map the original data onto a space were the unimportant variations are easily identified, and on which quantization can be applied without affecting the original signal too much. For quantization, the first approach is to simply reduce precision, somehow “rounding” the values onto a smaller set of admissible values. For decimal numbers, this operation is rounding (or truncating) to the $n$th digit (with $n$ smaller than the original precision). A much better approach is to minimize an explicit error function, choosing the smaller set of values in a way that minimizes the expected error (or maximum error, depending on how you formulate your problem).