A positional number system needs a base that is either greater than one, or smaller than minus one—yes, we can have a negative base for a number system. The system, however, seems to break down if the base we chose is base 1.
If the base is 1, then there are no permissible digits since the digits , in a base
system, must be
. But we can still represent numbers using just 1s. That's the unary numeral system, and numbers are just represented as repeated 1s. 15? Fifteen ones: 111111111111111. Operations? Not very complicated, just… laborious.
Indeed:
- Addition of two unary numbers is straightforward: you just concatenate the two series of ones.
- Subtraction of two unary numbers. Suppose
and
are two numbers. The operation
(supposing
) is just that for each 1 in
, you remove a 1 in
.
- Multiplication of two unary numbers. Again, let
and
be two numbers. We proceed by iterated addition. Let
be zero (no ones, the empty number?). While
, concatenate
to
, remove one 1 from
.
- Division of two unary numbers. Let
and
be zero. While
, subtract
from
, add 1 to
.
will be the quotient, and
the remainder.
None of these operations seems particularly complicated, but they are all very tedious. Adding or subtracting 1 from an unary number is merely adding or removing one 1 at the end, but adding two arbitrary numbers means copying a whole lot of ones. Subtracting is also not as straightforward as we might hope. On paper, we could align both numbers and easily see if one is larger than the other and by how much. In the computer, however, we must scan both series of ones (assuming we can’t know lengths otherwise) to figure out exactly how much is left.
** *
So what’s the use of unary encoding? Well, under certain circumstances, you can use it to encode integers using a variable length code. Well, almost. First, we must notice that in reality, “unary” encoding still uses two symbols. True, there’re the ones, but there’s also the space, the delimiter, that signals the end of the string. On paper, this extra “character” is blank paper, but it does contain information. In the computers’ memory, we’ll use 0 as a delimiter. The number 7 in “unary” becomes 11111110.
That’s actually an efficient code when numbers are distributed exponentially, that is, .
Hi. Great and simple post.
Can you please explain the last line:
That’s actually an efficient code when numbers are distributed exponentially, that is, P(x=n)\propto 2^{-n}.
If
, (with
) then you have 50-50% chance that n=1, and the code is 0, if not, you have one of two chances that n=2, and its code is 10, … n=3 has code 110, n=4 has code 1110, etc.
And since Shannon’s theorem states that a code is optimal whenever
is minimized, a code that uses n bits to code number n is optimal if
.