## Mild Obfuscation

Sometimes, you have a small bit of data, may something like a GUID (for which there are many possible solutions), that you may have to store in a plain-text file, nothing crucial, not sensitive, but that you don’t really want your users to poke with, even if they really mean to. In such cases, you could use encryption, but it may be that mild obfuscation is quite sufficient and dissuasive. So, if you don’t really want strong encryption, what can you do to provide a machine-efficient encryptionnette?

OK, let me repeat this one more time: the premise is that we do not need top-notch encryption, just a way to keep the user from messing with a value stored in a plain-text file. Of course, if you’re doing open-source, he will be able to do so more easily, but the point is that you want to make sure that it’s kind of not really worth the trouble to mess with that value.

So let’s say that your GUID is built from the machine’s first MAC address and a strong check-sum, something like a CRC, yielding 64 bits strings: 16 for the check-sum, and 48 for the MAC Address.

So what are the easily invertible operations one can apply to a 64-bits number to yield an (apparently) uncorrelated number?

We can try to multiply by a constant $a$ such that $a(ax\:\mathrm{mod}\:m)\:\mathrm{mod}\:m=x$.

In other words, find $a$ such it is its own inverse modulo $m$. Typically, $m=2^n$ for machine-efficiency reasons, but $m$ can be any value larger than 1. For some $m$, only the only solutions are $\equiv\pm{}1$, for others, you’ll have a couple more solutions. If $m=2^n$, then $a=2^{n-1}\pm{}1$ is a solution. For $m=2^{64}$, 0x7fffffffffffffff and 0x8000000000000001 are solutions. Interestingly enough, these multipliers seem to randomize the input quite nicely. For example, using $m=2^{32}$ and $a=2^{31}-1$, we get: $x$ $ax\:\mathrm{mod}\:m$ 3d1b58ba c2e4a746 507ed7ab 2f812855 2eb141f2 d14ebe0e 41b71efb 3e48e105 79e2a9e3 061d561d 7545e146 8aba1eba 515f007c aea0ff84 5bd062c2 a42f9d3e 12200854 eddff7ac 4db127f8 b24ed808

…which looks randomizing enough. The inverse is to multiply by $a$ again, which makes it a symmetric operation. The integer multiplication is not that expensive on a modern processor, but it can be quite involved for a micro-controller class CPU (even if it’s capable of 32-bits arithmetic).

*
* *

We can xor our data with a fixed random mask. Pick any 64-bits number using a pseudo-random number generator), not necessarily a strong one. Since the mask is fixed, there’s no point of having a very strong random number; it just has to be random enough so that when you xor it with your data, it looks more random than it used to. Of course, this is clearly not impervious to a known plain-text attack, it suffice to send a zeroed vector to retrieve the key.

*
* *

We can use a bit blender. A bit blender, at least how I defined it in a previous post, is an operation that move bits around in a lossless fashion—that is, the operation is reversible. That is, it applies an arbitrary permutation of the input bits: it can reverse the vector, apply a rotation, move blocks around, etc. For the application we are considering here, a blender that disperses the bits randomly (but reversibly) is preferable. One simple operation that does that rather well is the perfect shuffle—an operation known to card-game players as the Faro Shuffle.

The following diagram shows the shallowest network that produces the perfect shuffle for 8 elements: A simple Python implementation of the perfect shuffle would look like:

def perfect_shuffle(a):
"""perfectly shuffles the supplied list,
throws AssertionError if list has odd length."""
l=len(a)

if l % 2:
raise AssertionError
else:
h=l/2; # half-length

# variant proposed by Kirk McDonald
# (from the #python channel on Freenode)
return [item for pair in zip(a[:h],a[h:]) for item in pair]

def perfect_unshuffle(a):
"""perfectly unshuffles the supplied list,
if previously perfectly shuffled."""
return a[::2]+a[1::2]


It will take a list [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] and yield [0, 5, 1, 6, 2, 7, 3, 8, 4, 9]. Note that the shuffled list contains all of the original list’s items. The transformation is also reversible, and we can recover [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] from [0, 5, 1, 6, 2, 7, 3, 8, 4, 9].

A C/C++ implementation is a bit more involved. Well, a whole lot.

uint32_t perfect_shuffle(uint32_t x)
{
x=(x & UINT32_C(0xff0000ff)) | ((x & UINT32_C(0x00ff0000)) >> 8) | ((x & UINT32_C(0x0000ff00)) << 8);
x=(x & UINT32_C(0xf00ff00f)) | ((x & UINT32_C(0x0f000f00)) >> 4) | ((x & UINT32_C(0x00f000f0)) << 4);
x=(x & UINT32_C(0xc3c3c3c3)) | ((x & UINT32_C(0x30303030)) >> 2) | ((x & UINT32_C(0x0c0c0c0c)) << 2);
x=(x & UINT32_C(0x99999999)) | ((x & UINT32_C(0x44444444)) >> 1) | ((x & UINT32_C(0x22222222)) << 1);

return x;
}

uint32_t perfect_unshuffle(uint32_t x)
{
x=(x & UINT32_C(0x99999999)) | ((x & UINT32_C(0x44444444)) >> 1) | ((x & UINT32_C(0x22222222)) << 1);
x=(x & UINT32_C(0xc3c3c3c3)) | ((x & UINT32_C(0x30303030)) >> 2) | ((x & UINT32_C(0x0c0c0c0c)) << 2);
x=(x & UINT32_C(0xf00ff00f)) | ((x & UINT32_C(0x0f000f00)) >> 4) | ((x & UINT32_C(0x00f000f0)) << 4);
x=(x & UINT32_C(0xff0000ff)) | ((x & UINT32_C(0x00ff0000)) >> 8) | ((x & UINT32_C(0x0000ff00)) << 8 );

return x;
}

uint64_t perfect_shuffle(uint64_t x)
{
uint16_t *y=(uint16_t*)&x;
std::swap(y,y);

return ((uint64_t)perfect_shuffle((uint32_t)(x >> 32)) << 32)
| perfect_shuffle((uint32_t)x);
}

uint64_t perfect_unshuffle(uint64_t x)
{
x=((uint64_t)perfect_unshuffle((uint32_t)(x >> 32)) << 32)
| perfect_unshuffle((uint32_t)x);

uint16_t *y=(uint16_t*)&x;
std::swap(y,y);

return x;
}


How this works is explained here in great detail.

The point is that the perfect shuffle, like when you’re shuffling cards, disperses bits around quite a bit. But reversibly so.

*
* *

So what if we were to combine all these operations? that would be the best order? Since they are each reversible, any order could be applied: xor-mul-shuffle, or mul-shuffle-xor, or mul-xor-shuffle, or…

The mul-xor-shuffle test code would look something like:

  // reversibility test
std::cout << std::hex;
const uint64_t mult=UINT64_C(0x7fffffffffffffff);
uint64_t mask=((uint64_t)std::rand() << 32) | std::rand();
for (int i=0; i<10000000; i++)
{
uint64_t u=((uint64_t)std::rand() << 32) | std::rand(); // sure!
if (unobfuscated_u != u)
{
std::cout
<< "oh noes!!!"
<< u << " "
<< unobfuscated_u
<< std::endl;
return 1;
}

}


With a mask of 1888600f3e744fe9 (generated randomly), we get: $x$ obfuscated $x$ 3579ad956a6116a e9350c5144231dd7 59400ebf61afd9f7 4ea80e3a964374aa 606c6ffe5916d626 c16a43dbbe1405ad 4094d7a77661edc0 cd3ffc4e31d1666b 6440029e149e048d 511b0b3bc7b269ec 72fc9db1258a1eb7 5632808b445c6402 7d1618fa4b6b5ec9 42cc7c02d47e51dc 171a50cd17b3e4ca fb142de2b1ba5bf7 00a8dfc459c13348 e96ab2ee60051b21 27686abb39843dd0 f54002ffea7371cb

OK, listing a list of numbers that kind of looks like random is a proof by volume, but you can experiment by yourselves.

*
* *

I am sure you have thought, reading this entry, of many more ways of mildly obfuscating a n-bit word, and that’s fine. The main point is the operation must be randomizing but reversible. If it’s not reversible, it’s merely a hash, and that’s not what we wanted.

Can you think of other reversible randomizing operations?

### 4 Responses to Mild Obfuscation

1. (more) Mild Obfuscation « Harder, Better, Faster, Stronger says:

[…] Mild Obfuscation, I […]

2. mtrench says:

Hello Steven,

your first table (x -> a*x mod m) seems to be using m=2^32, contrary to your statement that m=2^32-1.

Great post!

• Steven Pigeon says:

You’re absolutely right! (I was thinking & 0xffffffff). I will correct this.

3. Three (somewhat) Better Functions (Hash functions, part V) | Harder, Better, Faster, Stronger says:

[…] discussed moving bits around before. The first function (that I called “blender”) moves bits around following this […]