I am trying to generate non-sequential, human-readable order codes derived from (let's say) an unsigned 32-bit internal ID that starts at 1 and is auto-incremented for each new order.
In my example code below, will every $hash be unique? (I plan to base34-encode the $hash to make it human-readable.)
<?php
function int_hash($key) {
    $key = ($key^0x47cb8a8c) ^ ($key<<12);
    $key = ($key^0x61a988bc) ^ ($key>>19);
    $key = ($key^0x78d2a3c8) ^ ($key<<5);
    $key = ($key^0x5972b1be) ^ ($key<<9);
    $key = ($key^0x2ea72dfe) ^ ($key<<3);
    $key = ($key^0x5ff1057d) ^ ($key>>16);
    return $key;
}
for($order_id = 1; $order_id <= PHP_INT_MAX; ++$order_id) {
    $hash = int_hash($order_id);
}
?>
If not, are there any suggestions on how to replace int_hash?
The result of, say, base34-encoding md5($order_id) is too long for my liking.
Almost. (Which, I guess, means “no, but in a way that’s easily fixed”.) Your function consists of a sequence of independent steps; the overall function is bijective (reversible) if and only if every single one of those steps is. (Do you see why?)
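To make that “no” concrete, here is a quick check, assuming 64-bit PHP (PHP_INT_SIZE === 8). The mask is something worked out for this sketch, not part of the question. COLLISION_MASK has bits 0 to 11, 24 to 35 and 48 to 59 set, so COLLISION_MASK ^ (COLLISION_MASK << 12) is all ones. XOR-ing an input with the mask therefore turns the result of the first step into its bitwise complement. Because >> sign-extends, (~$a) >> 19 equals ~($a >> 19), so the second step gives the same result for $a and ~$a. Every $order_id should therefore collide with $order_id ^ COLLISION_MASK.

```php
<?php
// int_hash() copied unchanged from the question (64-bit PHP assumed).
function int_hash($key) {
    $key = ($key ^ 0x47cb8a8c) ^ ($key << 12);
    $key = ($key ^ 0x61a988bc) ^ ($key >> 19);
    $key = ($key ^ 0x78d2a3c8) ^ ($key << 5);
    $key = ($key ^ 0x5972b1be) ^ ($key << 9);
    $key = ($key ^ 0x2ea72dfe) ^ ($key << 3);
    $key = ($key ^ 0x5ff1057d) ^ ($key >> 16);
    return $key;
}

// Illustrative mask (not from the question): bits 0-11, 24-35 and 48-59 set,
// so COLLISION_MASK ^ (COLLISION_MASK << 12) has all 64 bits set.
const COLLISION_MASK = 0x0FFF000FFF000FFF;

foreach ([1, 2, 42, 1000000] as $order_id) {
    // XOR-ing the input with the mask complements the first step's result,
    // and the sign-extending >> 19 in the second step cannot tell the difference.
    $partner = $order_id ^ COLLISION_MASK;
    printf("int_hash(%d) = %d, int_hash(%d) = %d\n",
        $order_id, int_hash($order_id), $partner, int_hash($partner));
}
```

The partner of each order ID is a huge number, far outside a 32-bit range. This particular pair only shows that the function is not injective over all PHP integers, which is the range the question's loop covers.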
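Now, each step has one of the following forms:

$key = ($key ^ CONSTANT) ^ ($key >> NUM_BITS);
$key = ($key ^ CONSTANT) ^ ($key << NUM_BITS);

with NUM_BITS != 0. We can actually treat these as variants of a single form, by viewing the former as almost equivalent to this (conceptually, that is: PHP itself throws an ArithmeticError for a negative shift count):

$key = ($key ^ CONSTANT) ^ ($key << -NUM_BITS);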
So all we need is to show that this:

$key = ($key ^ CONSTANT) ^ ($key << NUM_BITS);

is bijective. Now, XOR is commutative and associative, so the above is equivalent to this:

$key = ($key ^ ($key << NUM_BITS)) ^ CONSTANT;
and (x ^ y) ^ y == x ^ (y ^ y) == x ^ 0 == x, so clearly XOR-ing with a constant value is reversible (by re-XOR-ing with the same value); so all we have to show is that this is bijective:

$key = $key ^ ($key << NUM_BITS);

whenever NUM_BITS != 0.

Now, I’m not writing a rigorous proof, so I’ll just give a single reasoned-out example of how to reverse this. Suppose that we are given the value of $key ^ ($key << 9). How do we obtain $key? Well, we know that the last nine bits of $key << 9 are all zeroes, so we know that the last nine bits of $key ^ ($key << 9) are the same as the last nine bits of $key. So we now know the last nine bits of $key, and therefore the next nine bits of $key << 9 (bits 9 through 17, counting from 0 on the right). XOR-ing those with the matching bits of $key ^ ($key << 9) gives us bits 9 through 17 of $key. That tells us bits 18 through 26 of $key << 9, which gives us bits 18 through 26 of $key, and so on. Each round recovers nine more bits of $key, from right to left, until we have all of them.
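Here is a minimal PHP sketch of that procedure, assuming 64-bit integers. The names undo_xor_left_shift, left_step and undo_left_step are made up for illustration. The first function repeats the “nine more bits per round” argument for any shift amount. The second function removes CONSTANT first, using the XOR identity above.

```php
<?php
// Sketch, assuming 64-bit PHP integers. Function names are hypothetical.

// Undo $y = $x ^ ($x << $bits) for 1 <= $bits <= 63.
function undo_xor_left_shift(int $y, int $bits): int
{
    // The lowest $bits bits of $x << $bits are zero, so the lowest
    // $bits bits of $x are simply the lowest $bits bits of $y.
    $x = $y;
    // Each pass fixes the next $bits bits (right to left), as in the
    // walkthrough; stop once at least 64 bits are known.
    for ($known = $bits; $known < 64; $known += $bits) {
        $x = $y ^ ($x << $bits);
    }
    return $x;
}

// One left-shift step of the form used in the question.
function left_step(int $x, int $constant, int $bits): int
{
    return ($x ^ $constant) ^ ($x << $bits);
}

// Inverse of left_step(): XOR the constant back in, which leaves
// exactly $x ^ ($x << $bits), then undo the shift-XOR.
function undo_left_step(int $y, int $constant, int $bits): int
{
    return undo_xor_left_shift($y ^ $constant, $bits);
}

foreach ([1, 123456789, -987654321, PHP_INT_MAX, PHP_INT_MIN] as $x) {
    $y = left_step($x, 0x78d2a3c8, 5);
    printf("%d -> %d -> %d\n", $x, $y, undo_left_step($y, 0x78d2a3c8, 5));
}
```

Running the loop extra times would be harmless. Once $x is correct, $y ^ ($x << $bits) just gives $x back.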
So . . . why do I say “almost” rather than “yes”? Why is your hash function not perfectly bijective? It’s because in PHP, the bitwise shift operators >> and << are not quite symmetric, and while $key = $key ^ ($key << NUM_BITS) is completely reversible, $key = $key ^ ($key >> NUM_BITS) is not. (Above, when I wrote that the two types of steps were “almost equivalent”, I really meant that “almost”. It makes a difference!) You see, whereas << treats the sign bit just like any other bit, and shifts it out of existence (bringing in a zero-bit on the right), >> treats the sign bit specially, and “extends” it: the bit that it brings in on the left is equal to the sign bit. (N.B. Your question mentions “unsigned 32-bit” values, but PHP doesn’t actually support that; its bitwise operations are always on signed integers.)

Due to this sign extension, if $key starts with a 0, then $key >> NUM_BITS starts with a 0, and if $key starts with a 1, then $key >> NUM_BITS also starts with a 1. In either case, $key ^ ($key >> NUM_BITS) will start with a 0. You’ve lost exactly one bit of entropy. If you give me $key ^ ($key >> 9), and don’t tell me whether $key is negative, then the best I can do is compute two possible values for $key: one negative, one positive-or-zero.

You perform two steps that use right-shift instead of left-shift, so you lose two bits of entropy. (I’m hand-waving slightly: all I’ve actually demonstrated is that you lose at least one bit and at most two bits, but I’m confident that, due to the nature of the steps between those right-shift steps, you do actually lose two full bits.) For any output value that the function actually produces, there are exactly four distinct input values that could yield it. So it’s not unique, but it’s almost unique; and it’s easily fixed, by either: