I’ve been tasked with implementing an XOR hash for a variable-length binary string in Perl; the length can range from 18 up to well over 100. As I understand it, I XOR the binary string with a key. I’ve read two different descriptions of how to apply this online:
- One option: if the key is shorter than the string, divide the string into blocks that are the length of the key; these blocks are then all folded together (so the resulting hash is the length of the key).
- The other: just XOR the key across each key-length block of the string (so the resulting hash is the length of the string).
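A minimal Perl sketch of the two readings, assuming the first one means "start from the key and XOR every key-length block into it" and that a short final block is zero-padded; neither detail is stated above, and the sub names `xor_fold` and `xor_stream` are hypothetical. It relies on Perl's string-wise `^`, which XORs two strings byte by byte. (Under `use v5.28` or later, or `use feature 'bitwise'`, `^` becomes numeric and the string form is spelled `^.`.)

```perl
use strict;
use warnings;

# Option 1 (fold), under our ASSUMED reading: start from the key and XOR
# every key-length block of the data into it. Result length == key length.
sub xor_fold {
    my ($data, $key) = @_;
    my $len = length($key);
    die "key must not be empty\n" unless $len;
    my $hash = $key;
    for (my $i = 0; $i < length($data); $i += $len) {
        my $block = substr($data, $i, $len);
        $block .= "\0" x ($len - length($block));   # zero-pad a short final block
        $hash ^= $block;                             # string-wise XOR, byte by byte
    }
    return $hash;
}

# Option 2 (stream): repeat the key across the whole string and XOR.
# Result length == data length.
sub xor_stream {
    my ($data, $key) = @_;
    die "key must not be empty\n" unless length($key);
    my $reps = int(length($data) / length($key)) + 1;
    my $pad  = substr($key x $reps, 0, length($data));   # key repeated, trimmed
    return $data ^ $pad;
}

my $key = "\x13\x37\xbe\xef";   # hypothetical 4-byte key
printf "fold:   %s\n", unpack("H*", xor_fold("some binary string", $key));
printf "stream: %s\n", unpack("H*", xor_stream("some binary string", $key));
```

The fold gives a fixed-size digest, which is what an index usually wants. The stream version is reversible (applying it twice with the same key returns the input), so it behaves more like an encoding than a hash.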
Is one of these more correct than the other? This is for hashing values in an index, so I’m inclined to think the first option (which could produce shorter hashes) would be better.
Finally, is there a good way to generate a sufficiently random key? And is there a good key length to choose based on the length of the strings being hashed?
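For the random-key part, a minimal sketch that reads key bytes from the operating system, assuming a Unix-like system that provides `/dev/urandom`; `random_key` is a hypothetical name. Perl's built-in `rand` is documented as not cryptographically secure, so it is a poor source of key material.

```perl
use strict;
use warnings;

# Return $n random bytes read from the OS entropy device.
# ASSUMPTION: a Unix-like system that provides /dev/urandom.
sub random_key {
    my ($n) = @_;
    open(my $fh, '<:raw', '/dev/urandom')
        or die "cannot open /dev/urandom: $!\n";
    my $got = read($fh, my $key, $n);   # binary read of exactly $n bytes
    close($fh);
    die "short read from /dev/urandom\n" unless defined($got) && $got == $n;
    return $key;
}

printf "%s\n", unpack("H*", random_key(16));   # 32 hex digits, different every run
```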
EDIT: By the way, I am well aware of how poorly this hash performs. It’s strictly for comparison purposes. 🙂
One other alternative, from here (search for XOR hashing).
Assuming the hash is supposed to be x bytes long, break the message into blocks of x bytes and XOR them together. This is effectively the same as using method 1 with a key of x zero bytes (or, alternatively, as starting with the first x bytes of the string as the key and then skipping those bytes of the string. There are all manner of fun ways to think about it).
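A sketch of that keyless fold in Perl. Zero-padding a short final block is an assumption (the description doesn't say how to handle it), and `xor_hash`, `xor_hash_seeded` and the helper `_fold_from` are hypothetical names. The second sub implements the "seed with the first x bytes" view so the two can be compared.

```perl
use strict;
use warnings;

# XOR $x-byte blocks of $msg, starting at offset $start, into $hash.
sub _fold_from {
    my ($hash, $msg, $x, $start) = @_;
    for (my $i = $start; $i < length($msg); $i += $x) {
        my $block = substr($msg, $i, $x);
        $block .= "\0" x ($x - length($block));   # ASSUMPTION: zero-pad last block
        $hash ^= $block;                           # string-wise XOR
    }
    return $hash;
}

# Plain XOR hash: start from x zero bytes and fold in every block.
sub xor_hash {
    my ($msg, $x) = @_;
    die "block size must be positive\n" unless $x > 0;
    return _fold_from("\0" x $x, $msg, $x, 0);
}

# Same hash, viewed as "the first x bytes are the key, fold in the rest".
sub xor_hash_seeded {
    my ($msg, $x) = @_;
    die "block size must be positive\n" unless $x > 0;
    my $seed = substr($msg, 0, $x);
    $seed .= "\0" x ($x - length($seed));          # pad if $msg is shorter than $x
    return _fold_from($seed, $msg, $x, $x);
}

printf "%s\n", unpack("H*", xor_hash("hello world", 4));
```

Because XOR is commutative, reordering whole blocks leaves the result unchanged: `xor_hash("ABCD", 2)` and `xor_hash("CDAB", 2)` are equal, which is one concrete reason this hash is weak.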
(Also note what is said there about XOR hashing, namely that it is bad. Very bad.) (Roughly speaking. It’s better than some alternatives, but it is not sufficient for much of what hashing is used for.)
EDIT: One other small thing: if method 1 uses the same key for every binary string that is hashed, then it doesn’t really matter what the key is. XORing against a constant is reversible, much like, say, ROT13, so it adds no real strength.
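A small Perl check of that point, assuming method 1 means "start from the key and XOR every key-length block into it" (our reading; `keyed_fold` is a hypothetical name and the key bytes are arbitrary). Under that reading the keyed hash is always the zero-key hash XORed with the key, so two strings collide with the key exactly when they collide without it.

```perl
use strict;
use warnings;

# Method 1 under the ASSUMED reading: hash starts as the key, then every
# key-length block (zero-padded at the end) is XORed into it.
sub keyed_fold {
    my ($msg, $key) = @_;
    my $x = length($key);
    die "key must not be empty\n" unless $x;
    my $hash = $key;
    for (my $i = 0; $i < length($msg); $i += $x) {
        my $block = substr($msg, $i, $x);
        $block .= "\0" x ($x - length($block));
        $hash ^= $block;
    }
    return $hash;
}

my $zero = "\0" x 4;
my $key  = "\x5a\xc3\x17\xe8";   # hypothetical fixed key

# The keyed hash is just the unkeyed hash XORed with the constant key...
for my $m ("hello world", "ABCDEFGH", "") {
    die "identity failed\n" unless keyed_fold($m, $key) eq (keyed_fold($m, $zero) ^ $key);
}

# ...so the key cannot separate strings that already collide.
my ($m1, $m2) = ("ABCDEFGH", "EFGHABCD");   # same 4-byte blocks, swapped order
printf "collide without key: %s\n",
    keyed_fold($m1, $zero) eq keyed_fold($m2, $zero) ? "yes" : "no";   # yes
printf "collide with key:    %s\n",
    keyed_fold($m1, $key) eq keyed_fold($m2, $key) ? "yes" : "no";     # yes
```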
<sarcasm>Alternatively, if you use SHA1 to derive a key per string… that might make the XOR hash much better.</sarcasm>