Here are three example MD5 hashes:
$ md5 -s "1" && md5 -s "2" && md5 -s "3"
MD5 ("1") = c4ca4238a0b923820dcc509a6f75849b
MD5 ("2") = c81e728d9d4c2f636f067f89cc14862c
MD5 ("3") = eccbc87e4b5ce2fe28308fd9f2a7baf3
Say I wanted to take 8 characters from any hash. Is the beginning of the hash any more “random” than the middle or the end? Or are all substrings equally “random”?
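To make the question concrete, here is a minimal Python sketch that reproduces the three digests above and cuts 8-character substrings from each. The helper names `md5_hex` and `eight_char_slices` are made up for illustration, and placing the "middle" at characters 12 to 19 of the 32-character digest is an assumption.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Hex MD5 digest of a UTF-8 string, matching the `md5 -s` output above."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def eight_char_slices(hex_digest: str) -> dict:
    """First, middle, and last 8 characters of a hex digest.

    The middle slice is centred in the string (an assumption):
    for a 32-character digest it covers indices 12..19.
    """
    mid = (len(hex_digest) - 8) // 2
    return {
        "first": hex_digest[:8],
        "middle": hex_digest[mid:mid + 8],
        "last": hex_digest[-8:],
    }


if __name__ == "__main__":
    for text in ("1", "2", "3"):
        digest = md5_hex(text)
        print(text, digest, eight_char_slices(digest))
```

Each hex character encodes 4 bits, so an 8-character substring carries 32 of the digest's 128 bits.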
I was curious myself, so I went ahead and wrote a program to test this. You’ll need Crypto++ to compile the code.
Disclaimer:
When it comes to cryptography, or even just mathematics in general, I know just enough to shoot myself in the foot. So, take the following results with a grain of salt and keep in mind that I only have a cursory knowledge of the tools I’m using.
I only sampled three substrings: the first 8 bytes, the middle 8 bytes, and the last 8 bytes. Long story short, they’re equally random.
However, with a smaller sample, the last 8 bytes appear slightly more random. As the sample grows, all three substrings approach complete randomness.
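The experiment can be approximated without Crypto++. The sketch below is not the original program: it swaps Maurer's test for a plain Shannon entropy estimate over byte values (8.0 bits per byte would be ideal). It also assumes that the inputs are the decimal strings "1", "2", ..., as in the examples, and that the three substrings are bytes 0-7, 4-11, and 8-15 of the raw 16-byte digest. All function names are hypothetical.

```python
import hashlib
import math
from collections import Counter


def digest_slices(digest: bytes):
    """First, middle, and last 8 bytes of a 16-byte MD5 digest (assumed layout)."""
    return digest[:8], digest[4:12], digest[8:]


def shannon_entropy(data: bytes) -> float:
    """Empirical Shannon entropy in bits per byte (0.0 to 8.0)."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def slice_entropies(iterations: int) -> dict:
    """Hash "1".."iterations" and measure each substring's byte stream."""
    streams = {"first": bytearray(), "middle": bytearray(), "last": bytearray()}
    for i in range(1, iterations + 1):
        digest = hashlib.md5(str(i).encode("ascii")).digest()
        first, middle, last = digest_slices(digest)
        streams["first"] += first
        streams["middle"] += middle
        streams["last"] += last
    return {name: shannon_entropy(bytes(buf)) for name, buf in streams.items()}


if __name__ == "__main__":
    for iterations in (1000, 5000, 10000, 30000):
        print(iterations, slice_entropies(iterations))
```

An entropy estimate from a finite sample always falls a little short of 8 bits, and the shortfall shrinks as the sample grows. That is one plausible reason for small differences between substrings at low iteration counts, though this sketch cannot confirm it for the original measurements.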
(Results for 1000, 5000, 10000, and 30000 iterations; the output is not included here.)
“Randomness” is measured by Crypto++’s MaurerRandomnessTest class. For reference, the executable compiled from my test program has a randomness value of 0.632411, and a copy of Shakespeare’s Macbeth downloaded from Project Gutenberg has a randomness value of 0.566991.
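For readers unfamiliar with the measure, here is a rough Python sketch of Maurer's universal statistic over 8-bit blocks. It is not Crypto++'s code and should not be expected to reproduce the exact values quoted above. The 2000-byte initialisation segment and the scaling by 7.1836656 (the tabulated expected value of the statistic for 8-bit blocks on ideal random data) are assumptions chosen so that random input scores close to 1.0.

```python
import hashlib
import math

# Tabulated expected value of Maurer's statistic for 8-bit blocks
# on ideal random data; used here only to scale the score.
EXPECTED_L8 = 7.1836656


def maurer_score(data: bytes, init_blocks: int = 2000) -> float:
    """Maurer's universal statistic over bytes, scaled and capped at 1.0."""
    if len(data) <= init_blocks:
        raise ValueError("need more data than the initialisation segment")
    last_seen = [0] * 256  # 1-based position of each byte value's latest occurrence
    total = 0.0
    for position, byte in enumerate(data, start=1):
        if position > init_blocks:
            # Distance back to the previous occurrence of this byte value.
            total += math.log2(position - last_seen[byte])
        last_seen[byte] = position
    statistic = total / (len(data) - init_blocks)
    return min(statistic / EXPECTED_L8, 1.0)


if __name__ == "__main__":
    # Concatenated MD5 digests of "0".."4999" stand in for random-looking data.
    digests = b"".join(hashlib.md5(str(i).encode()).digest() for i in range(5000))
    print("md5 digests:", maurer_score(digests))
    print("constant bytes:", maurer_score(b"a" * 10000))
```

The statistic averages the log of the gap between repeats of each byte value, so repetitive or structured input (text, executables) pulls the score down, while data that looks random scores near 1.0.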