I have an MD5 function that I have confirmed works correctly for both files and strings. But when I use it on variable-sized chunks of very large files, it produces identical MD5 values even though the chunks have different sizes.
I wonder whether two chunks of different lengths, perhaps with similar content, could produce the same MD5 fingerprint.
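The odds of this happening for any two particular inputs are 1/(2^128), since MD5 produces a 128-bit hash (this assumes the inputs were not deliberately crafted to collide, which is possible with MD5). That is about 1 in 3.4 x 10^38, so it's extremely unlikely, though not impossible.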
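The arithmetic can be checked quickly. This is a minimal sketch that treats MD5 as an ideal random 128-bit function, which is a reasonable model for accidental collisions between unrelated chunks but not for maliciously constructed ones:

```python
# Number of distinct 128-bit MD5 digests.
outputs = 2 ** 128

# Exact integer value: 340282366920938463463374607431768211456
print(outputs)

# Rounded to two significant figures: 3.4e+38
print(f"{outputs:.1e}")

# Chance that two specific, different inputs share a digest
# under the random-function model (about 2.9e-39).
print(1 / outputs)
```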
It's much more likely, I think, that something in your code is wrong and you are actually calculating the MD5 of the same text/file every time.
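Typical ways to end up hashing the same data repeatedly are never advancing the read offset, so every chunk starts at the beginning of the file, or hashing the whole file instead of the chunk. Below is a minimal sketch of per-chunk hashing with Python's `hashlib`; the helper names `md5_of_chunk` and `md5_of_chunks` are hypothetical, and the file path and chunk sizes in the demo are only illustrative. Each chunk gets a fresh `hashlib.md5()` object and its own offset, so chunks of different sizes should yield different digests:

```python
import hashlib
import os
import tempfile


def md5_of_chunk(path, offset, length):
    """Hex MD5 of `length` bytes starting at byte `offset` of `path`.

    A new hashlib.md5() object is created on every call, so no state
    leaks from one chunk into the next.
    """
    h = hashlib.md5()
    with open(path, "rb") as f:
        f.seek(offset)  # jump to the start of this chunk
        remaining = length
        while remaining > 0:
            # Read in bounded blocks so huge chunks don't load into memory at once.
            block = f.read(min(65536, remaining))
            if not block:  # end of file reached before `length` bytes
                break
            h.update(block)
            remaining -= len(block)
    return h.hexdigest()


def md5_of_chunks(path, sizes):
    """Hash consecutive, variable-sized chunks; returns (offset, size, digest)."""
    results = []
    offset = 0
    for size in sizes:
        results.append((offset, size, md5_of_chunk(path, offset, size)))
        offset += size  # advance past this chunk before hashing the next
    return results


if __name__ == "__main__":
    # Demo on a temporary file of random bytes.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(10_000))
    for off, size, digest in md5_of_chunks(tmp.name, [1000, 2500, 4000]):
        print(off, size, digest)
    os.remove(tmp.name)
```

Comparing each digest against `hashlib.md5(data[start:end]).hexdigest()` for a small in-memory test file is an easy way to confirm your own chunking code hashes the bytes you think it does.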