How can I detect small differences between two strings using the MD5 algorithm? I want to find the percentage of similarity between a few large strings. How can I measure the difference, given that a single-character change produces a completely different hash:
MD5("The quick brown fox jumps over the lazy dog.")
= e4d909c290d0fb1ca068ffaddf22cbd0
MD5("The quick brown fox jumps over the lazy dog")
= 9e107d9d372bb6826bd81d3542a419d6
Can you give me a solution to this, or suggest another hash algorithm that works well on large strings or large documents?
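A hash like MD5 is designed so that any change to the input produces an unrelated digest, so comparing two digests only tells you whether the inputs are identical, not how similar they are. If the strings are really long (like entire, possibly large, files) you can break them up into pieces, hash the pieces, and check how many match. That’s not entirely dependable though.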
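A rough sketch of that idea in Python, in case it helps. The function names, the 64-byte chunk size, and the choice to compare chunks position by position are all my own assumptions, not a standard method:

```python
import hashlib

def chunk_hashes(data: bytes, chunk_size: int = 64) -> list:
    """Split data into fixed-size pieces and return the MD5 hex digest of each."""
    return [
        hashlib.md5(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]

def chunk_similarity(a: bytes, b: bytes, chunk_size: int = 64) -> float:
    """Percentage of chunk positions whose hashes match (assumed metric)."""
    ha = chunk_hashes(a, chunk_size)
    hb = chunk_hashes(b, chunk_size)
    total = max(len(ha), len(hb))
    if total == 0:
        return 100.0  # two empty inputs are treated as identical
    # Compare chunk i of a with chunk i of b; extra chunks count as mismatches.
    matches = sum(x == y for x, y in zip(ha, hb))
    return 100.0 * matches / total

if __name__ == "__main__":
    a = bytes(range(256)) * 4          # 1024 bytes, 16 chunks of 64
    b = a[:-1] + b"\x00"               # only the last byte differs
    print(chunk_similarity(a, b))      # 15 of 16 chunks match
```

Smaller chunks give a finer-grained percentage but mean more hashes to compute and store.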
If it says most of two strings are identical, that’ll probably be accurate. The reverse doesn’t hold: unless you do quite a bit more to keep the pieces synchronized, it can report large differences when the two strings are nearly identical. Just for example, if you do it naively, inserting a single byte at the beginning of one string shifts every piece boundary, so the comparison could indicate that the strings are entirely different, even though there’s really only one byte that’s different.
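To see that failure concretely, here is a small self-contained demonstration. Again, `positional_similarity` and the 8-byte chunk size are just illustrative assumptions; it is the naive version, with nothing done to resynchronize the pieces:

```python
import hashlib

def positional_similarity(a: bytes, b: bytes, chunk_size: int = 8) -> float:
    """Naive comparison: hash fixed-size chunks and compare them by position."""
    def hashes(data: bytes) -> list:
        return [
            hashlib.md5(data[i:i + chunk_size]).digest()
            for i in range(0, len(data), chunk_size)
        ]
    ha, hb = hashes(a), hashes(b)
    total = max(len(ha), len(hb))
    if total == 0:
        return 100.0
    return 100.0 * sum(x == y for x, y in zip(ha, hb)) / total

original = b"The quick brown fox jumps over the lazy dog. " * 20

# Changing the final byte only disturbs the last chunk.
changed_end = original[:-1] + b"!"

# Inserting one byte at the front shifts every chunk boundary.
shifted = b"X" + original

if __name__ == "__main__":
    print(positional_similarity(original, changed_end))  # close to 100
    print(positional_similarity(original, shifted))      # 0.0
```

Both modified strings differ from the original by a single byte, but only the one changed at the end is reported as nearly identical.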