MessageDigest m=MessageDigest.getInstance("MD5");
StringBuffer sb = new StringBuffer();
if(nodeName!=null) sb.append(nodeName);
if(nodeParentName!=null) sb.append(nodeParentName);
if(nodeParentFieldName!=null) sb.append(nodeParentFieldName);
if(nodeRelationName!=null) sb.append(nodeRelationName);
if(nodeViewName!=null) sb.append(nodeViewName);
if(treeName!=null) sb.append(treeName);
if(nodeValue!=null && nodeValue.trim().length()>0) sb.append(nodeValue);
if(considerParentHash) sb.append(parentHash);
m.update(sb.toString().getBytes("UTF-8"),0,sb.toString().length());
BigInteger i = new BigInteger(1,m.digest());
hash = String.format("%1$032X", i);
The idea behind these lines of code is that we append all the values of a class/model into a StringBuilder and then return the padded hash of that (the Java implementation returns md5 hashes that are lenght 30 or 31, so the last line formats the hash to be padded with 0s).
I can verify that this works, but I have a feeling it fails at one point (our application fails and I believe this to be the probable cause).
Can anyone see a reason why this wouldn’t work? Are there any workarounds to make this code less prone to errors (e.g. removing the need for the strings to be UTF-8).
There are a few weird things in your code.
UTF-8 encoding of a character may use more than one byte. So you should not use the string length as final parameter to the
update()call, but the length of the array of bytes thatgetBytes()actually returned. As suggested by Paŭlo, use theupdate()method which takes a singlebyte[]as parameter.The output of MD5 is a sequence of 16 bytes with quite arbitrary values. If you interpret it as an integer (that’s what you do with your call to
BigInteger()), then you will get a numerical value which will be smaller than 2160, possibly much smaller. When converted back to hexadecimal digits, you may get 32, 31, 30… or less than 30 characters. Your usage of the the"%032X"format string left-pads with enough zeros, so your code works, but it is kind of indirect (the output of MD5 has never been an integer to begin with).You assemble the hash input elements with raw concatenation. This may induce issues. For instance, if
modeNameis “foo” andmodeParentNameis “barqux“, then the MD5 input will begin with (the UTF-8 encoding of) “foobarqux“. IfmodeNameis “foobar” andmodeParentNameis “qux“, then the MD5 input will also begin with “foobarqux“. You do not tell why you want to use a hash function, but usually, when one uses a hash function, it is to have a unique trace of some piece of data; two distinct data elements should yield distinct hash inputs.When handling
nodeValue, you calltrim(), which means that this string could begin and/or end with whitespace, and you do not want to include that whitespace into the hash input — but you do include it, since you appendnodeValueand notnodeValue.trim().If what you are trying to do has any relation to security then you should not use MD5, which is cryptographically broken. Use SHA-256 instead.
Hashing an XML element is normally done through canonicalization (which handles whitespace, attribute order, text representation, and so on). See this question on the topic of canonicalizing XML data with Java.