I need a specialised hash function h(X,Y) in Java with the following properties.
1. X and Y are strings.
2. h(X,Y) = h(Y,X).
3. X and Y are strings of arbitrary length, and there is no length limit on the result of h(X,Y) either.
4. h(X,Y) (equivalently h(Y,X)) should not collide with h(A,B) (equivalently h(B,A)) unless {X,Y} and {A,B} are the same unordered pair, i.e. unless X = A and Y = B, or X = B and Y = A.
5. h() does not need to be a cryptographically secure hash function unless that is necessary to meet the requirements above.
6. It should be reasonably fast, although this is an open-ended criterion.
I suspect requirements 2 and 4 are slightly contradictory, but perhaps I am worrying too much.
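Requirements 2 and 4 need not conflict when, as requirement 3 allows, the result may grow with the input: put the two strings into a canonical order, then encode the ordered pair so that it can be decoded again. The following is a minimal sketch, assuming a String result is acceptable; the names SymmetricPairKey and key are hypothetical.

```java
public final class SymmetricPairKey {

    private SymmetricPairKey() {}

    /**
     * Returns the same value for (x, y) and (y, x), and a different value
     * for any other unordered pair. Sorting the inputs gives symmetry; the
     * length prefix marks where the first string ends, so the result can be
     * split back into its two strings and therefore never collides.
     */
    public static String key(String x, String y) {
        // Canonical order: the lexicographically smaller string goes first.
        boolean swap = x.compareTo(y) > 0;
        String first = swap ? y : x;
        String second = swap ? x : y;
        // "<length of first>:" is unambiguous because digits never contain ':'.
        return first.length() + ":" + first + second;
    }

    public static void main(String[] args) {
        System.out.println(key("abc", "de")); // 3:abcde
        System.out.println(key("de", "abc")); // 3:abcde
        System.out.println(key("ab", "cde")); // 2:abcde
    }
}
```

The cost is linear in the total length of the two strings. Any fixed-width hash of this key, such as key(x, y).hashCode(), stays symmetric but can no longer satisfy requirement 4 for all inputs, because there are more pairs of strings than values of any fixed width.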
At the moment, what I am doing in Java is the following:
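```java
import java.math.BigInteger;

// Symmetric because multiplication is commutative: h(str1, str2) == h(str2, str1).
public static BigInteger hashStringConcatenation(String str1, String str2) {
    BigInteger bA = BigInteger.ZERO;
    BigInteger bB = BigInteger.ZERO;
    // bA = sum over i of 127^(i+1) * str1.codePointAt(i)
    for (int i = 0; i < str1.length(); i++) {
        bA = bA.add(BigInteger.valueOf(127L).pow(i + 1).multiply(BigInteger.valueOf(str1.codePointAt(i))));
    }
    // bB = the same polynomial computed for str2
    for (int i = 0; i < str2.length(); i++) {
        bB = bB.add(BigInteger.valueOf(127L).pow(i + 1).multiply(BigInteger.valueOf(str2.codePointAt(i))));
    }
    return bA.multiply(bB);
}
```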
I think this is hideous, which is why I am looking for nicer solutions. Thanks.
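Multiplying the two per-string values discards which factor came from which string, so two different pairs whose per-string values have the same product collide. The empty string has value 0 and therefore collides with every other pair that contains it. The small check below reuses the question's polynomial, with the power of 127 updated incrementally (same values). The class name ProductHashCollisions is hypothetical.

```java
import java.math.BigInteger;

public final class ProductHashCollisions {

    private ProductHashCollisions() {}

    // Same per-string value as in the question: sum of 127^(i+1) * codePointAt(i).
    public static BigInteger polynomial(String s) {
        BigInteger base = BigInteger.valueOf(127L);
        BigInteger power = base;
        BigInteger sum = BigInteger.ZERO;
        for (int i = 0; i < s.length(); i++) {
            sum = sum.add(power.multiply(BigInteger.valueOf(s.codePointAt(i))));
            power = power.multiply(base);
        }
        return sum;
    }

    // The question's hash: the product of the two per-string values.
    public static BigInteger productHash(String a, String b) {
        return polynomial(a).multiply(polynomial(b));
    }

    public static void main(String[] args) {
        // 'B' * 'd' = 66 * 100 = 6600 = 75 * 88 = 'K' * 'X'
        System.out.println(productHash("B", "d").equals(productHash("K", "X"))); // true
        // The empty string has value 0, so every pair containing it hashes to 0.
        System.out.println(productHash("", "foo").equals(productHash("", "bar"))); // true
    }
}
```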
I forgot to mention that on a 2.53 GHz dual-core MacBook Pro with 8 GB RAM, running Java 1.6 on OS X 10.7, the hash function takes about 270 microseconds for two 8-character (ASCII) strings. I suspect this grows with string length, and possibly also when non-ASCII Unicode characters are used.
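On the Unicode point: String.length() counts UTF-16 chars, not characters. For a character outside the Basic Multilingual Plane (a surrogate pair), the question's loop calls codePointAt on both halves: the first call returns the full code point and the second returns the lone low surrogate, so that character contributes two terms. Below is a minimal sketch, assuming the hash should have one term per code point (which changes the values for such strings). It advances with Character.charCount, available since Java 5, and updates the power of 127 incrementally instead of calling pow() for every character. The class name CodePointPolynomial is hypothetical.

```java
import java.math.BigInteger;

public final class CodePointPolynomial {

    private CodePointPolynomial() {}

    /**
     * Sum of 127^(k+1) * cp_k over the code points cp_0, cp_1, ... of s.
     * A surrogate pair is read as one code point, and the index skips
     * both of its chars.
     */
    public static BigInteger polynomial(String s) {
        BigInteger base = BigInteger.valueOf(127L);
        BigInteger power = base; // 127^(k+1) for the current code point
        BigInteger sum = BigInteger.ZERO;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sum = sum.add(power.multiply(BigInteger.valueOf(cp)));
            power = power.multiply(base);
            i += Character.charCount(cp); // 2 for a surrogate pair, otherwise 1
        }
        return sum;
    }
}
```

On Java 8 and later, s.codePoints() gives the same sequence of code points as an IntStream.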
Today I decided to add my own solution to this hash function problem. It has not been tested thoroughly and I have not measured its performance, so any feedback is welcome. My solution is below:
I believe my solution should not produce any collision for a single string shorter than 32 characters (more precisely, shorter than the value of the hash_size variable). I also think collisions are not easy to find. To tune the collision probability for your particular task, you can try other prime numbers instead of 7 and 31 in the INITIAL_HASH and HASH_MULTIPLIER variables. What do you think? Is it good enough for you?

P.S. I think it would be much better to try much larger prime numbers.