I am getting a string hash like this:
string content = "a very long string";
int contentHash = content.GetHashCode();
I then store the hash in a dictionary as a key that maps to another ID. This way the dictionary never has to hash or compare the big strings themselves; I can just fish the ID out of the dictionary by the hash.
Can I be sure that the hash for a given string (‘a very long string’) will always be the same?
Can I be sure that two different strings won’t have the same hash?
Also, if possible, how likely is it to get the same hash for different strings?
Just to add some detail as to where the idea of a changing hashcode may have come from.
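As the other answers have rightly said, the hash code for a specific string will always be the same for a specific runtime version. There is no guarantee, however, that a newer runtime won't use a different algorithm, perhaps for performance reasons. Note also that on .NET Core and later, string hash codes are by default randomized per process, so there the value is only stable within a single run.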
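If the hash ever has to outlive the process (written to disk or sent to another machine), one option is to compute it yourself rather than rely on GetHashCode. The sketch below uses 32-bit FNV-1a over the string's UTF-16 bytes. FNV-1a is my own choice for illustration, not something the runtime uses or the question asked for, and the names StableHash and Fnv1a are hypothetical.

```csharp
using System;

// Hypothetical helper: a hash whose value depends only on the string contents,
// not on the runtime version, bitness, or process.
public static class StableHash
{
    private const uint OffsetBasis = 2166136261; // standard FNV-1a 32-bit offset basis
    private const uint Prime = 16777619;         // standard FNV-1a 32-bit prime

    public static int Fnv1a(string text)
    {
        unchecked
        {
            uint hash = OffsetBasis;
            foreach (char c in text)
            {
                // Feed each UTF-16 code unit one byte at a time, low byte first.
                hash = (hash ^ (byte)c) * Prime;
                hash = (hash ^ (byte)(c >> 8)) * Prime;
            }
            return (int)hash;
        }
    }
}
```

Like any 32-bit hash, this can still produce the same value for two different strings; it only removes the dependence on the runtime.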
The String class overrides the default GetHashCode implementation in object.
The default implementation for a reference type in .NET is to allocate a sequential ID (held internally by .NET) and assign it to the object. The object's heap storage has a slot for this hash code, and it is only assigned on the first call to GetHashCode for that object.
Hence creating an instance of a class, assigning it some values and retrieving the hash code, then repeating the exact same sequence with the same set of values, will yield different hash codes. This may be why some have been led to believe that hash codes can change. In fact, it's the instance of the class that is allocated a hash code, and once allocated, that hash code does not change for that instance.
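A small sketch of the difference described above. The Point class and the HashDemo helpers are hypothetical names made up for this illustration; Point deliberately does not override GetHashCode, so it gets the default per-instance behaviour.

```csharp
using System;

// Hypothetical class that inherits the default reference-type GetHashCode.
public sealed class Point
{
    public int X;
    public int Y;
}

public static class HashDemo
{
    // The hash code belongs to the instance: it stays the same
    // even when the instance's fields are changed afterwards.
    public static bool InstanceHashIsStable()
    {
        var p = new Point { X = 1, Y = 2 };
        int first = p.GetHashCode();
        p.X = 42;
        return first == p.GetHashCode();
    }

    // String overrides GetHashCode to hash the contents, so two distinct
    // string objects with equal contents report the same hash code.
    public static bool EqualStringsHashEqually()
    {
        string a = new string('x', 1000);
        string b = new string('x', 1000);
        return !ReferenceEquals(a, b) && a.GetHashCode() == b.GetHashCode();
    }
}
```

Two separate Point instances built with the same values will usually report different hash codes, which is the effect that can look like a "changing" hash.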
Edit: I’ve just noticed that none of the answers directly addresses each of your questions (although I think the answers to them are clear), so just to tidy up:
In your usage, yes.
No. Two different strings may have the same hash.
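For any two given strings the probability is quite low: the resulting hash is fairly random over a 32-bit range (about 4 billion values). It does grow with the number of strings, though, so with tens of thousands of entries a collision somewhere in the dictionary becomes likely.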
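To put a rough number on that last answer, here is a sketch of the usual birthday-bound approximation. It assumes hash codes are spread uniformly over all 2^32 int values, which GetHashCode does not promise, so treat the result as an estimate. CollisionEstimate and Probability are hypothetical names.

```csharp
using System;

public static class CollisionEstimate
{
    // Approximate chance that at least two of n distinct strings share a hash:
    // p ≈ 1 - exp(-n(n-1) / (2 * 2^32)), assuming uniformly distributed hashes.
    public static double Probability(long n)
    {
        const double Buckets = 4294967296.0;        // 2^32 possible int values
        double pairs = (double)n * (n - 1) / 2.0;   // number of distinct pairs
        return 1.0 - Math.Exp(-pairs / Buckets);
    }
}
```

Under that assumption the chance reaches about one in two at roughly 77,000 strings. With the hash as the dictionary key, two colliding strings would silently map to the same ID.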