EDIT: 64 or 128 bits would also work. My brain just jumped to 32-bit for some reason, thinking it would be sufficient.
I have a struct that is composed of mostly numeric values (int, decimal), and 3 strings that are never more than 12 alpha-characters each. I’m trying to create an integer value that will work as a hash code, and trying to create it quickly. Some of the numeric values are also nullable.
It seems like BitVector32 or BitArray would be useful for this endeavor, but I’m just not sure how to bend them to my will for this task. My struct contains 3 strings, 12 decimals (7 of which are nullable), and 4 ints.
To simplify my use case, let’s say you have the following struct:
public struct Foo
{
    public decimal MyDecimal;
    public int? MyInt;
    public string Text;
}
I know I can get numeric identifiers for each value. MyDecimal and MyInt are of course unique, from a numerical standpoint. And the string has a GetHashCode() function which will return a usually-unique value.
So, with a numeric identifier for each, is it possible to generate a hash code that uniquely identifies this structure? That is, I can compare 2 different Foos containing the same values and get the same hash code every time (regardless of app domain, restarting the app, time of day, alignment of Jupiter’s moons, etc).
The hash would be sparse, so I don’t anticipate collisions from my use cases.
Any ideas? On my first run at it I converted everything to a string representation, concatenated it, and used the built-in GetHashCode(), but that seems terribly … inefficient.
EDIT: A bit more background information. The structure data is being delivered to a web client, and the client does a lot of computation on the included values, string construction, etc. to re-render the page. The aforementioned 19-field structure represents a single unit of information, and each page could have many units. I’d like to do some client-side caching of the rendered result, so I can quickly re-render a unit without recomputing on the client side if I see the same hash identifier from the server. JavaScript numeric values are all 64 bit, so I suppose my 32-bit constraint is artificial and limiting. 64 bit would work, or I suppose even 128 bit if I can break it into two 64-bit values on the server.
Well, even with a sparse table you had better be prepared for collisions, depending on what “sparse” means.
You would need to be able to make very specific assumptions about the data you will be hashing at the same time to beat this graph with 32 bits.
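The graph itself is not reproduced here. Assuming it plots the usual birthday-problem curve (an assumption on my part), the standard approximation for the chance of at least one collision among n uniformly distributed b-bit hashes, and the n at which that chance reaches 50%, looks roughly like this:

```latex
% Birthday-problem approximation for n items hashed uniformly into 2^b buckets
p(n) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2 \cdot 2^{b}}\right),
\qquad
n_{0.5} \approx \sqrt{2 \ln 2 \cdot 2^{b}} \approx 1.1774 \cdot 2^{b/2}

% b = 32:   n_{0.5} \approx 7.7 \times 10^{4}
% b = 64:   n_{0.5} \approx 5.1 \times 10^{9}
% b = 256:  n_{0.5} \approx 4.0 \times 10^{38}
```

So with 32 bits, a coin-flip chance of a collision arrives at roughly 77,000 distinct values, which is why 32 bits only holds up under strong assumptions about the data.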
Go with SHA256. Your hashes will not depend on the CLR version and you will have no collisions. Well, you will still have some, but less frequently than meteorite impacts, so you can afford not to anticipate any.
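A minimal sketch of what that could look like for the simplified Foo from the question. The FooHasher class and its method names are hypothetical, and the field order and null-marker bytes are just one possible convention; the only requirement is that the same convention is used every time.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public struct Foo
{
    public decimal MyDecimal;
    public int? MyInt;
    public string Text;
}

// Hypothetical helper, not part of any library.
public static class FooHasher
{
    // Writes the fields in a fixed order to a byte buffer, then hashes the bytes.
    public static byte[] ComputeHash(Foo foo)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream, Encoding.UTF8))
        {
            // Note: 1.0m and 1.00m compare equal but serialize differently
            // (different scale), so normalize decimals first if that matters.
            writer.Write(foo.MyDecimal);

            // Presence flag so that null and 0 produce different bytes.
            writer.Write(foo.MyInt.HasValue);
            if (foo.MyInt.HasValue)
            {
                writer.Write(foo.MyInt.Value);
            }

            // Same idea for the string; Write(string) is length-prefixed,
            // so adjacent fields cannot run into each other.
            writer.Write(foo.Text != null);
            if (foo.Text != null)
            {
                writer.Write(foo.Text);
            }

            writer.Flush();

            using (var sha = SHA256.Create())
            {
                return sha.ComputeHash(stream.ToArray());
            }
        }
    }

    // Hex form (64 characters), convenient as a cache key on the client.
    public static string ComputeHashHex(Foo foo)
    {
        return BitConverter.ToString(ComputeHash(foo)).Replace("-", "");
    }
}
```

Sending the result as a hex string rather than a number sidesteps the bit-width question: a JavaScript number only represents integers exactly up to 2^53, while a string works directly as a cache key. If the full 64 characters feel too long, truncating to the first 16 or 32 hex characters gives you 64 or 128 bits, at the cost of a higher (but still tiny) collision chance.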