I’m writting interpreter of language.
There is problem: I want to create type-dictionary, where you can put value of any type by index, that value of any type (simple[int,float,string] or complex[list,array,dictionary] of simple types or of complex of simple types …). That is the same like in python-lang.
What algorithm of hash-function should I use?
For strings there are many examples of hashes – the simplest: sum of all characters multiplied by 31, divided by HASH_SIZE, that simple number.
But for DIFFERENT TYPES, I think, It must be more complicated algorithm.
I find SHA256, but don’t know, how use “unsigned char[32]” result type for adressing in hash-table – it is much more than RAM in computer.
thank you.
There are hash tables in C++11, newest C++ standard – std::unordered_map, std::unordered_set.
EDIT:
Since every type has different distribution, usually every type has its own hash function. This is how it’s done in Java (.hashCode() method inherited from Object), C#, C++11 and many other implementations.
EDIT2:
Typical hash function does two things:
1.) Create object representation in a natural number. (this is what .hashCode() in Java does)
For example – string “CAT” can be transformed to:
2.) Map this number to position in array.
One of the way to do this is:
Where k is a constant (Donald Knuth in his book Art of Programming recommends (sqrt(5)+1)/2), m is size of your hash table and fractional_part and integer_part (obviously) calculate fractional part and integer part of real number.
In your hash table implementation, you need to handle collisions, especially when there are much more possible keys than size of your hash table.
EDIT3:
I read more on the subject, and it looks like
67 * 256^2 + 65 * 256^1 + 84 = 4407636
is really bad way to do hash_code. This is because, “somethingAAAAAABC” and “AAAAAABC” give exactly the same hash code.