I’m learning about hash tables, trying to understand how they work. I’d like to make a rather simple hash table with separate chaining (using lists in an array). I have a few questions:
- Assuming the keys can be of any type, I would require the user to implement the hashing function, right? Can this be avoided?
- The user also needs to supply the length of the array that contains the lists (for collisions) at initialisation, correct? Can this be avoided?
If you have any other tips, or maybe some clear C++ code samples of a hash table, I would be thankful.
Thanks for your help.
Typically, yes, you would need the client to specify the hash function, since if you are writing a generic hash table and are operating on an arbitrary type T, you can’t know how to hash it in a way that is semantically meaningful. You can do this by parameterizing the class on both the type of the elements being stored and the hash function. For example:
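A minimal sketch of such a declaration, assuming std::hash<T> as the default hash function. The names HashTable, HashFunction, bucketFor, Point and PointHash are made up for illustration, and the class only shows how the hash function plugs in, not a full table:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical class: parameterized on the element type and the hash function.
// HashFunction defaults to std::hash<T>, so common types work out of the box.
template <typename T, typename HashFunction = std::hash<T>>
class HashTable {
public:
    explicit HashTable(HashFunction hash = HashFunction()) : hash_(hash) {}

    // Maps an element to a bucket index in [0, bucketCount).
    std::size_t bucketFor(const T& value, std::size_t bucketCount) const {
        assert(bucketCount > 0);
        return hash_(value) % bucketCount;
    }

private:
    HashFunction hash_;
};

// A user-defined type that std::hash does not know how to hash.
struct Point {
    int x;
    int y;
};

// Client-supplied hash function for Point (a simple, not especially strong, mix).
struct PointHash {
    std::size_t operator()(const Point& p) const {
        return std::hash<int>()(p.x) * 31u + std::hash<int>()(p.y);
    }
};
```

With this, HashTable<std::string> picks up the default hash, while HashTable<Point, PointHash> uses the one the client wrote.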
Here, you can use default arguments with the templates to select the default hash function unless the user specifies otherwise.
While the client typically can specify the initial size of the table, it’s not required. You could make an educated guess about the number of buckets (say, initially use 17 buckets), growing the table as the load factor increases. This is similar to how std::vector works: the implementation can pick a default size, but if the client either explicitly asks for a presized vector or calls reserve, the implementation takes the hint from the user. For example, you could have a constructor that takes the number of buckets as a parameter with a default value.
This way, the client can just construct a hash table with a default number of buckets, or if they have a sense of the number of buckets they’d like, they can specify it as a parameter. However, you might also want to hide the buckets as a detail and just have the client specify how many elements they’re expecting to put into the table, then have your class compute the bucket count behind the scenes. This makes it easier to switch implementations later, so that if you want to use something like a dynamic perfect hash table instead of chained hashing, the class can handle the complexity of computing an initial size.
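To make the default-size idea concrete, here is a rough sketch of a chained hash set whose constructor defaults to 17 buckets and which grows once the load factor would pass 1. The class name ChainedHashSet, the growth rule (2n + 1) and the load-factor threshold are assumptions chosen for illustration, not the only reasonable choices:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <utility>
#include <vector>

template <typename T, typename HashFunction = std::hash<T>>
class ChainedHashSet {
public:
    // 17 is just the arbitrary default from the text; clients may override it.
    explicit ChainedHashSet(std::size_t numBuckets = 17)
        : buckets_(numBuckets), size_(0) {
        assert(numBuckets > 0);
    }

    // Returns true if the value was inserted, false if it was already present.
    bool insert(const T& value) {
        if (contains(value)) return false;
        // Grow when the load factor (elements per bucket) would exceed 1.
        if (size_ + 1 > buckets_.size()) rehash(buckets_.size() * 2 + 1);
        buckets_[indexOf(value, buckets_.size())].push_back(value);
        ++size_;
        return true;
    }

    bool contains(const T& value) const {
        const std::list<T>& chain = buckets_[indexOf(value, buckets_.size())];
        return std::find(chain.begin(), chain.end(), value) != chain.end();
    }

    std::size_t size() const { return size_; }
    std::size_t bucketCount() const { return buckets_.size(); }

private:
    std::size_t indexOf(const T& value, std::size_t numBuckets) const {
        return hash_(value) % numBuckets;
    }

    // Redistributes every element into a new, larger bucket array.
    void rehash(std::size_t newBucketCount) {
        std::vector<std::list<T>> fresh(newBucketCount);
        for (const std::list<T>& chain : buckets_)
            for (const T& value : chain)
                fresh[indexOf(value, newBucketCount)].push_back(value);
        buckets_ = std::move(fresh);
    }

    std::vector<std::list<T>> buckets_;  // one chain (list) per bucket
    std::size_t size_;                   // number of stored elements
    HashFunction hash_;
};
```

A client can write ChainedHashSet<int> s; and never think about buckets, or ChainedHashSet<int> s(1024); to pass a size hint up front. To hide buckets entirely, the constructor parameter could instead be an expected element count that the class converts to a bucket count internally.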
As for code examples, I’m not sure how to provide any without giving away a lot of the complexity involved in building the hash table. 🙂 If there’s a specific piece of code you’re interested in and are having trouble writing on your own, feel free to post it as a separate question so that you can get more targeted feedback.
Hope this helps!