I have a large collection of unique strings (about 500k). Each string is associated with a vector of strings. I’m currently storing this data in a
map<string, vector<string> >
and it’s working fine. However, I’d like look-ups in the map to be faster than O(log n). Under these constrained circumstances, how can I create a hashtable that supports O(1) look-up? It seems like this should be possible since I know all the keys ahead of time… and all the keys are unique (so I don’t have to account for collisions).
Cheers!
You can create a hashtable with boost::unordered_map, std::tr1::unordered_map or (on C++0x compilers) std::unordered_map. That takes almost zero effort. Google sparsehash may be faster still and tends to take less memory. (Deletion can be a pain, but it seems you won’t need that.)
If the code is still not fast enough, you can exploit prior knowledge of the keys with a minimal perfect hash, as suggested by others, to obtain guaranteed O(1) performance. Whether the code generating effort that takes is worth it depends on you; putting 500k keys into a tool like gperf may take a code generator generator.
You may also want to look at CMPH, which generates a perfect hash function at run-time, though through a C API.
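As a rough sketch of the "almost zero effort" option, the snippet below swaps the ordered map for std::unordered_map (this assumes a C++11 compiler; with Boost or TR1 only the header and namespace would differ). The names Table, build_table and lookup, and the sample keys, are hypothetical and exist only for illustration. Note that distinct keys can still land in the same bucket; the container resolves that internally, so look-ups are O(1) on average rather than guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical alias: same value type as the original map, hashed instead of ordered.
using Table = std::unordered_map<std::string, std::vector<std::string>>;

// Build the table once. reserve() sizes the bucket array up front so that
// inserting up to expected_keys elements does not trigger a rehash.
Table build_table(std::size_t expected_keys) {
    Table table;
    table.reserve(expected_keys);
    // Sample data only; real code would load the ~500k keys here.
    table["apple"] = {"red", "green"};
    table["banana"] = {"yellow"};
    return table;
}

// Average-case O(1) look-up. Returns nullptr when the key is absent.
const std::vector<std::string>* lookup(const Table& table, const std::string& key) {
    auto it = table.find(key);
    return it == table.end() ? nullptr : &it->second;
}
```

Calling build_table(500000) and then lookup(table, "apple") would return a pointer to the stored vector, while a missing key yields nullptr. Since the keys are known ahead of time, reserving the full size once avoids rehashing during construction.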