For my C++ application, there is a requirement to check if a word is a valid English dictionary word or not. What is the best way to do it. Is there freely available dictionary that I can make use of. I just need a collection of all possible words. How to make this lookup least expensive. Do I need to hash it.
Share
With regards to the presence of a word list, it depends on the platform.
Under Linux,
/usr/share/dict/wordscontains a list of English wordsthat might meet your needs. Otherwise, there are doubtlessly such lists
available on the network.
Given the size of such lists, the most rapid access will be to load it
into a hash table.
std::unsorted_set, if you have it; otherwise, manyC++ compilers come with a
hash_set, although different compilers havea slightly different interface for it, and put it in different
namespaces. If that still has performance problems, it’s possible to do
better if you know the number of entries in advance (so the table never
has to grow), and implement the hash table in an std::vector (or even a
C style array); handling collisions will be a bit more complicated,
however.
Another possibility would be a trie. This will almost certainly result
in the least number of basic operations in the lookup, and is fairly
simple to implement. Typical implementations will have very poor
locality, however, which could make it slower than some of the other
solutions in actual practice (or not—the only way to know is to
implement both and measure).