I’m writing an application that needs to read a list of strings from a file, save them in a data structure, and then look up those strings by prefixes. The list of strings is simply a list of words in a given language. For example, if the search function gets “stup” as a parameter, it should return [“stupid”, “stupidity”, “stupor”…]. It should do so in O(log(n)*m) time, where n is the size of the data structure and m is the number of results and should be as fast as possible. Memory consumption is not a big issue right now. I’m writing this in python, so it would be great if you could point me to a suitable data structure (preferably) implemented in c with python wrappers.
Share
You want a trie.
http://en.wikipedia.org/wiki/Trie
I’ve used them in Scrabble and Boggle programs. They’re perfect for the use case you described (fast prefix lookup).
Here’s some sample code for building up a trie in Python. This is from a Boggle program I whipped together a few months ago. The rest is left as an exercise to the reader. But for prefix checking you basically need a method that starts at the root node (the variable
words), follows the letters of the prefix to successive child nodes, and returns True if such a path is found and False otherwise.