I want to use `nltk.containers.Trie` for simple operations such as inserting a word into the trie, retrieving all words with a given prefix, finding the nodes with the most descendants (i.e. the most common prefixes), viewing the trie graphically, and so on. I couldn’t find any documentation on how to use this structure. Here’s all I have so far:
from nltk.containers import Trie
t = Trie()
I now have a list of words which I need to add to the trie.
It’s pretty cryptic, isn’t it? It’s basically a dictionary, but you can additionally check whether a string is a prefix of a known key.
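To make the "dictionary plus prefix check" idea concrete, here is a minimal standalone sketch. It is not NLTK’s code: the class name `SketchTrie` and the method name `is_prefix` are hypothetical, chosen only for illustration, and the real `Trie` may expose the prefix check differently.

```python
class SketchTrie:
    """Toy trie (not NLTK's Trie): one node per character, values stored on nodes."""

    _MISSING = object()  # sentinel, so that None can be stored as a real value

    def __init__(self):
        self._children = {}
        self._value = self._MISSING

    def _walk(self, key):
        """Follow key character by character; return the node reached, or None."""
        node = self
        for ch in key:
            node = node._children.get(ch)
            if node is None:
                return None
        return node

    def __setitem__(self, key, value):
        # Handles t[key] = value, creating intermediate nodes as needed.
        node = self
        for ch in key:
            node = node._children.setdefault(ch, SketchTrie())
        node._value = value

    def __getitem__(self, key):
        # Handles t[key]; only real entries have a value.
        node = self._walk(key)
        if node is None or node._value is self._MISSING:
            raise KeyError(key)
        return node._value

    def __contains__(self, key):
        # "key in t" is True only for real entries, not for bare prefixes.
        node = self._walk(key)
        return node is not None and node._value is not self._MISSING

    def is_prefix(self, key):
        """Hypothetical helper: True if some stored key starts with key."""
        node = self._walk(key)
        if node is None:
            return False
        return bool(node._children) or node._value is not self._MISSING


t = SketchTrie()
for word in ["car", "card", "care", "dog"]:
    t[word] = len(word)

print("car" in t)          # True: a real entry
print("ca" in t)           # False: only a prefix
print(t.is_prefix("ca"))   # True
print(t.is_prefix("cat"))  # False
```

Adding a list of words is then just a loop over `t[word] = value`, exactly as with a dict.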
There’s also `find_prefix`, which will match as much of its argument as possible, and return the value it finds there (if any) plus the remainder of the argument.

You could take a look at the source in `nltk/containers.py`. The magic is in the methods `__setitem__` and `__getitem__`, which handle expressions of the form `t[key]`.

Also good to know: The `keys()` method will only return real entries, not prefixes. You can use it with the method `subtrie` to retrieve all words that begin with a given prefix.

PS. Note that `containers.py` was removed from the NLTK about six months ago! Before you update your nltk distribution (which you should), save `nltk/containers.py` under a different name. Better yet, just save the `Trie` class. (The rest of the file is obsolete.)
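If you no longer have the old module at hand, the behaviour described above for `find_prefix`, `subtrie` and `keys()` can be approximated with plain nested dicts. The following is only a sketch under assumptions: the function names are borrowed from the text, but the signatures, the `insert` helper and the `_VALUE` marker are made up for illustration and are not NLTK’s implementation.

```python
_VALUE = object()  # private dict key under which a node stores its value


def insert(root, key, value):
    """Store value under key, creating one nested dict per character."""
    node = root
    for ch in key:
        node = node.setdefault(ch, {})
    node[_VALUE] = value


def find_prefix(root, key):
    """Match as much of key as possible.

    Returns (value at the node reached, or None if it has none,
    unmatched remainder of key).
    """
    node = root
    matched = 0
    for ch in key:
        if ch not in node:
            break
        node = node[ch]
        matched += 1
    return node.get(_VALUE), key[matched:]


def subtrie(root, prefix):
    """Return the node reached by following prefix, or None if it is absent."""
    node = root
    for ch in prefix:
        node = node.get(ch)
        if node is None:
            return None
    return node


def keys(node, prefix=""):
    """List the real entries below node, each one prepended with prefix."""
    found = []
    if _VALUE in node:
        found.append(prefix)
    for ch, child in node.items():
        if ch is not _VALUE:
            found.extend(keys(child, prefix + ch))
    return found


root = {}
for word in ["car", "card", "care", "dog"]:
    insert(root, word, len(word))

print(find_prefix(root, "cards"))               # (4, 's')
print(find_prefix(root, "cat"))                 # (None, 't')
print(sorted(keys(subtrie(root, "car"), "car")))  # ['car', 'card', 'care']
```

Note that `keys` takes the prefix as a second argument here because the sub-node itself does not remember how it was reached.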