This sounds like a simple question, but I don’t know how to search for its answer.
I have a trie implementation in C# that stores about 80K words from a dictionary file. Loading all of these words takes quite a while (more than 5 minutes). What is the best way to “persist” this data so I don’t have to reload every word each time I start the application?
Thanks.
Like all other performance issues, the ideal solution will follow from profiling your current solution and any candidate solutions you come up with. Where’s the bottleneck? The I/O? Lexing the text? Forming the links in the trie? It will be hard to make a concrete suggestion without knowing your performance goals, how the trie is used, and where the bottlenecks currently are.
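As a rough starting point for that profiling, a sketch like the following times the file read separately from the trie inserts. The `LoadProfiler` class and the `insert` callback are hypothetical names; pass in whatever add method your own trie exposes, and treat the numbers as a first hint rather than a proper benchmark.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class LoadProfiler
{
    // Times reading the word list separately from inserting into the trie.
    // `insert` is a placeholder for your own trie's add method (an assumption).
    public static (TimeSpan Read, TimeSpan Insert) Measure(string path, Action<string> insert)
    {
        var sw = Stopwatch.StartNew();
        string[] words = File.ReadAllLines(path);   // phase 1: I/O only
        TimeSpan read = sw.Elapsed;

        sw.Restart();
        foreach (string word in words)              // phase 2: trie construction only
            insert(word);
        TimeSpan build = sw.Elapsed;

        return (read, build);
    }
}
```

If the insert phase dominates (likely, given that reading 80K short lines is normally fast), the problem is in the trie itself rather than in how the data is stored on disk.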
One possible strategy: Create and persist a ‘most common words’ dictionary with the 1,000 (or so) most frequently used words. Load these words into the trie on start-up, and spawn the loading of the full dictionary on another thread, incrementally adding to the trie as new words are read.
Issues to consider: this will require synchronization, and the user will see an incomplete trie until loading is fully complete. This may or may not be a showstopper depending on what the trie is being used for.
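A minimal sketch of that two-stage load, assuming one word per line in both files. `ConcurrentTrie`, `TrieLoader`, and the two path parameters are hypothetical names, not anything from your code, and the single `ReaderWriterLockSlim` around the whole trie is just the simplest synchronization that works, not necessarily the fastest.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public sealed class ConcurrentTrie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord;
    }

    private readonly Node _root = new Node();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    public void Add(string word)
    {
        _lock.EnterWriteLock();             // writers get exclusive access
        try
        {
            Node node = _root;
            foreach (char c in word)
            {
                if (!node.Children.TryGetValue(c, out Node next))
                {
                    next = new Node();
                    node.Children[c] = next;
                }
                node = next;
            }
            node.IsWord = true;
        }
        finally { _lock.ExitWriteLock(); }
    }

    public bool Contains(string word)
    {
        _lock.EnterReadLock();              // many readers may search at once
        try
        {
            Node node = _root;
            foreach (char c in word)
            {
                if (!node.Children.TryGetValue(c, out node)) return false;
            }
            return node.IsWord;
        }
        finally { _lock.ExitReadLock(); }
    }
}

public static class TrieLoader
{
    // Loads the short common-words file synchronously, then streams the full
    // dictionary in on a background task. Await the returned task to know
    // when the trie is complete.
    public static Task LoadInStages(ConcurrentTrie trie, string commonWordsPath, string fullDictionaryPath)
    {
        foreach (string word in File.ReadLines(commonWordsPath))
            trie.Add(word);

        return Task.Run(() =>
        {
            foreach (string word in File.ReadLines(fullDictionaryPath))
                trie.Add(word);
        });
    }
}
```

Until the returned task completes, `Contains` can return false for words that simply have not been loaded yet, which is exactly the incomplete-trie issue mentioned above.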