Given a large list of alphabetically sorted words in a file, I need to write a program that, given a word x, determines whether x is in the list. Preprocessing is OK, since I will be calling this function many times with different inputs.
Priorities: 1. speed, 2. memory.
I already know I can use the following (n is the number of words, m is the average word length):
1. a trie, time is O(log(n)), space(best case) is O(log(nm)), space(worst case) is O(nm).
2. load the complete list into memory, then binary search, time is O(log(n)), space is O(n*m)
I’m not sure about the complexities for the trie; please correct me if they are wrong. Also, are there other good approaches?
A trie lookup takes O(m) time, and binary search takes up to O(m log(n)), because each of the O(log(n)) comparisons may inspect up to m characters. The space is asymptotically O(nm) for any reasonable method, although you can probably reduce it in some cases using compression. The trie is, in theory, somewhat better on memory because shared prefixes are stored only once, but in practice the devil is in the implementation details: the memory needed to store pointers, and potentially poor cache behavior.
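To make the two costs concrete, here is a minimal Python sketch of both lookups. It is only an illustration under simple assumptions: the words fit in memory, the trie is built from nested dicts (which is exactly where the pointer overhead mentioned above shows up), and the names `build_trie`, `trie_contains`, and `sorted_contains` are made up for this example.

```python
from bisect import bisect_left

# Unique end-of-word marker. Dict keys for characters are strings,
# so this object can never collide with a real character.
_END = object()

def build_trie(words):
    """Build a trie as nested dicts, one dict per node."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = True
    return root

def trie_contains(root, word):
    """Walk one node per character: O(m) for a word of length m."""
    node = root
    for ch in word:
        node = node.get(ch)
        if node is None:
            return False
    return _END in node

def sorted_contains(sorted_words, word):
    """Binary search: O(log n) string comparisons, each up to O(m)."""
    i = bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word
```

Note that `trie_contains` returns False for a prefix such as "app" unless "app" itself was inserted, which is why the end-of-word marker is needed.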
There are other options for implementing a set structure: a hash set and a tree set are easy choices in most languages. I’d go for the hash set, as it is efficient and simple.
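As a rough sketch of the hash set approach in Python, the preprocessing step is just reading the file into a set once. The function name `load_words` and the one-word-per-line file layout are assumptions for this example, not something stated in the question.

```python
def load_words(path):
    """Read one word per line into an immutable set (built once, queried many times)."""
    with open(path, encoding="utf-8") as f:
        # Strip newlines and surrounding whitespace; skip blank lines.
        return frozenset(line.strip() for line in f if line.strip())
```

After `words = load_words("words.txt")` (a hypothetical file name), each query is simply `x in words`. Hashing the query costs O(m), and the lookup is expected constant time beyond that, so the sorted order of the input file is not used at all.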