I have an ordered list (a dictionary of 100K words) and many words to look up in it frequently, so performance is an issue. I know that HashSet.contains(theWord) and Collections.binarySearch(sortedList, theWord) are very fast, but I am not actually looking for the whole word.
What I want is, say, to search for "se" and get all the words that start with "se". Is there a ready-to-use solution in Java or in any library?
A better example: on a sorted list, I want a fast way to do the following operation:
List.subList (String beginIndex, String endIndex) // returns the interval
myWordList.subList("ab", "bc");
Note: here is a very similar question, but its accepted answer is not satisfactory:
Overriding HashSet's Contains Method
What you’re looking for here is a data structure commonly called a ‘trie’:
http://en.wikipedia.org/wiki/Trie
It stores strings in a tree indexed by prefix, where the first level of the tree contains the first character of the string, the second level the second character, etc. The result is that it allows you to extract subsets of very large sets of strings by prefix extremely quickly.
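To make the idea concrete, here is a minimal sketch of a trie in Java. It is only an illustration, not a tuned implementation: the class name PrefixTrie and the methods insert and wordsWithPrefix are names I made up for this example, and a TreeMap per node is just one assumed way to store children (it keeps the results in sorted order at some memory cost).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical example class; not part of the JDK or any library.
public class PrefixTrie {

    // One node per character position; children are kept sorted by character.
    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord; // true if a stored word ends at this node
    }

    private final Node root = new Node();

    // Walk down the tree one character at a time, creating nodes as needed.
    public void insert(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        node.isWord = true;
    }

    // Follow the prefix to its node, then collect every word in that subtree.
    public List<String> wordsWithPrefix(String prefix) {
        List<String> result = new ArrayList<>();
        Node node = root;
        for (int i = 0; i < prefix.length(); i++) {
            node = node.children.get(prefix.charAt(i));
            if (node == null) {
                return result; // no stored word starts with this prefix
            }
        }
        collect(node, new StringBuilder(prefix), result);
        return result;
    }

    // Depth-first traversal; 'current' holds the characters on the path so far.
    private void collect(Node node, StringBuilder current, List<String> out) {
        if (node.isWord) {
            out.add(current.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            current.append(e.getKey());
            collect(e.getValue(), current, out);
            current.deleteCharAt(current.length() - 1);
        }
    }

    public static void main(String[] args) {
        PrefixTrie trie = new PrefixTrie();
        for (String w : new String[] {"sea", "seal", "set", "sun", "tea"}) {
            trie.insert(w);
        }
        System.out.println(trie.wordsWithPrefix("se")); // prints [sea, seal, set]
    }
}
```

Reaching the prefix node takes one step per character of the prefix, regardless of how many words are stored; after that, the cost is only the size of the matching subtree.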