I have code in which I compare a large body of data, say the source of a web page, against some words in a file. What is the best algorithm to use?
There can be 2 scenarios:
- I have a large number of words to compare against the source. In that case, a normal string search algorithm would have to take a word, compare it against the data, take the next word, compare it against the data, and so on until all the words are done.
- I have only a couple of words in the file, so a normal string search would be OK, but I still need to reduce the time as much as possible.
What algorithm is best? I know about the Boyer-Moore and Rabin-Karp search algorithms.
Although Boyer-Moore search seems fast, I would also like the names of other algorithms and how they compare.
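In both cases, I think you probably want to construct a Patricia trie (also called a radix tree) from the words in the file. Most importantly, each lookup takes O(k) time, where k is the maximum length of a string in the trie, no matter how many words the trie holds. To scan the page source you would start one such lookup at each position in the text (or once per token, if you only care about whole words), so all the words are checked in a single pass over the data instead of one pass per word.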
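A minimal sketch of the trie idea, assuming Python and a small in-memory word list. Note that this uses a plain uncompressed trie built from nested dicts rather than a true Patricia trie (which would merge single-child chains to save memory); the lookup bound per starting position is the same. The names `build_trie`, `find_all`, and `END` are made up for this illustration.

```python
END = None  # sentinel key; cannot collide with a one-character string key


def build_trie(words):
    """Build a plain (uncompressed) trie as nested dicts."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word  # mark that a complete word ends at this node
    return root


def find_all(text, root):
    """Return (start_index, word) for every occurrence of a trie word in text."""
    matches = []
    for start in range(len(text)):
        node = root
        i = start
        # Walk the trie for as long as the text keeps matching some word prefix.
        while i < len(text) and text[i] in node:
            node = node[text[i]]
            i += 1
            if END in node:
                matches.append((start, node[END]))
    return matches


trie = build_trie(["he", "hers", "she"])
print(find_all("ushers", trie))
```

Each starting position costs at most k steps, so the whole scan is O(n * k) for a text of length n, independent of how many words are in the file. For the second scenario (only a couple of words) this may not beat a well-tuned single-pattern search, so it is worth measuring on real data.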