This is an interview question. Given a set of strings, find the strings that are prefixes of other strings in the set. For example, given strings = {"a", "aa", "ab", "abb"}, the result is {"a", "ab"}.
The simplest solution is to sort the strings and check, for each pair of adjacent strings, whether the first one is a prefix of the second one. The running time of the algorithm is dominated by the running time of the sorting.
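A minimal Python sketch of the sort-and-compare idea, for illustration only. The function name `prefixes_by_sorting` is hypothetical, and the sketch assumes the input strings are distinct (a duplicate would be reported as a prefix of its own copy).

```python
def prefixes_by_sorting(strings):
    """Return the strings that are a prefix of another string in the input.

    Assumption: the input strings are distinct.
    """
    ordered = sorted(strings)
    result = set()
    # In lexicographic order, a string that is a prefix of any other string
    # is also a prefix of its immediate successor, so adjacent pairs suffice.
    for current, following in zip(ordered, ordered[1:]):
        if following.startswith(current):
            result.add(current)
    return result


print(prefixes_by_sorting(["a", "aa", "ab", "abb"]))
```

For the example from the question this yields the set containing "a" and "ab".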
I guess there is another solution, which uses a trie, and has complexity O(N), where N is the number of strings. Could you suggest such an algorithm?
I have the following idea using a trie, with complexity O(N):
Start with an empty trie.
Take the words one by one and add each word to the trie.
After you add a word (let's call it Wi) to the trie, there are two cases to consider:
Case 1: Wi is a prefix of some word added before. This is true if you didn't add any nodes to the trie while adding Wi.
In that case, Wi is a prefix and part of our solution.
Case 2: some word added before is a prefix of Wi. This is true if you passed through a node that represents the end of a word added before (let's call that word Wj). In that case, Wj is a prefix of Wi and part of our solution.
In more detail (pseudocode):
Adding a new word to the trie:
While you are adding a new word, you can also check whether you passed through any nodes that represent the last letters of other words.
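The insertion step with both checks could be sketched in Python roughly as follows. This is one possible reading of the idea, not the original pseudocode: the names `TrieNode` and `find_prefixes` are hypothetical, and the sketch assumes the words are distinct and non-empty.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # maps a character to the child TrieNode
        self.word = None    # the word that ends at this node, if any


def find_prefixes(words):
    """Return the words that are a prefix of another word.

    Assumption: the words are distinct and non-empty.
    """
    root = TrieNode()
    result = set()
    for word in words:
        node = root
        added_node = False
        for ch in word:
            if node.word is not None:
                # Case 2: we are passing through the end of an earlier
                # word Wj, so Wj is a prefix of the current word.
                result.add(node.word)
            child = node.children.get(ch)
            if child is None:
                child = TrieNode()
                node.children[ch] = child
                added_node = True
            node = child
        if not added_node:
            # Case 1: no new nodes were needed, so the current word lies
            # entirely on the path of an earlier, longer word.
            result.add(word)
        node.word = word  # mark the end of the current word
    return result


print(find_prefixes(["aab", "aaba", "aa"]))
```

Each character of each word is visited once, so the work is proportional to the total length of the input.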
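The complexity of the algorithm I described is O(N), assuming the word lengths are bounded by a constant; more precisely, it is linear in the total number of characters across all words.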
Another important thing is that this way you can also find out how many other words each word Wi is a prefix of, which may be useful.
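One possible way to get those counts, sketched in Python: store on every trie node how many words pass through it, then read the counter at the node where each word ends. This is a two-pass variant chosen for brevity rather than the exact procedure above; the name `count_prefixed_words` and the dictionary-based node layout are hypothetical, and the words are assumed to be distinct and non-empty.

```python
def count_prefixed_words(words):
    """Map each word to the number of other words it is a prefix of.

    Assumption: the words are distinct and non-empty.
    """
    root = {"children": {}, "passes": 0}
    end_nodes = {}
    for word in words:
        node = root
        for ch in word:
            # Create the child on first use, then count this word's visit.
            node = node["children"].setdefault(ch, {"children": {}, "passes": 0})
            node["passes"] += 1
        end_nodes[word] = node
    # Every word passes through its own end node once, so subtract 1.
    return {word: node["passes"] - 1 for word, node in end_nodes.items()}


print(count_prefixed_words(["aab", "aaba", "aa"]))
```

A word belongs to the result set exactly when its count is greater than zero.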
Example for {aab, aaba, aa}:
Green nodes are nodes detected as case 1.
Red nodes are nodes detected as case 2.
Each column (trie) is one step. At the beginning, the trie is empty.
Black arrows show which nodes we visited (added) in that step.
Nodes that represent the last letter of some word have that word written in parentheses.
At the end we have result = {aab, aa}, which is correct.