I’ve implemented my own AVL tree and I’m using it as a dictionary. What would be the fastest way to count all the words that start with a given string?
e.g.:
string prefix = "fa";

output: 4
I’ve got it working in O(n); however, I’ve heard it can be done much faster.
I can of course store additional information in the nodes, such as the number of nodes below each one.
If you want to reduce the memory footprint as much as possible while keeping the same asymptotic time bounds, one integer per node suffices, and you still achieve O(log n) time (assuming constant-time key comparison). Store with each node the size of its subtree. This can be easily updated during tree modifications.
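A minimal sketch of the size bookkeeping, written in Python for brevity and with balancing left out (an AVL rotation would recompute `size` for the nodes it rearranges in the same way). The names `Node`, `insert`, and `rank` are illustrative and not taken from the original post.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1  # number of nodes in the subtree rooted here


def size(node):
    """Subtree size, treating an empty subtree as 0."""
    return node.size if node else 0


def insert(node, key):
    """Insert key (duplicates are ignored) and return the subtree root."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    # Recompute from the children on the way back up.
    node.size = 1 + size(node.left) + size(node.right)
    return node


def rank(node, key):
    """Number of stored keys strictly less than key, in O(height)."""
    count = 0
    while node:
        if key <= node.key:
            node = node.left
        else:
            # This node and its whole left subtree are smaller than key.
            count += 1 + size(node.left)
            node = node.right
    return count
```

With `rank` available, counting the keys in a half-open range `[lo, hi)` is `rank(root, hi) - rank(root, lo)`: two root-to-leaf descents.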
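To find the number of keys within a given range, subtract the number of keys below the start of the range from the number of keys below its end; the subtree sizes let each of these counts be computed in a single descent from the root.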
The range for a given prefix contains all elements that have the prefix. It is important to note that the strings with a given prefix are contiguous in sorted order; that is, they really do form a range.
The start of a prefix range is the position just before the prefix itself.
The end of a prefix range is the position just before the lexicographically first disjoint prefix after this one (FA => FB; FZ => GA when only A-Z are in the alphabet). Unicode simplifies this: U+FFFF is a noncharacter that should not occur in text and compares above every other UTF-16 code unit, so it can serve as a ‘top’ character. That is, end = prefix + "\uFFFF".
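As a sketch of how the two bounds turn into a count: the number of words with the prefix is the number of keys below `end` minus the number of keys below `start`. In the Python example below, `bisect_left` over a sorted list stands in for the tree's rank query; in a tree that stores subtree sizes, the same two numbers would come from two O(log n) descents. The word list and the name `count_prefix` are made up for illustration, and using `"\uffff"` as the top character assumes no key contains it or any character above it.

```python
from bisect import bisect_left

TOP = "\uffff"  # assumption: never occurs in a key, and no key character sorts above it


def count_prefix(sorted_words, prefix):
    """Count the words that start with prefix, given a sorted list of words."""
    start = bisect_left(sorted_words, prefix)      # number of keys < prefix
    end = bisect_left(sorted_words, prefix + TOP)  # number of keys < prefix + TOP
    return end - start


words = sorted(["eat", "fab", "face", "fact", "far", "gap"])
print(count_prefix(words, "fa"))  # 4
```

A word equal to the prefix itself is counted too, because `start` is the position just before the prefix.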