I have a large (> million elements) tree, and each element has an ‘offset’ field that refers to something external. I need to do both:
- Insert new elements at arbitrary positions. Each insertion will cause the ‘offset’ field of later elements to be increased by some amount.
- Quickly obtain the offset value of an element.
If the second requirement (fast lookup) weren’t needed, I’d store each offset relative to the previous element, so there’d be no need to update everything after an insertion. But that would mean I’d have to add up every preceding offset to calculate one element’s absolute value.
Is there a canonical way of doing this sort of thing? I was thinking of a compromise where, e.g., every nth element would have an absolute offset, and the other elements’ offsets would be relative to the previous absolute one, meaning I’d have to do a small amount of traversal in both cases.
There are a few approaches that are somewhat based on your idea of having some elements store an absolute offset.
One of them (I think it’s a version of a tiered vector) is to store the total change in offset for the first `sqrt(N)` consecutive elements, then for the elements from `sqrt(N)` to `2 * sqrt(N)`, and so on. Then, to find the offset of a given element, you add up the stored sums of all the whole groups before it (there are at most `sqrt(N) + 1` of them, since `(sqrt(N))^2 = N`) and then add the elements that come after the last whole group but before the element you’re interested in. This gives you `O(sqrt(N))` insert and lookup time.
You can also take this approach to the next level, and store the sums for:
This way, you get a data structure that is similar to an interval or segment tree, but not exactly the same. It can be implemented as a complete binary tree using a simple array. It gives you a complexity of `O(log N)` for both operations.
A few improvements to this idea lead to Binary Indexed Trees, which have the same complexity, but use about half the space.
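As a rough illustration of that last idea, here is a minimal Binary Indexed Tree sketch in Python. The class name `FenwickTree`, its method names, and the sample numbers are all made up for this example. It assumes the number of slots is fixed up front and that each slot holds an element’s offset relative to the previous element. Under those assumptions it covers “shift this element and everything after it” and “get the absolute offset” in `O(log N)`, but it does not by itself create a new slot in the middle of the sequence.

```python
class FenwickTree:
    """Binary Indexed Tree over relative offsets (hypothetical helper)."""

    def __init__(self, size):
        self.size = size
        # Index 0 is unused; the tree is 1-based internally.
        self.tree = [0] * (size + 1)

    def add(self, index, delta):
        """Add delta to the relative offset at 0-based position index."""
        i = index + 1
        while i <= self.size:
            self.tree[i] += delta
            i += i & -i  # move to the next node that covers this position

    def prefix_sum(self, index):
        """Absolute offset of the element at 0-based position index."""
        i = index + 1
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # drop the lowest set bit to reach the previous range
        return total


# Sample relative offsets: each value is the gap from the previous element.
deltas = [0, 10, 5, 7, 3]
ft = FenwickTree(len(deltas))
for pos, d in enumerate(deltas):
    ft.add(pos, d)

absolute_before = [ft.prefix_sum(i) for i in range(len(deltas))]

# Something of length 4 was inserted before element 2:
# element 2 and every later element shift by 4 with a single update.
ft.add(2, 4)
absolute_after = [ft.prefix_sum(i) for i in range(len(deltas))]
```

With these sample values, `absolute_before` is `[0, 10, 15, 22, 25]` and `absolute_after` is `[0, 10, 19, 26, 29]`. If elements also have to be added to the sequence itself, one option is to reserve spare slots in advance, or to keep the same subtree sums in a balanced tree instead of a flat array.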