When looking through the docs of Data.Set, I saw that insertion of an element into the tree is mentioned to be O(log(n)). However, I would intuitively expect it to be O(n*log(n)) (or maybe O(n)?), as referential transparency requires creating a full copy of the previous tree in O(n).
I understand that for example (:) can be made O(1) instead of O(n), as here the full list doesn’t have to be copied; the new list can be optimized by the compiler to be the first element plus a pointer to the old list (note that this is a compiler – not a language level – optimization). However, inserting a value into a Data.Set involves rebalancing that looks quite complex to me, to the point where I doubt that there’s something similar to the list optimization. I tried reading the paper that is referenced by the Set docs, but couldn’t answer my question with it.
So: how can inserting an element into a binary tree be O(log(n)) in a (purely) functional language?
There is no need to make a full copy of a Set in order to insert an element into it. Internally, elements are stored in a tree, which means that you only need to create new nodes along the path of the insertion. Untouched nodes can be shared between the pre-insertion and post-insertion versions of the Set. And as Dietrich Epp pointed out, in a balanced tree O(log(n)) is the length of the path of the insertion. (Sorry for omitting that important fact.)

Say your Tree type looks like this:

…

and say you have a Tree that looks like this:

…

where tl and tr' are some named subtrees. Now say you want to insert 12 into this tree. Well, that’s going to look something like this:

…

The subtrees tl and tr' are shared between t and t', and you only had to construct 3 new Nodes to do it, even though the size of t could be much larger than 3.

EDIT: Rebalancing
With respect to rebalancing, think about it like this, and note that I claim no rigor here. Say you have an empty tree. Already balanced! Now say you insert an element. Already balanced! Now say you insert another element. Well, there’s an odd number so you can’t do much there.
Here’s the tricky part. Say you insert another element. This could go two ways: left or right; balanced or unbalanced. In the case that it’s unbalanced, you can clearly perform a rotation of the tree to balance it. In the case that it’s balanced, already balanced!
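To make the rotation step concrete, here is a minimal sketch. The Tree type and the names rotateLeft and rotateRight are assumptions made for illustration; they are not the definitions used by Data.Set, whose nodes also cache a size and whose balancing rules are more involved.

```haskell
-- Hypothetical tree type for illustration only.
data Tree a = Leaf | Node a (Tree a) (Tree a)
  deriving (Show, Eq)

-- Left rotation: promote the right child to the root.
-- Two new Node cells are built; the subtrees a, b and c are reused, not copied.
rotateLeft :: Tree a -> Tree a
rotateLeft (Node x a (Node y b c)) = Node y (Node x a b) c
rotateLeft t = t -- no right child, nothing to rotate

-- Right rotation: the mirror image of rotateLeft.
rotateRight :: Tree a -> Tree a
rotateRight (Node y (Node x a b) c) = Node x a (Node y b c)
rotateRight t = t -- no left child, nothing to rotate

main :: IO ()
main = do
  -- A right-leaning chain 1 -> 2 -> 3, as produced by inserting in order.
  let chain = Node 1 Leaf (Node 2 Leaf (Node 3 Leaf Leaf))
  print (rotateLeft chain)
  -- prints: Node 2 (Node 1 Leaf Leaf) (Node 3 Leaf Leaf)
```

The point of the sketch is that a rotation is a constant amount of work no matter how large a, b and c are, because they are only pointed to from the new nodes.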
What’s important to note here is that you’re constantly rebalancing. It’s not like you have a mess of a tree, decide to insert an element, rebalance before doing so, and then leave a mess again after you’ve completed the insertion.
Now say you keep inserting elements. The tree’s gonna get unbalanced, but not by much. And when that does happen, first off you’re correcting that immediately, and secondly, the correction occurs along the path of the insertion, which is O(log(n)) in a balanced tree. The rotations in the paper you linked to touch at most three nodes each, so you’re doing O(3 * log(n)) work when rebalancing. That’s still O(log(n)).
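The sharing argument from the first part of this answer can be checked with a small experiment. This is a sketch under stated assumptions: the Tree type, insertCount, fromSorted and size are hypothetical helpers written for this illustration, the insert does no rebalancing, and the example tree (root 10, right child 15, subtrees tl and tr') is just one shape that fits the description above.

```haskell
-- Hypothetical tree type for illustration; not the Data.Set representation.
data Tree a = Leaf | Node a (Tree a) (Tree a)
  deriving (Show, Eq)

-- Path-copying insert (no rebalancing). Also returns how many Node cells
-- were newly built, so the cost of one insertion can be observed.
insertCount :: Ord a => a -> Tree a -> (Tree a, Int)
insertCount x Leaf = (Node x Leaf Leaf, 1)
insertCount x t@(Node y l r) =
  case compare x y of
    LT -> let (l', n) = insertCount x l in (Node y l' r, n + 1) -- r is shared
    GT -> let (r', n) = insertCount x r in (Node y l r', n + 1) -- l is shared
    EQ -> (t, 0)                                                 -- already present

-- Build a balanced tree from an already sorted list by splitting in the middle.
fromSorted :: [a] -> Tree a
fromSorted [] = Leaf
fromSorted xs = Node x (fromSorted ls) (fromSorted rs)
  where (ls, x : rs) = splitAt (length xs `div` 2) xs

size :: Tree a -> Int
size Leaf = 0
size (Node _ l r) = 1 + size l + size r

main :: IO ()
main = do
  -- Small example: tl and tr' stand in for arbitrary subtrees.
  let tl  = Node 5 Leaf Leaf
      tr' = Node 20 Leaf Leaf
      t   = Node 10 tl (Node 15 Leaf tr')
      (_, small) = insertCount (12 :: Int) t
  putStrLn ("new nodes (small tree): " ++ show small)

  -- Larger example: 1000 elements, balanced by construction.
  let big = fromSorted [2, 4 .. 2000 :: Int]
      (big', allocated) = insertCount 999 big
  putStrLn ("size before: " ++ show (size big))
  putStrLn ("size after:  " ++ show (size big'))
  putStrLn ("new nodes (big tree): " ++ show allocated)
```

In the small example three nodes are built (10, 15 and 12). In the large one the number of new nodes is bounded by the height of the tree plus one, while every subtree hanging off the insertion path is reused by the new version.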