Given a one-dimensional set of random numbers, we simply go through the set and push each value down the tree. In one dimension this is very simple: we compare the value with the one stored at the current node, and that comparison decides which branch the value follows down the tree.
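To make the one-dimensional case concrete, here is a minimal sketch of a plain binary search tree. The names `Node`, `insert` and `in_order` are made up for this illustration, and the sample values are arbitrary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    value: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], value: float) -> Node:
    """Push one value down the tree and return the root."""
    if root is None:
        return Node(value)
    node = root
    while True:
        # A single comparison decides which way the value goes.
        if value < node.value:
            if node.left is None:
                node.left = Node(value)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(value)
                return root
            node = node.right


def in_order(node: Optional[Node]) -> List[float]:
    """Read the values back in sorted order."""
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)


root = None
for x in [0.42, 0.17, 0.93, 0.58, 0.05]:
    root = insert(root, x)

print(in_order(root))  # [0.05, 0.17, 0.42, 0.58, 0.93]
```

Every decision here is a single `<` comparison, which is exactly what stops working once each item is a vector rather than a number.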
In higher dimensions, however, distance starts to become blurry, and it is more difficult to decide which data should go where down the tree.
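One way to see this blurriness is to measure how much the nearest and farthest neighbours of a query point differ as the dimension grows. The sketch below assumes points drawn uniformly from the unit cube and uses a hypothetical helper, `relative_contrast`; it is only an illustration, not a measurement on real features.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """Return (farthest - nearest) / nearest distance from one random query."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


for dim in (1, 2, 16, 128):
    print(dim, round(relative_contrast(dim), 2))
```

Under these assumptions the contrast shrinks as the dimension goes up: in 128 dimensions the farthest point is not much farther away than the nearest one, so "closer" carries less information than it does in one or two dimensions.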
In fact, if we want to design a hierarchical tree that holds a set of high-dimensional vectors (for instance, 128-dimensional SIFT features), how do we decide which n-dimensional vector goes to which subtree, and so on? What are some of the things we can do?
Random trees
A random tree is a common technique for classification or clustering.
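Split rules differ between random-tree variants. As a rough sketch of the general idea only, the code below assumes a random-projection split: each internal node draws a random direction and sends a vector left or right depending on whether its projection falls below the median. This rule, and the names `TreeNode`, `build` and `find_leaf`, are assumptions made for illustration, not necessarily the rule this post has in mind.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

Vector = Sequence[float]


@dataclass
class TreeNode:
    # Internal nodes: a random direction and a threshold (assumed split rule).
    direction: Optional[List[float]] = None
    threshold: float = 0.0
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None
    # Leaves: the vectors that ended up here.
    points: Optional[List[Vector]] = None


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def build(points: List[Vector], rng: random.Random, leaf_size: int = 4) -> TreeNode:
    if len(points) <= leaf_size:
        return TreeNode(points=list(points))
    dim = len(points[0])
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    projections = sorted(dot(p, direction) for p in points)
    threshold = projections[len(projections) // 2]  # (upper) median projection
    left = [p for p in points if dot(p, direction) < threshold]
    right = [p for p in points if dot(p, direction) >= threshold]
    if not left or not right:  # degenerate split, e.g. duplicate vectors
        return TreeNode(points=list(points))
    return TreeNode(direction, threshold,
                    build(left, rng, leaf_size), build(right, rng, leaf_size))


def find_leaf(node: TreeNode, query: Vector) -> List[Vector]:
    """Follow the same comparisons used at build time down to a leaf."""
    while node.points is None:
        go_left = dot(query, node.direction) < node.threshold
        node = node.left if go_left else node.right
    return node.points


rng = random.Random(0)
data = [[rng.random() for _ in range(128)] for _ in range(100)]  # stand-in vectors
tree = build(data, rng)
print(len(find_leaf(tree, data[0])))  # size of the leaf that holds data[0]
```

The random vectors stand in for real descriptors such as SIFT features. A query is routed with the same projection test that was used to build the tree, so a stored vector always lands back in its own leaf.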
Here’s how to decide how to split each node of the tree:
So, each node will need to store:
Leaves will store: