I am trying to build the KD tree(independent) for image features. I have extracted the image features,the feature contains suppose 1000 float values.
Using map-reduce to distribute the images among the nodes of the cluster according to classification(eg, cat,dog,guns)ie. each node will contain the bunch of the similar images & then build KD tree of the images on each node. I am confused about how the tree can be built.
So how can I build the KD tree using map-reduce? Each node will contain the tree,right? What could be the logic to distribute the images? While building the KD-tree, on what basis should I add image-feature vectors in tree(ie left or right child)?
Any help is appreciated.Thanks in advance.
I don’t think that a k-d-tree is the right thing for your data. Here’s what Wikipedia says about it:
Your feature vectors have dimensionality 1000, which means that you should have around 10^300 images, which is quite unlikely.
I suggest that you look at Locality-sensitive hashing, which is one of the mentioned approximate nearest-neighbor searches for high-dimensional data.
Since Wikipedia is not always the best place to learn something complicated, I suggest you take a look at the respective lecture slides of the Data Mining course of ETH Zurich instead. It just so happens that I am taking this course in the current semester.