I’ve coded up a KD-Tree in Java using the “median of list” algorithm for constructing a more balanced tree. It seems to work fine when using the data provided by the wiki, note that the wikipedia example uses only X,Y values, so it doesn’t evaluate the Z depth.
From wikipedia:
point_list = [(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)]

From my java program:
depth=0 id=(7.0, 2.0, 0.0)
├── [left] depth=1 id=(5.0, 4.0, 0.0)
│ ├── [left] depth=2 id=(2.0, 3.0, 0.0)
│ └── [right] depth=2 id=(4.0, 7.0, 0.0)
└── [right] depth=1 id=(9.0, 6.0, 0.0)
└── [left] depth=2 id=(8.0, 1.0, 0.0)
But when I use the “median of list” approach on this data, it doesn’t seem to work properly.
point list = [(1,0,-1), (1,0,-2), (1,0,1), (1,0,2)]
I get a tree like this:
depth=0 id=(1.0, 0.0, 1.0)
├── [left] depth=1 id=(1.0, 0.0, -2.0)
│ └── [left] depth=2 id=(1.0, 0.0, -1.0)
└── [right] depth=1 id=(1.0, 0.0, 2.0)
Which doesn’t look correct because (1.0, 0.0, 2.0) is to the right of (1.0, 0.0, 1.0) but they are essentially equal because their Y values are equal. Also, (1.0, 0.0, -1.0) is to the left of (1.0, 0.0, -2.0) and it should be to the right since it’s Z value is greater.
I think the problem stems from having equal X and Y values and only variable Z values, so the median of the list doesn’t really split the list accurately.
… original code following the wiki’s python code …
private static KdNode createNode(List<XYZPoint> list, int k, int depth) {
if (list == null || list.size() == 0) return null;
int axis = depth % k;
if (axis == X_AXIS) Collections.sort(list, X_COMPARATOR);
else if (axis == Y_AXIS) Collections.sort(list, Y_COMPARATOR);
else Collections.sort(list, Z_COMPARATOR);
KdNode node = null;
if (list.size() > 0) {
int mediaIndex = list.size() / 2;
node = new KdNode(k, depth, list.get(mediaIndex));
if ((mediaIndex - 1) >= 0) {
List<XYZPoint> less = list.subList(0, mediaIndex);
if (less.size() > 0) {
node.lesser = createNode(less, k, depth + 1);
node.lesser.parent = node;
}
}
if ((mediaIndex + 1) <= (list.size() - 1)) {
List<XYZPoint> more = list.subList(mediaIndex + 1, list.size());
if (more.size() > 0) {
node.greater = createNode(more, k, depth + 1);
node.greater.parent = node;
}
}
}
return node;
}
… new code based on my comment …
private static KdNode createNode(List<XYZPoint> list, int k, int depth) {
if (list == null || list.size() == 0) return null;
int axis = depth % k;
if (axis == X_AXIS) Collections.sort(list, X_COMPARATOR);
else if (axis == Y_AXIS) Collections.sort(list, Y_COMPARATOR);
else Collections.sort(list, Z_COMPARATOR);
KdNode node = null;
if (list.size() > 0) {
int medianIndex = list.size() / 2;
node = new KdNode(k, depth, list.get(medianIndex));
List<XYZPoint> less = new ArrayList<XYZPoint>(list.size()-1);
List<XYZPoint> more = new ArrayList<XYZPoint>(list.size()-1);
//Process list to see where each non-median point lies
for (int i=0; i<list.size(); i++) {
if (i==medianIndex) continue;
XYZPoint p = list.get(i);
if (KdNode.compareTo(depth, k, p, node.id)<=0) {
less.add(p);
} else {
more.add(p);
}
}
if (less.size() > 0) {
node.lesser = createNode(less, k, depth + 1);
node.lesser.parent = node;
}
if (more.size() > 0) {
node.greater = createNode(more, k, depth + 1);
node.greater.parent = node;
}
}
The problem indeed has to do with equal coordinates and arises from how you split the nodes into
lessandmoreparts. Since you have the median index, why not use the index for splitting instead of checking the coordinates? Just change the condition increateNodeon line 116 fromto
Btw: there are more efficient algorithms to partition a list into lower, median, upper than sorting. (lower and upper parts do not need to be sorted! see e.g. the implementation of
std::nth_elementin the C++ stdlib – sorry, I’m that much into Java programming)