I have been playing with this one for a few days now, and keep running into performance walls.
The data:
- Tens to hundreds of thousands of 3D points
- Points are positive/negative ints and fall on a 3D grid with no overlap
- Will rarely add new points
- Will usually be gapless but gaps are possible
The structure:
- Must be able to efficiently find the nearest neighbours along each axis (“closest point to the left”) and only that axis.
- Rarely handles inserts or deletes after construction (but must handle them)
- Does not need to handle overlapping points
I have found a possible solution in http://docs.scipy.org/doc/scipy/reference/spatial.html; however, the K-d tree seems to be extremely wasteful for this type of data (it is better suited to clusters of arbitrary points) and is tuned for finding points within a radius. The primary use case for this data is finding (and following) the nearest neighbour point along each axis.
Example Data (x, y, z):
[(4, 3, 0), (4, 4, 0), (5, 3, 0), (3, 3, 0), (4, 3, 1), ...]
Possibly my google-fu is failing me and an optimal structure exists already (preferably in Python), but I have not been able to find one.
How about constructing three KD-trees, one each for the x, y, and z axes?
You need some kind of tree structure anyway IMO.
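A tree over a single axis behaves much like a sorted index, so here is a rough sketch of a simpler variant of that idea rather than an actual KD-tree. It assumes "along an axis" means the neighbour shares the other two coordinates with the query point, which may not be exactly what you want. The class name `AxisIndex` and its methods are made up for illustration; only `bisect` and `collections` from the standard library are used.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class AxisIndex:
    """Hypothetical helper: one sorted list of coordinates per grid line, per axis."""

    def __init__(self, points=()):
        # lines[axis][(the other two coords)] -> sorted coordinates along `axis`
        self.lines = [defaultdict(list) for _ in range(3)]
        for p in points:
            for axis in range(3):
                key, value = self._split(p, axis)
                self.lines[axis][key].append(value)
        # Sort each line once after bulk loading instead of inserting one by one.
        for per_axis in self.lines:
            for line in per_axis.values():
                line.sort()

    @staticmethod
    def _split(point, axis):
        # Assumes points are 3-tuples of ints.
        return point[:axis] + point[axis + 1:], point[axis]

    def add(self, point):
        for axis in range(3):
            key, value = self._split(point, axis)
            insort(self.lines[axis][key], value)

    def remove(self, point):
        for axis in range(3):
            key, value = self._split(point, axis)
            line = self.lines[axis].get(key, [])
            i = bisect_left(line, value)
            if i < len(line) and line[i] == value:
                line.pop(i)
                if not line:
                    del self.lines[axis][key]

    def neighbour(self, point, axis, direction):
        """Nearest point on the same line along `axis`; direction is +1 or -1."""
        key, value = self._split(point, axis)
        line = self.lines[axis].get(key, [])
        if direction > 0:
            i = bisect_right(line, value)
            if i == len(line):
                return None
        else:
            i = bisect_left(line, value) - 1
            if i < 0:
                return None
        return point[:axis] + (line[i],) + point[axis + 1:]


points = [(4, 3, 0), (4, 4, 0), (5, 3, 0), (3, 3, 0), (4, 3, 1)]
index = AxisIndex(points)
print(index.neighbour((4, 3, 0), 0, -1))  # (3, 3, 0)
print(index.neighbour((4, 3, 0), 2, +1))  # (4, 3, 1)
```

Lookups are a dictionary access plus a binary search on one line, and gaps are handled because the search returns the next stored coordinate rather than assuming `value - 1` exists. Inserts and deletes shift list elements, which should be acceptable if they are rare. The cost is storing each coordinate three times.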