I have upto 10,000 randomly positioned points in a space and i need to be able to tell which the cursor is closest to at any given time. To add some context, the points are in the form of a vector drawing, so they can be constantly and quickly added and removed by the user and also potentially be unbalanced across the canvas space..
I am therefore trying to find the most efficient data structure for storing and querying these points. I would like to keep this question language agnostic if possible.
After the Update to the Question
Use two Red-Black Tree or Skip_list maps. Both are compact self-balancing data structures giving you O(log n) time for search, insert and delete operations. One map will use X-coordinate for every point as a key and the point itself as a value and the other will use Y-coordinate as a key and the point itself as a value.
As a trade-off I suggest to initially restrict the search area around the cursor by a square. For perfect match the square side should equal to diameter of your “sensitivity circle” around the cursor. I.e. if you’re interested only in a nearest neighbour within 10 pixel radius from the cursor then the square side needs to be 20px. As an alternative, if you’re after nearest neighbour regardless of proximity you might try finding the boundary dynamically by evaluating floor and ceiling relative to cursor.
Then retrieve two subsets of points from the maps that are within the boundaries, merge to include only the points within both sub sets.
Loop through the result, calculate proximity to each point (dx^2+dy^2, avoid square root since you’re not interested in the actual distance, just proximity), find the nearest neighbour.
Take root square from the proximity figure to measure the distance to the nearest neighbour, see if it’s greater than the radius of the “sensitivity circle”, if it is it means there is no points within the circle.
I suggest doing some benchmarks every approach; it’s two easy to go over the top with optimisations. On my modest hardware (Duo Core 2) naïve single-threaded search of a nearest neighbour within 10K points repeated a thousand times takes 350 milliseconds in Java. As long as the overall UI re-action time is under 100 milliseconds it will seem instant to a user, keeping that in mind even naïve search might give you sufficiently fast response.
Generic Solution
The most efficient data structure depends on the algorithm you’re planning to use, time-space trade off and the expected relative distribution of points:
So it’s really impossible to evaluate the data structure is isolation from algorithm which in turn is hard to evaluate without good idea of task constraints and priorities.
Sample Java Implementation