how do I select a subset of points at a regular density? More formally,
Given
- a set A of irregularly spaced points,
- a metric of distance
dist(e.g., Euclidean distance), - and a target density d,
how can I select a smallest subset B that satisfies below?
- for every point x in A,
- there exists a point y in B
- which satisfies
dist(x,y) <= d
My current best shot is to
- start with A itself
- pick out the closest (or just particularly close) couple of points
- randomly exclude one of them
- repeat as long as the condition holds
and repeat the whole procedure for best luck. But are there better ways?
I’m trying to do this with 280,000 18-D points, but my question is in general strategy. So I also wish to know how to do it with 2-D points. And I don’t really need a guarantee of a smallest subset. Any useful method is welcome. Thank you.
bottom-up method
- select a random point
- select among unselected
yfor whichmin(d(x,y) for x in selected)is largest - keep going!
I’ll call it bottom-up and the one I originally posted top-down. This is much faster in the beginning, so for sparse sampling this should be better?
performance measure
If guarantee of optimality is not required, I think these two indicators could be useful:
- radius of coverage:
max {y in unselected} min(d(x,y) for x in selected) - radius of economy:
min {y in selected != x} min(d(x,y) for x in selected)
RC is minimum allowed d, and there is no absolute inequality between these two. But RC <= RE is more desirable.
my little methods
For a little demonstration of that “performance measure,” I generated 256 2-D points distributed uniformly or by standard normal distribution. Then I tried my top-down and bottom-up methods with them. And this is what I got:


RC is red, RE is blue. X axis is number of selected points. Did you think bottom-up could be as good? I thought so watching the animation, but it seems top-down is significantly better (look at the sparse region). Nevertheless, not too horrible given that it’s much faster.
Here I packed everything.
http://www.filehosting.org/file/details/352267/density_sampling.tar.gz
A genetic algorithm may probably produce good results here.
update:
I have been playing a little with this problem and these are my findings:
A simple method (call it random-selection) to obtain a set of points fulfilling the stated condition is as follows:
A kd-tree can be used to perform the look ups in step 3 relatively fast.
The experiments I have run in 2D show that the subsets generated are approximately half the size of the ones generated by your top-down approach.
Then I have used this random-selection algorithm to seed a genetic algorithm that resulted in a further 25% reduction on the size of the subsets.
For mutation, giving a chromosome representing a subset B, I randomly choose an hyperball inside the minimal axis-aligned hyperbox that covers all the points in A. Then, I remove from B all the points that are also in the hyperball and use the random-selection to complete it again.
For crossover I employ a similar approach, using a random hyperball to divide the mother and father chromosomes.
I have implemented everything in Perl using my wrapper for the GAUL library (GAUL can be obtained from here.
The script is here: https://github.com/salva/p5-AI-GAUL/blob/master/examples/point_density.pl
It accepts a list of n-dimensional points from stdin and generates a collection of pictures showing the best solution for every iteration of the genetic algorithm. The companion script https://github.com/salva/p5-AI-GAUL/blob/master/examples/point_gen.pl can be used to generate the random points with a uniform distribution.