Editorial Team
Asked: May 14, 2026

A point in 3-d is defined by (x,y,z). Distance d between any two points

A point in 3-d is defined by (x,y,z). Distance d between any two points (X,Y,Z) and (x,y,z) is d= Sqrt[(X-x)^2 + (Y-y)^2 + (Z-z)^2].
Now there are a million entries in a file; each entry is some point in space, in no specific order. Given any point (a,b,c), find the nearest 10 points to it. How would you store the million points, and how would you retrieve those 10 points from that data structure?

  1. Editorial Team
    Added an answer on May 14, 2026 at 12:22 am

    A million points is a small number. The most straightforward approach works here; code based on a KDTree is slower when querying only a single point.

    Brute-force approach (time ~1 second)

    #!/usr/bin/env python
    from __future__ import print_function  # keeps the output identical on Python 2
    import pprint
    
    import numpy
    
    NDIM = 3 # number of dimensions
    
    # read points into array
    a = numpy.fromfile('million_3D_points.txt', sep=' ')
    a.shape = a.size // NDIM, NDIM  # integer division: one row per point
    
    point = numpy.random.uniform(0, 100, NDIM) # choose random point
    print('point:', point)
    d = ((a-point)**2).sum(axis=1)  # squared distances (same ordering as distances)
    ndx = d.argsort() # indirect sort
    
    # print 10 nearest points to the chosen one, with their squared distances
    pprint.pprint(list(zip(a[ndx[:10]], d[ndx[:10]])))
    

    Run it:

    $ time python nearest.py 
    point: [ 69.06310224   2.23409409  50.41979143]
    [(array([ 69.,   2.,  50.]), 0.23500677815852947),
     (array([ 69.,   2.,  51.]), 0.39542392750839772),
     (array([ 69.,   3.,  50.]), 0.76681859086988302),
     (array([ 69.,   3.,  50.]), 0.76681859086988302),
     (array([ 69.,   3.,  51.]), 0.9272357402197513),
     (array([ 70.,   2.,  50.]), 1.1088022980015722),
     (array([ 70.,   2.,  51.]), 1.2692194473514404),
     (array([ 70.,   2.,  51.]), 1.2692194473514404),
     (array([ 70.,   3.,  51.]), 1.801031260062794),
     (array([ 69.,   1.,  51.]), 1.8636121147970444)]
    
    real    0m1.122s
    user    0m1.010s
    sys 0m0.120s
    

    Here's the script that generates a million 3D points:

    #!/usr/bin/env python
    import random
    for _ in range(10**6):
        # one point per line: three integer coordinates in [0, 100)
        print(' '.join(str(random.randrange(100)) for _ in range(3)))
    

    Output:

    $ head million_3D_points.txt
    
    18 56 26
    19 35 74
    47 43 71
    82 63 28
    43 82 0
    34 40 16
    75 85 69
    88 58 3
    0 63 90
    81 78 98
    

    You could use that code to test more complex data structures and algorithms (for example, to check whether they actually consume less memory or are faster than the simple approach above). It is worth noting that at the moment this is the only answer that contains working code.
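
One way to use the brute-force approach as a baseline is sketched below in pure Python: a reference function plus a checker that compares any candidate implementation against it on random data. The names `brute_force_knn` and `check_against_reference` are illustrative assumptions and do not come from the original answer.

```python
import random

def dist_sq(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def brute_force_knn(points, query, k=10):
    """Reference answer: sort every point by squared distance to `query`."""
    return sorted(points, key=lambda p: dist_sq(p, query))[:k]

def check_against_reference(candidate, n_points=2000, n_queries=20, k=10, seed=0):
    """Return True if `candidate(points, query, k)` agrees with brute force.

    Only the distances are compared, because points at equal distance
    may legitimately be returned in a different order.
    """
    rng = random.Random(seed)
    points = [tuple(rng.randrange(100) for _ in range(3)) for _ in range(n_points)]
    for _ in range(n_queries):
        query = tuple(rng.uniform(0, 100) for _ in range(3))
        expected = [dist_sq(p, query) for p in brute_force_knn(points, query, k)]
        got = [dist_sq(p, query) for p in candidate(points, query, k)]
        if got != expected:
            return False
    return True
```

Any alternative (a tree, a heap, a grid) can then be passed as `candidate`, as long as it returns its results nearest first.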

    Solution based on KDTree (time ~1.4 seconds)

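    #!/usr/bin/env python
    from __future__ import print_function  # keeps the output identical on Python 2
    import numpy
    from scipy.spatial import KDTree
    
    NDIM = 3 # number of dimensions
    
    # read points into array
    a = numpy.fromfile('million_3D_points.txt', sep=' ')
    a.shape = a.size // NDIM, NDIM  # integer division: one row per point
    
    point =  [ 69.06310224,   2.23409409,  50.41979143] # use the same point as above
    print('point:', point)
    
    # find 10 nearest points
    # NOTE: leafsize is the point count at which SciPy switches to brute force,
    # so this value keeps all points in a single leaf
    tree = KDTree(a, leafsize=a.shape[0]+1)
    distances, ndx = tree.query([point], k=10)
    
    # print 10 nearest points to the chosen one
    print(a[ndx])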
    

    Run it:

    $ time python nearest_kdtree.py  
    
    point: [69.063102240000006, 2.2340940900000001, 50.419791429999997]
    [[[ 69.   2.  50.]
      [ 69.   2.  51.]
      [ 69.   3.  50.]
      [ 69.   3.  50.]
      [ 69.   3.  51.]
      [ 70.   2.  50.]
      [ 70.   2.  51.]
      [ 70.   2.  51.]
      [ 70.   3.  51.]
      [ 69.   1.  51.]]]
    
    real    0m1.359s
    user    0m1.280s
    sys 0m0.080s
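
For readers who want to see what a k-d tree query does internally, here is a minimal pure-Python sketch. It is not SciPy's implementation; `build_kdtree` and `knn` are hypothetical names, and building the tree this way for a million points would be far slower than the scripts above. It only illustrates the pruning idea: the far side of a splitting plane is searched only when the plane is closer than the current k-th nearest point.

```python
import heapq

def dist_sq(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def build_kdtree(points, depth=0):
    """Build a k-d tree as nested tuples (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through x, y, z
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # median point splits the space
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def knn(tree, query, k=10):
    """Return the k points nearest to `query`, nearest first."""
    heap = []  # max-heap emulated with negated squared distances

    def visit(node):
        if node is None:
            return
        point, axis, left, right = node
        d = dist_sq(point, query)
        if len(heap) < k:
            heapq.heappush(heap, (-d, point))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, point))
        diff = query[axis] - point[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near)
        # Visit the far side only if the splitting plane is closer than
        # the current k-th nearest point (or fewer than k points are known).
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    visit(tree)
    return [p for _, p in sorted(heap, key=lambda item: -item[0])]
```

With many queries against the same data, the cost of building the tree is paid once, which is the situation where a tree is expected to beat a linear scan.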
    

    Partial sort in C++ (time ~1.1 seconds)

    // $ g++ nearest.cc && (time ./a.out < million_3D_points.txt )
    #include <algorithm>
    #include <functional>  // less<>
    #include <iostream>
    #include <vector>
    
    #include <boost/lambda/lambda.hpp>  // _1
    #include <boost/lambda/bind.hpp>    // bind()
    #include <boost/tuple/tuple_io.hpp>
    
    namespace {
      typedef double coord_t;
      typedef boost::tuple<coord_t,coord_t,coord_t> point_t;
    
      coord_t distance_sq(const point_t& a, const point_t& b) { // or boost::geometry::distance
        coord_t x = a.get<0>() - b.get<0>();
        coord_t y = a.get<1>() - b.get<1>();
        coord_t z = a.get<2>() - b.get<2>();
        return x*x + y*y + z*z;
      }
    }
    
    int main() {
      using namespace std;
      using namespace boost::lambda; // _1, _2, bind()
    
      // read array from stdin
      vector<point_t> points;
      cin.exceptions(ios::badbit); // throw exception on bad input
      coord_t x,y,z;
      while(cin >> x >> y >> z) // stop at the first failed read, so no bogus point is appended
        points.push_back(boost::make_tuple(x,y,z));
    
      // use point value from previous examples
      point_t point(69.06310224, 2.23409409, 50.41979143);
      cout << "point: " << point << endl;  // 1.14s
    
      // find 10 nearest points using partial_sort() 
      // Complexity: O(N*log(m)) comparisons (O(N*log(N)) worst case for the implementation)
      const size_t m = 10;
      partial_sort(points.begin(), points.begin() + m, points.end(), 
                   bind(less<coord_t>(), // compare by distance to the point
                        bind(distance_sq, _1, point), 
                        bind(distance_sq, _2, point)));
      for_each(points.begin(), points.begin() + m, cout << _1 << "\n"); // 1.16s
    }
    

    Run it:

    $ g++ -O3 nearest.cc && (time ./a.out < million_3D_points.txt )
    point: (69.0631 2.23409 50.4198)
    (69 2 50)
    (69 2 51)
    (69 3 50)
    (69 3 50)
    (69 3 51)
    (70 2 50)
    (70 2 51)
    (70 2 51)
    (70 3 51)
    (69 1 51)
    
    real    0m1.152s
    user    0m1.140s
    sys 0m0.010s
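
In Python, `heapq.nsmallest` plays a role similar to `std::partial_sort` here: it returns the k smallest items in sorted order without sorting the whole input. The sketch below assumes the points are already in memory as tuples; the function name `nearest` is illustrative.

```python
import heapq

def dist_sq(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(points, query, k=10):
    """Return the k nearest points, nearest first, without a full sort."""
    return heapq.nsmallest(k, points, key=lambda p: dist_sq(p, query))

points = [(69, 2, 50), (0, 63, 90), (69, 3, 51), (70, 2, 51), (81, 78, 98)]
query = (69.06310224, 2.23409409, 50.41979143)
print(nearest(points, query, k=3))  # [(69, 2, 50), (69, 3, 51), (70, 2, 51)]
```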
    

    Priority Queue in C++ (time ~1.2 seconds)

    #include <algorithm>           // make_heap
    #include <functional>          // binary_function<>
    #include <iostream>
    
    #include <boost/range.hpp>     // boost::begin(), boost::end()
    #include <boost/tr1/tuple.hpp> // get<>, tuple<>, cout <<
    
    namespace {
      typedef double coord_t;
      typedef std::tr1::tuple<coord_t,coord_t,coord_t> point_t;
    
      // calculate distance (squared) between points `a` & `b`
      coord_t distance_sq(const point_t& a, const point_t& b) { 
        // boost::geometry::distance() squared
        using std::tr1::get;
        coord_t x = get<0>(a) - get<0>(b);
        coord_t y = get<1>(a) - get<1>(b);
        coord_t z = get<2>(a) - get<2>(b);
        return x*x + y*y + z*z;
      }
    
      // read from input stream `in` to the point `point_out`
      std::istream& getpoint(std::istream& in, point_t& point_out) {    
        using std::tr1::get;
        return (in >> get<0>(point_out) >> get<1>(point_out) >> get<2>(point_out));
      }
    
      // Adaptable binary predicate that defines whether the first
      // argument is nearer than the second one to given reference point
      template<class T>
      class less_distance : public std::binary_function<T, T, bool> {
        const T& point;
      public:
        less_distance(const T& reference_point) : point(reference_point) {}
    
        bool operator () (const T& a, const T& b) const {
          return distance_sq(a, point) < distance_sq(b, point);
        } 
      };
    }
    
    int main() {
      using namespace std;
    
      // use point value from previous examples
      point_t point(69.06310224, 2.23409409, 50.41979143);
      cout << "point: " << point << endl;
    
      const size_t nneighbours = 10; // number of nearest neighbours to find
      point_t points[nneighbours+1];
    
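      // populate `points` with the first nneighbours+1 points
      for (size_t i = 0; getpoint(cin, points[i]) && i < nneighbours; ++i)
        ;
    
      less_distance<point_t> less_distance_point(point);
      make_heap  (boost::begin(points), boost::end(points), less_distance_point);
      // move the most distant point to the last slot, so that the loop below
      // overwrites a point that is already rejected rather than a heap element
      pop_heap   (boost::begin(points), boost::end(points), less_distance_point);
    
      // Complexity: O(N*log(m))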
      while(getpoint(cin, points[nneighbours])) {
        // add points[-1] to the heap; O(log(m))
        push_heap(boost::begin(points), boost::end(points), less_distance_point); 
        // remove (move to last position) the most distant from the
        // `point` point; O(log(m))
        pop_heap (boost::begin(points), boost::end(points), less_distance_point);
      }
    
      // print results
      push_heap  (boost::begin(points), boost::end(points), less_distance_point);
      //   O(m*log(m))
      sort_heap  (boost::begin(points), boost::end(points), less_distance_point);
      for (size_t i = 0; i < nneighbours; ++i) {
        cout << points[i] << ' ' << distance_sq(points[i], point) << '\n';  
      }
    }
    

    Run it:

    $ g++ -O3 nearest.cc && (time ./a.out < million_3D_points.txt )
    
    point: (69.0631 2.23409 50.4198)
    (69 2 50) 0.235007
    (69 2 51) 0.395424
    (69 3 50) 0.766819
    (69 3 50) 0.766819
    (69 3 51) 0.927236
    (70 2 50) 1.1088
    (70 2 51) 1.26922
    (70 2 51) 1.26922
    (70 3 51) 1.80103
    (69 1 51) 1.86361
    
    real    0m1.174s
    user    0m1.180s
    sys 0m0.000s
    

    Linear search-based approach (time ~1.15 seconds)

    // $ g++ -O3 nearest.cc && (time ./a.out < million_3D_points.txt )
    #include <algorithm>           // sort
    #include <functional>          // binary_function<>
    #include <iostream>
    
    #include <boost/foreach.hpp>
    #include <boost/range.hpp>     // begin(), end()
    #include <boost/tr1/tuple.hpp> // get<>, tuple<>, cout <<
    
    #define foreach BOOST_FOREACH
    
    namespace {
      typedef double coord_t;
      typedef std::tr1::tuple<coord_t,coord_t,coord_t> point_t;
    
      // calculate distance (squared) between points `a` & `b`
      coord_t distance_sq(const point_t& a, const point_t& b);
    
      // read from input stream `in` to the point `point_out`
      std::istream& getpoint(std::istream& in, point_t& point_out);    
    
      // Adaptable binary predicate that defines whether the first
      // argument is nearer than the second one to given reference point
      class less_distance : public std::binary_function<point_t, point_t, bool> {
        const point_t& point;
      public:
        explicit less_distance(const point_t& reference_point) 
            : point(reference_point) {}
        bool operator () (const point_t& a, const point_t& b) const {
          return distance_sq(a, point) < distance_sq(b, point);
        } 
      };
    }
    
    int main() {
      using namespace std;
    
      // use point value from previous examples
      point_t point(69.06310224, 2.23409409, 50.41979143);
      cout << "point: " << point << endl;
      less_distance nearer(point);
    
      const size_t nneighbours = 10; // number of nearest neighbours to find
      point_t points[nneighbours];
    
      // populate `points`
      foreach (point_t& p, points)
        if (! getpoint(cin, p))
          break;
    
      // Complexity: O(N*m)
      point_t current_point;
      while(getpoint(cin, current_point)) { // stops cleanly after the last point
        // keep the m nearest points in `points`: after this pass
        // `current_point` holds the most distant of the m+1 candidates; O(m)
        foreach (point_t& p, points)
          if (nearer(current_point, p)) 
            // found point that is nearer to the `point` 
    
            //NOTE: could use insert (on sorted sequence) & break instead
            //of swap but in that case it might be better to use
            //heap-based algorithm altogether
            std::swap(current_point, p);
      }
    
      // print results;  O(m*log(m))
      sort(boost::begin(points), boost::end(points), nearer);
      foreach (point_t p, points)
        cout << p << ' ' << distance_sq(p, point) << '\n';  
    }
    
    namespace {
      coord_t distance_sq(const point_t& a, const point_t& b) { 
        // boost::geometry::distance() squared
        using std::tr1::get;
        coord_t x = get<0>(a) - get<0>(b);
        coord_t y = get<1>(a) - get<1>(b);
        coord_t z = get<2>(a) - get<2>(b);
        return x*x + y*y + z*z;
      }
    
      std::istream& getpoint(std::istream& in, point_t& point_out) {    
        using std::tr1::get;
        return (in >> get<0>(point_out) >> get<1>(point_out) >> get<2>(point_out));
      }
    }
    

    Measurements show that most of the time is spent reading the array from the file; the actual computations take an order of magnitude less time.
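
A rough way to check this kind of claim is to time the parsing and the query separately. The sketch below does that in pure Python with synthetic input in the same format as `million_3D_points.txt`; the helper names are hypothetical, and the ratio between the two phases will depend on the language and implementation, so it may differ from the figures measured above.

```python
import random
import time

def parse_points(lines):
    """Parse 'x y z' text lines into a list of float tuples."""
    return [tuple(map(float, line.split())) for line in lines if line.strip()]

def nearest(points, query, k=10):
    """Brute-force k nearest points by squared distance."""
    qx, qy, qz = query
    return sorted(
        points,
        key=lambda p: (p[0] - qx) ** 2 + (p[1] - qy) ** 2 + (p[2] - qz) ** 2,
    )[:k]

def time_phases(lines, query, k=10):
    """Return (parse_seconds, query_seconds, result) for one run."""
    t0 = time.perf_counter()
    points = parse_points(lines)
    t1 = time.perf_counter()
    result = nearest(points, query, k)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, result

if __name__ == '__main__':
    # Synthetic data, smaller than the real file to keep the run short.
    rng = random.Random(0)
    lines = ['%d %d %d\n' % tuple(rng.randrange(100) for _ in range(3))
             for _ in range(20000)]
    parse_s, query_s, _ = time_phases(lines, (69.06310224, 2.23409409, 50.41979143))
    print('parse: %.3fs  query: %.3fs' % (parse_s, query_s))
```

Passing an open file instead of the in-memory list would also include the disk read in the first phase.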
