I have developed a GPS application in which all the devices (moving on the road) send their coordinates to the server every 30 seconds. Now I have to calculate the distance between these devices so that if any device comes within range of another device, both devices get a notification.
I know how to calculate the distance between two coordinates (thanks to Google), but I am not sure how to implement this at scale: if 1 million devices are simultaneously sending data to the server, then the server needs to execute the distance calculation 1 million * (1 million – 1) times every 30 seconds.
Please let me know how to implement it. Do I need to use something like Hadoop or a MySQL stored procedure to do the job? The calculation itself is not the problem; handling and processing this much data is.
There’s a data structure called a quadtree. Keep the data points updated in the quadtree and you will have a much smaller set of points to compare against.
As clients log in, move, and send you data points, you update their location in the quadtree. The quadtree holds a 2D map of all your data points, split into buckets. Each bucket can be split into 4 smaller buckets that may or may not have points in them. When you’re trying to find everyone within X of a given data point, you look at all the points in the bucket that point is in. Then you look at all the points in the buckets ‘around’ that bucket. (There are 8 of them: N, S, E, W, NW, SW, NE, SE.) You keep going outward until the distance to the buckets (and therefore to all the points in them) is greater than X.
Now everyone else, most of whom are probably very far away, never needs to be tested. You never even visit their buckets.
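To make the idea concrete, here is a minimal Python sketch of a point quadtree with a radius query. It is only an illustration under some assumptions: the class name `QuadTree`, the methods `insert` and `query_radius`, and the `capacity` and `min_half` parameters are all made up for this example; coordinates are treated as flat x/y values (real latitude/longitude would need a projection or a great-circle distance); and every point is assumed to lie inside the root square. Instead of walking the 8 neighbouring buckets explicitly, it starts at the root and skips any bucket whose nearest edge is farther away than the search radius, which visits the same nearby buckets and ignores the rest.

```python
import math


class QuadTree:
    """A square bucket centred on (cx, cy) with half-width `half`."""

    def __init__(self, cx, cy, half, capacity=4, min_half=1e-9):
        self.cx, self.cy, self.half = cx, cy, half
        self.capacity = capacity   # points a bucket holds before it splits
        self.min_half = min_half   # stop splitting below this size
        self.points = []           # (device_id, x, y) tuples stored here
        self.children = []         # empty, or exactly 4 sub-buckets

    def insert(self, device_id, x, y):
        # Assumes (x, y) lies inside the root square.
        if not self.children:
            if len(self.points) < self.capacity or self.half <= self.min_half:
                self.points.append((device_id, x, y))
                return
            self._split()
        self._child_for(x, y).insert(device_id, x, y)

    def _split(self):
        h = self.half / 2
        # Order: (west, south), (west, north), (east, south), (east, north).
        self.children = [
            QuadTree(self.cx + dx * h, self.cy + dy * h, h,
                     self.capacity, self.min_half)
            for dx in (-1, 1) for dy in (-1, 1)
        ]
        old, self.points = self.points, []
        for point in old:
            self.insert(*point)

    def _child_for(self, x, y):
        index = (2 if x >= self.cx else 0) + (1 if y >= self.cy else 0)
        return self.children[index]

    def query_radius(self, x, y, r):
        # Distance from (x, y) to the nearest edge of this bucket (0 if inside).
        dx = max(abs(x - self.cx) - self.half, 0.0)
        dy = max(abs(y - self.cy) - self.half, 0.0)
        if dx * dx + dy * dy > r * r:
            return []  # the whole bucket is out of range, skip it
        found = [p for p in self.points
                 if math.hypot(p[1] - x, p[2] - y) <= r]
        for child in self.children:
            found.extend(child.query_radius(x, y, r))
        return found


# Usage: six hypothetical devices in a 200 x 200 square centred on the origin.
tree = QuadTree(0.0, 0.0, 100.0)
devices = [("a", 1, 1), ("b", 2, 2), ("c", 90, 90),
           ("d", -50, 20), ("e", 1.5, 1), ("f", 3, 1)]
for device_id, x, y in devices:
    tree.insert(device_id, x, y)

# Everyone within 2.5 units of device "a", excluding "a" itself.
nearby = sorted(p[0] for p in tree.query_radius(1, 1, 2.5) if p[0] != "a")
print(nearby)
```

This sketch has no "move" operation. One simple option, given that every device reports every 30 seconds anyway, is to rebuild the tree from the latest positions each cycle and then run one radius query per device; removing and re-inserting individual points is the alternative if rebuilding turns out to be too slow.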