I have two tables, say A and B that contain city information with two columns: latitude and longitude. A contains 100,000 records and B contains 1,000,000 records. My objective is to find the rows of B that are within 1 kilometre from A (for each row in A). How do I go about doing this efficiently? I am targeting a time of less than 30 minutes.
The following query takes forever (which I believe is the result of the cross-product of 100,000 * 1,000,000 = 100 billion row comparisons!):
select *
from A
inner join B
on is_nearby(A.latitude, A.longitude, B.latitude, B.longitude)
is_nearby() is just a simple function that finds the difference between the latitudes and longitudes.
I did a test for one row of A, it takes about 5 seconds per row. By my calculation, it is going to take several weeks for the query to finish execution, which is not acceptable.
Yes, PostGIS will make things faster, since it (a) knows how to convert degrees of latitude and longitude to kilometres (I’ll use the geography type below), and (b) supports a GiST index, which is optimal for GIS.
Assuming you have PostGIS version 2 available on your system, upgrade your datbase and tables:
Now to find the rows of B that are within 1 kilometre from A (for each row in A):