I have two tables. One looks roughly like this:
client_ip  server_ip  speed
---------  ---------  -----
1.2.3.4    9.1.2.3    100
1.2.3.5    9.1.2.3    1033
And another that has geo data:
ip       latitude  longitude
-------  --------  ---------
1.2.3.4  13.75     100.21
1.2.3.5  21.1234   141.21
9.1.2.3  13.75     99.21
I would like to write a select query that computes the great-circle distance between the client and server IP locations for each row, groups by that distance, and calculates the average speed for each group. So, for example, the ideal output would be something like:
distance  avg(speed)
--------  ----------
21        99
100       1234
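For reference, a common way to compute the great-circle distance is the haversine formula. With latitudes $\varphi_1, \varphi_2$ and longitudes $\lambda_1, \lambda_2$ converted to radians, and $R \approx 6371$ km as the mean Earth radius (this treats the Earth as a sphere, so it is an approximation), the distance is:

$$
d = 2R \arcsin\left(\sqrt{\sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \, \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)
$$

Many SQL dialects (MySQL and PostgreSQL, for example) provide RADIANS, SIN, COS, ASIN and SQRT, so this expression can be written directly in the select list.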
While I know there are good resources out there on getting the great circle distance in SQL, my head is a little cloudy on how to efficiently join the two tables, since both are rather large (millions of rows).
Any advice?
Assuming that the IPs in the geo data table are unique (and indexed, for example as the primary key), this is actually not a very expensive join. You join the geo table twice, once on client_ip and once on server_ip. Even though the server_ip column contains repeated values, you don’t have to tell the system that it is an outer join. For each row in the “speed” table there will be one, and only one, entry in the IP table matching the client_ip, and one, and only one, entry matching the server_ip. So you can use plain inner joins without any trouble: they neither drop nor duplicate rows, and the joined result has exactly as many rows as the “speed” table.
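Here is a minimal, runnable sketch of that double inner join. It uses Python's built-in sqlite3 module only so the example can be executed end to end. The table names speeds and geo, the helper haversine_km and the 100 km bucket width are hypothetical choices that do not come from the question. SQLite does not always ship with trig functions compiled in, so the sketch registers the distance as a Python user-defined function. In MySQL or PostgreSQL you would write the haversine expression inline instead. Raw floating-point distances almost never repeat exactly, so the query groups by the distance rounded down to a bucket rather than by the exact value. Declaring geo.ip as the PRIMARY KEY gives it the unique index that keeps each join lookup cheap.

```python
import math
import sqlite3

# Mean Earth radius in km (spherical-Earth approximation, an assumption).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sample_connection():
    """In-memory database with the question's sample rows.
    The table names 'speeds' and 'geo' are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- PRIMARY KEY gives geo.ip a unique index, so each join lookup is cheap.
        CREATE TABLE geo (ip TEXT PRIMARY KEY, latitude REAL, longitude REAL);
        CREATE TABLE speeds (client_ip TEXT, server_ip TEXT, speed INTEGER);
        INSERT INTO geo VALUES ('1.2.3.4', 13.75, 100.21),
                               ('1.2.3.5', 21.1234, 141.21),
                               ('9.1.2.3', 13.75, 99.21);
        INSERT INTO speeds VALUES ('1.2.3.4', '9.1.2.3', 100),
                                  ('1.2.3.5', '9.1.2.3', 1033);
    """)
    return conn


def average_speed_by_distance(conn, bucket_km=100):
    """Average speed per distance bucket (bucket_km wide, rounded down)."""
    # Register the Python function so SQL can call haversine_km(...).
    conn.create_function("haversine_km", 4, haversine_km)
    query = """
        SELECT CAST(haversine_km(c.latitude, c.longitude,
                                 s.latitude, s.longitude) / :bucket AS INTEGER)
                   * :bucket AS distance,
               AVG(t.speed) AS avg_speed
        FROM speeds AS t
        JOIN geo AS c ON c.ip = t.client_ip  -- inner join: client location
        JOIN geo AS s ON s.ip = t.server_ip  -- inner join: server location
        GROUP BY distance
        ORDER BY distance
    """
    return conn.execute(query, {"bucket": bucket_km}).fetchall()


if __name__ == "__main__":
    for distance, avg_speed in average_speed_by_distance(sample_connection()):
        print(distance, avg_speed)
```

With the sample rows, the first pair of IPs is about 108 km apart and the second about 4,500 km apart, so each falls into its own bucket and each bucket's average is just that row's speed. On millions of rows the bucket width is the main knob: wider buckets give fewer, more populated groups.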