I have a “geo_locations” table that looks like this:
country | city | postalCode | latitude | longitude | metroCode | areaCode
-------------------------------------------------------------------------
US | Chandler | 85226 | 33.2769 | -111.9444 | 753 | 480
more records...
And a “users” table that looks like this:
user_id | sex | dob | country | region | city | zip | latitude | longitude | email | username | password
--------------------------------------------------------------------------------------------------------------------------------------
1 | m | 1987-05-14 | US | NY | Flushing | 11398 | 40.7723 | -73.8722 | foo@bar.com | HiBye99 | 54524sAS%ASa2?&^312
more records...
My application needs to recommend users to other users, for meetups, making friends, etc. I must recommend a user only to other users who are within their area. It doesn’t make sense to recommend someone from Beijing to someone in New York, for example.
To achieve this I am using the following SQL:
SELECT postalCode, latitude, longitude,
       ACOS(SIN($lat) * SIN(RADIANS(latitude))
            + COS($lat) * COS(RADIANS(latitude)) * COS(RADIANS(longitude) - $lon)) * $radius AS D
FROM (
    SELECT postalCode, latitude, longitude
    FROM geo_locations
    WHERE latitude > $min_lat AND latitude < $max_lat
      AND longitude > $min_lon AND longitude < $max_lon
) AS FirstCut
WHERE ACOS(SIN($lat) * SIN(RADIANS(latitude))
           + COS($lat) * COS(RADIANS(latitude)) * COS(RADIANS(longitude) - $lon)) * $radius < $rad
ORDER BY D
Before that SQL runs, some fancy calculations are done on the $lat, $lon, $rad, $radius, $min_lat, $max_lat, $min_lon and $max_lon variables. The full code, which is where I grabbed this from, can be seen here:
http://www.movable-type.co.uk/scripts/latlong-db.html
Anyway, what this SQL returns in my code is a collection of all the postal codes that are close to the user’s area. I then use those zip codes to build another SQL query, which usually looks crazy (SO wouldn’t let me submit the question with it included because it was so huge).
After running it on the users table I get my list of recommended users.
Question:
Now as you can see I have the latitude and longitude columns on the users table as well. So there’s really no need to select anything from the “geo_locations” table.
How can I alter my query so I can get all my recommended users directly from the “users” table in one query?
Also, is my way of doing this whole thing a performance nightmare? Is there a better way to go about it?
To answer your first question, you should be able to run the same distance query directly against the users table, i.e. you can effectively ignore the geo_locations table and just select the same columns (latitude and longitude) from users.
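A minimal sketch of what that single query against users could look like, wrapped in Python so it can be run as-is. Several things here are assumptions, not part of the question: an in-memory SQLite database stands in for the real one, users 2 to 4 are made-up rows, `open_demo_db` and `nearby_users` are hypothetical helper names, $radius is taken to be the Earth's mean radius in km, and the bounding box is computed with a simple approximation. The distance is computed once in the inner query and filtered by its alias outside, which avoids repeating the ACOS expression.

```python
import math
import sqlite3

EARTH_RADIUS_KM = 6371.0  # assumed value for $radius (mean Earth radius in km)

# Same shape as the question's query, but reading from users instead of
# geo_locations. :lat and :lon are in radians; the box bounds are in degrees.
QUERY = """
SELECT user_id, username, D
FROM (
    SELECT user_id, username,
           ACOS(SIN(:lat) * SIN(RADIANS(latitude))
                + COS(:lat) * COS(RADIANS(latitude))
                * COS(RADIANS(longitude) - :lon)) * :radius AS D
    FROM users
    WHERE latitude > :min_lat AND latitude < :max_lat
      AND longitude > :min_lon AND longitude < :max_lon
      AND user_id <> :self_id
) AS FirstCut
WHERE D < :rad
ORDER BY D
"""

def safe_acos(x):
    # Clamp to [-1, 1]; rounding can push the cosine slightly out of range.
    return math.acos(max(-1.0, min(1.0, x)))

def open_demo_db():
    """Build a tiny in-memory users table (hypothetical sample rows)."""
    conn = sqlite3.connect(":memory:")
    # Register the math functions explicitly: not every SQLite build has them.
    conn.create_function("RADIANS", 1, math.radians)
    conn.create_function("SIN", 1, math.sin)
    conn.create_function("COS", 1, math.cos)
    conn.create_function("ACOS", 1, safe_acos)
    conn.execute(
        "CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT, "
        "latitude REAL, longitude REAL)"
    )
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?, ?)",
        [
            (1, "HiBye99", 40.7723, -73.8722),    # Flushing, from the question
            (2, "manhattan", 40.7831, -73.9712),  # made-up row
            (3, "beijing", 39.9042, 116.4074),    # made-up row
            (4, "chandler", 33.2769, -111.9444),  # made-up row
        ],
    )
    return conn

def nearby_users(conn, lat_deg, lon_deg, rad_km, exclude_user_id):
    """Return (user_id, username, distance_km) rows within rad_km, nearest first."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    # Rough bounding box in degrees around the centre point.
    dlat = math.degrees(rad_km / EARTH_RADIUS_KM)
    dlon = math.degrees(rad_km / EARTH_RADIUS_KM / math.cos(lat))
    params = {
        "lat": lat, "lon": lon,
        "radius": EARTH_RADIUS_KM, "rad": rad_km,
        "min_lat": lat_deg - dlat, "max_lat": lat_deg + dlat,
        "min_lon": lon_deg - dlon, "max_lon": lon_deg + dlon,
        "self_id": exclude_user_id,
    }
    return conn.execute(QUERY, params).fetchall()
```

Calling `nearby_users(open_demo_db(), 40.7723, -73.8722, 25.0, exclude_user_id=1)` looks for users within 25 km of the Flushing user while skipping that user. An index on latitude and longitude would likely help the bounding-box filter on a real table, but that is worth measuring rather than assuming.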
As for the second part, to be perfectly honest, the best way to decide this is to do some testing. Populate the users table with a large number of records and measure how long the query takes. Then double the number of records and retest. That way you can see the impact more data has on your query.
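One way to run that kind of test, sketched in Python against an in-memory SQLite table. This is an illustration only: the helper names (`add_random_users`, `time_query`, `benchmark_doubling`) are hypothetical, the rows are random synthetic coordinates rather than real user data, and timings from SQLite will not match those of your actual database.

```python
import random
import sqlite3
import time

def add_random_users(conn, n, seed=0):
    """Insert n synthetic users with random coordinates (not real data)."""
    rng = random.Random(seed)
    conn.executemany(
        "INSERT INTO users (latitude, longitude) VALUES (?, ?)",
        ((rng.uniform(-90, 90), rng.uniform(-180, 180)) for _ in range(n)),
    )

def time_query(conn, sql, params=(), repeats=5):
    """Return the best wall-clock time in seconds over a few runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        best = min(best, time.perf_counter() - start)
    return best

def benchmark_doubling(sql, params=(), start_rows=1000, rounds=3):
    """Time the query, double the table, and time it again, `rounds` times."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (user_id INTEGER PRIMARY KEY, "
        "latitude REAL, longitude REAL)"
    )
    results = []
    total = 0
    to_add = start_rows
    for _ in range(rounds):
        add_random_users(conn, to_add, seed=total)
        total += to_add
        results.append((total, time_query(conn, sql, params)))
        to_add = total  # adding `total` more rows doubles the table
    return results
```

For example, `benchmark_doubling("SELECT user_id FROM users WHERE latitude > ? AND latitude < ?", (40.0, 41.0))` returns a list of (row count, seconds) pairs for 1000, 2000 and 4000 rows, which shows how the time grows as the table doubles.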
Alternatives would include just selecting the raw data out of the database and doing the calculations in code. Again, you’d have to test to see whether that performs better or worse.
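As a rough sketch of the "calculations in code" alternative, the following Python applies the same spherical law of cosines formula to rows that have already been fetched. The function names and the (user_id, latitude, longitude) row shape are assumptions for illustration, as is the Earth radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius in km

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance via the spherical law of cosines (degrees in)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    c = (math.sin(p1) * math.sin(p2)
         + math.cos(p1) * math.cos(p2) * math.cos(dlon))
    # Clamp to [-1, 1]; rounding can push the cosine slightly out of range.
    return math.acos(max(-1.0, min(1.0, c))) * EARTH_RADIUS_KM

def recommend(rows, lat, lon, rad_km):
    """Filter raw (user_id, latitude, longitude) rows to those within rad_km.

    Returns (user_id, distance_km) pairs sorted nearest first.
    """
    scored = ((uid, distance_km(lat, lon, ulat, ulon))
              for uid, ulat, ulon in rows)
    return sorted((pair for pair in scored if pair[1] < rad_km),
                  key=lambda pair: pair[1])
```

In practice you would probably still apply the bounding-box WHERE clause in SQL first, so that only a small candidate set is pulled into the application before the exact distances are computed.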