I have a table where I’m storing Lat/Long coordinates, and I want to write a query that returns all the records within a given distance of a certain point.
This table has about 10 million records, and there’s an index over the Lat/Long fields.
This does not need to be precise. Among other things, I’m considering that 1 degree Long == 1 degree Lat, which I know is not true, but the ellipse I’m getting is good enough for this purpose.
For my examples below, let’s say the point in question is [40, 140], and my radius, in degrees, is 2 degrees.
I’ve tried this 2 ways:
1) I created a UDF to calculate the Square of the Distance between 2 points, and I’m running that UDF in a query.
SELECT Lat, Long
FROM Table
WHERE (Lat BETWEEN 38 AND 42)
AND (Long BETWEEN 138 AND 142)
AND dbo.SquareDistance(Lat, Long, 40, 140) < 4
I’m filtering by a square first, to speed up the query and let SQL use the index, and then refining that to match only the records that fall within the circle with my UDF.
2) Run the query to get the square (same as before, but without the last line), feed ALL those records to my ASP.Net code, and calculate the circle in the ASP.Net side (same idea, calculate the square of the distance to save the Sqrt call, and compare to the square of my radius).
To my surprise, calculating the circle in the .Net side is about 10 times faster than using the UDF, which leads me to believe that I’m doing something horribly wrong with that UDF…
This is the code I’m using:
CREATE FUNCTION [dbo].[SquareDistance]
    (@Lat1 float, @Long1 float, @Lat2 float, @Long2 float)
RETURNS float
AS
BEGIN
    -- Declare the return variable here
    DECLARE @Result float
    DECLARE @LatDiff float, @LongDiff float

    SELECT @LatDiff = @Lat1 - @Lat2
    SELECT @LongDiff = @Long1 - @Long2
    SELECT @Result = (@LatDiff * @LatDiff) + (@LongDiff * @LongDiff)

    -- Return the result of the function
    RETURN @Result
END
Am I missing something here?
Shouldn’t using a UDF within SQL Server be much faster than feeding about 25% more records than necessary to .Net, with the overhead of the DataReader, the communication between processes and whatnot?
Is there something I’m doing horribly wrong in that UDF that makes it run slow?
Is there any way to improve it?
Thank you very much!
You can improve the performance of this UDF by NOT declaring variables and doing the calculation in-line, in a single RETURN expression. This will likely improve performance a little bit (but probably not much).
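As a rough sketch of what I mean by in-line (same function name and parameters as in your post, untested against your data), the body can collapse to one RETURN with no DECLAREs:

```sql
-- Sketch only: same signature as the original dbo.SquareDistance,
-- but the arithmetic is done directly in the RETURN expression.
ALTER FUNCTION [dbo].[SquareDistance]
    (@Lat1 float, @Long1 float, @Lat2 float, @Long2 float)
RETURNS float
AS
BEGIN
    -- Squared distance in "degree" units, with no intermediate variables
    RETURN (@Lat1 - @Lat2) * (@Lat1 - @Lat2)
         + (@Long1 - @Long2) * (@Long1 - @Long2)
END
```

It is still a scalar UDF, so it is still called once per candidate row. That is why I would not expect a big difference from this change alone.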
Even better would be to remove the function and put the calculations in the original query.
There is a little bit of overhead with calling a user defined function. By removing the function, you are likely to gain a little in performance.
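Using the example values from your post (point [40, 140], radius 2, so radius squared is 4), the query without the function might look something like this. [Table] is just standing in for your real table name:

```sql
-- Sketch: the squared-distance math written directly in the WHERE clause.
-- The BETWEEN filters still narrow the search to the bounding square first.
SELECT Lat, Long
FROM [Table]
WHERE (Lat BETWEEN 38 AND 42)
  AND (Long BETWEEN 138 AND 142)
  AND ((Lat - 40) * (Lat - 40)) + ((Long - 140) * (Long - 140)) < 4
```

If the point and radius come from parameters, the same expression works with variables in place of the literals.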
Also, I encourage you to check your execution plan just to make sure you are getting index seeks like you expect.
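One way to look at this, assuming you are running the query from Management Studio or a similar tool, is to turn on the IO and timing statistics around the query and compare the numbers with and without the distance check. Turning on "Include Actual Execution Plan" for the same run will show whether you get a seek or a scan:

```sql
-- Sketch: report logical reads and CPU/elapsed time for the query.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT Lat, Long
FROM [Table]   -- stands in for the real table name
WHERE (Lat BETWEEN 38 AND 42)
  AND (Long BETWEEN 138 AND 142)
  AND dbo.SquareDistance(Lat, Long, 40, 140) < 4;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The figures appear in the Messages tab rather than in the result set.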