I have an SQL table containing the GPS coordinates of a device, updated every n minutes (the device is installed in a vehicle). Given the nature of GPS, lots of the entries are very similar, but entirely different as far as the server is concerned. I can approximately match things (within ~3.6 ft or maybe 36 ft) easily enough with CAST(lat as decimal(7,4)).
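To make the rounding concrete, here is a small sketch of what that cast does to nearby fixes. The literals are made up for illustration and are not rows from the real table. At four decimal places one step is 0.0001 degrees, which is roughly 11 m (about 36 ft) of latitude.

```sql
-- Illustrative literals only; T-SQL rounds when casting to a lower scale.
SELECT CAST(31.12344 AS decimal(7,4)) AS a,  -- 31.1234
       CAST(31.12345 AS decimal(7,4)) AS b,  -- 31.1235 (rounds up)
       CAST(31.12346 AS decimal(7,4)) AS c;  -- 31.1235
```

Note that a and b differ by only 0.00001 degrees yet land in different buckets, so rounding alone can split fixes that sit on either side of a bucket edge.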
I’d like to be able to take a result set and condense the approximate duplicate entries, but still maintain the time-based order. Here’s an example:
Row Lat Lng vel Hdg Time
01 31.12345 -88.12345 00 00 12-4-21 01:45:00
02 31.12346 -88.12345 00 00 12-4-21 01:46:00
03 31.12455 -88.12410 10 01 12-4-21 01:47:00
04 31.12495 -88.12480 17 01 12-4-21 01:48:00
05 31.12532 -88.12560 22 01 12-4-21 01:49:00
06 31.12567 -88.12608 25 02 12-4-21 01:50:00
07 31.12638 -88.12672 24 02 12-4-21 01:51:00
08 31.12689 -88.12722 19 02 12-4-21 01:52:00
09 31.12345 -88.12345 00 00 12-4-21 01:53:00
10 31.12346 -88.12346 00 00 12-4-21 01:54:00
11 31.12347 -88.12345 00 00 12-4-21 01:55:00
12 31.12346 -88.12346 00 00 12-4-21 01:56:00
13 31.12689 -88.12788 10 40 12-4-21 01:57:00
14 31.12604 -88.12691 13 39 12-4-21 01:58:00
15 31.12572 -88.12603 15 39 12-4-21 01:59:00
My desired end result would be for rows 1 and 2 to be condensed into a single row, and rows 9 through 12 to be condensed into another single row, each containing AVG(Lat), AVG(Lng), and MIN(Time).
This is the result set I would like to receive, given the above data:
Row Lat Lng vel Hdg Time
01 31.123455 -88.12345 00 00 12-4-21 01:45:00
02 31.12455 -88.12410 10 01 12-4-21 01:47:00
03 31.12495 -88.12480 17 01 12-4-21 01:48:00
04 31.12532 -88.12560 22 01 12-4-21 01:49:00
05 31.12567 -88.12608 25 02 12-4-21 01:50:00
06 31.12638 -88.12672 24 02 12-4-21 01:51:00
07 31.12689 -88.12722 19 02 12-4-21 01:52:00
08 31.12346 -88.123455 00 00 12-4-21 01:53:00
09 31.12689 -88.12788 10 40 12-4-21 01:57:00
10 31.12604 -88.12691 13 39 12-4-21 01:58:00
11 31.12572 -88.12603 15 39 12-4-21 01:59:00
The boundaries between groupings would be movement: velocity being > 0, or a GPS coordinate changing by more than some amount x. In this case, x is .0001. The problem, as described below, is that multiple stops (AT DIFFERENT TIMES) at a given coordinate are lumped into a single stop. If I visit coordinate x today at 4 pm, tomorrow at 8 am, and then again at 6 pm, the only one I see is tomorrow @ 6 pm (in the case of MAX(Time)) or today @ 4 pm (in the case of MIN(Time)).
It’s a given that if velocity is 0, heading is also 0. It is, however, important that rows 1 and 2, and rows 9 through 12, not be grouped TOGETHER even if their coordinates are similar enough to be the same (i.e. when rounded to 4 decimal places).
I have a query that does the approximate grouping, but it lumps repeat visits to the same spot together:
    SELECT Geography::Point(AVG(dbo.GPSEntries.Latitude),
                            AVG(dbo.GPSEntries.Longitude),
                            4326) as Location,
           dbo.GPSEntries.Velocity,
           dbo.GPSEntries.Heading,
           MAX(dbo.GPSEntries.Time) as maxTime,
           MIN(dbo.GPSEntries.Time) as minTime,
           AVG(dbo.RFDatas.RSSI) as avgRSSI,
           COUNT(1) as samples
    FROM dbo.GPSEntries
        INNER JOIN dbo.Reports
            ON dbo.GPSEntries.Report_Id = dbo.Reports.Id
        INNER JOIN dbo.RFDatas
            ON dbo.GPSEntries.Report_Id = dbo.RFDatas.Report_Id
    GROUP BY CAST(Latitude as Decimal(7,4)),
             CAST(Longitude as Decimal(7,4)),
             Velocity,
             Heading
    ORDER BY MAX(Time)
In other words, if I travel from point A to point B, stay for 30 minutes (so 30 reports at 1 per minute), then travel to point C, stay for 20 minutes, then travel back to point B and stay for 20 more minutes before heading to point D, I would like to be able to see both separate stops at point B.
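For what it's worth, this kind of problem is commonly called "gaps and islands". Below is a minimal sketch against the 15 sample rows from the first table, not the production tables. It assumes SQL Server 2012 or later (for LAG and a windowed running SUM), reads the sample dates as 2012-04-21, and uses made-up names (@gps, @tol, IsStart, GroupNumber). A row starts a new group when the vehicle is moving, was moving on the previous fix, or has shifted by more than the tolerance since the previous fix; a running total of those flags then numbers the groups in time order.

```sql
-- Hypothetical stand-in for dbo.GPSEntries, loaded with the sample rows.
DECLARE @gps TABLE (
    [Time]    datetime     NOT NULL PRIMARY KEY,
    Latitude  decimal(9,5) NOT NULL,
    Longitude decimal(9,5) NOT NULL,
    Velocity  int          NOT NULL,
    Heading   int          NOT NULL
);

INSERT INTO @gps ([Time], Latitude, Longitude, Velocity, Heading) VALUES
('2012-04-21T01:45:00', 31.12345, -88.12345,  0,  0),
('2012-04-21T01:46:00', 31.12346, -88.12345,  0,  0),
('2012-04-21T01:47:00', 31.12455, -88.12410, 10,  1),
('2012-04-21T01:48:00', 31.12495, -88.12480, 17,  1),
('2012-04-21T01:49:00', 31.12532, -88.12560, 22,  1),
('2012-04-21T01:50:00', 31.12567, -88.12608, 25,  2),
('2012-04-21T01:51:00', 31.12638, -88.12672, 24,  2),
('2012-04-21T01:52:00', 31.12689, -88.12722, 19,  2),
('2012-04-21T01:53:00', 31.12345, -88.12345,  0,  0),
('2012-04-21T01:54:00', 31.12346, -88.12346,  0,  0),
('2012-04-21T01:55:00', 31.12347, -88.12345,  0,  0),
('2012-04-21T01:56:00', 31.12346, -88.12346,  0,  0),
('2012-04-21T01:57:00', 31.12689, -88.12788, 10, 40),
('2012-04-21T01:58:00', 31.12604, -88.12691, 13, 39),
('2012-04-21T01:59:00', 31.12572, -88.12603, 15, 39);

-- Tolerance x from the question: a change larger than this counts as movement.
DECLARE @tol decimal(9,5) = 0.0001;

WITH flagged AS (
    SELECT *,
           -- 1 when this row begins a new group, 0 when it continues a stop.
           IsStart = CASE
               WHEN LAG([Time]) OVER (ORDER BY [Time]) IS NULL THEN 1  -- first row
               WHEN Velocity > 0 THEN 1                                -- moving now
               WHEN LAG(Velocity) OVER (ORDER BY [Time]) > 0 THEN 1    -- just stopped
               WHEN ABS(Latitude  - LAG(Latitude)  OVER (ORDER BY [Time])) > @tol THEN 1
               WHEN ABS(Longitude - LAG(Longitude) OVER (ORDER BY [Time])) > @tol THEN 1
               ELSE 0
           END
    FROM @gps
),
grouped AS (
    SELECT *,
           -- Running total of start flags: consecutive stopped rows share a number.
           GroupNumber = SUM(IsStart) OVER (ORDER BY [Time] ROWS UNBOUNDED PRECEDING)
    FROM flagged
)
SELECT Latitude  = AVG(Latitude),
       Longitude = AVG(Longitude),
       Velocity  = MIN(Velocity),
       Heading   = MIN(Heading),
       BeginTime = MIN([Time]),
       EndTime   = MAX([Time]),
       Samples   = COUNT(*)
FROM grouped
GROUP BY GroupNumber
ORDER BY BeginTime;
```

On the 15 sample rows this should give 11 groups, with rows 1 and 2 merged and rows 9 through 12 merged separately, because grouping depends on adjacency in time rather than on the coordinate alone. One caveat: each row is compared only with the row before it, so slow drift across many fixes would stay in one group.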
Here’s some actual data from my DB, sanitized to protect the innocent, or to blame someone in northeast Alabama.
Latitude Longitude Vel Hdg MAX(Time) MIN(Time) sig RowCount
34.747420 -86.302580 68 157 2012-06-13 01:31:37.000 2012-06-13 01:31:37.000 -91 1
34.759140 -86.307620 61 134 2012-06-13 01:33:06.000 2012-06-13 01:33:06.000 -91 2
34.763237 -86.307264 0 0 2012-06-13 01:34:36.000 2012-06-12 01:27:21.000 -97 7
34.763288 -86.307280 0 0 2012-06-13 14:30:44.000 2012-06-12 01:30:21.000 -98 527
34.760220 -86.308200 38 110 2012-06-13 14:33:44.000 2012-06-13 14:33:44.000 -98 1
34.750350 -86.305750 5 90 2012-06-13 14:35:13.000 2012-06-13 14:35:13.000 -83 2
34.737160 -86.298040 70 88 2012-06-13 14:36:43.000 2012-06-13 14:36:43.000 -80 1
34.736420 -86.277270 120 33 2012-06-13 14:38:13.000 2012-06-13 14:38:13.000 -87 2
34.747090 -86.248370 120 37 2012-06-13 14:39:43.000 2012-06-13 14:39:43.000 -93 2
34.755620 -86.240640 70 179 2012-06-13 14:41:13.000 2012-06-13 14:41:13.000 -81 1
34.771240 -86.242760 70 0 2012-06-13 14:42:42.000 2012-06-13 14:42:42.000 -88 2
34.785510 -86.245710 70 6 2012-06-13 14:44:12.000 2012-06-13 14:44:12.000 -99 2
34.800220 -86.239400 70 1 2012-06-13 14:45:42.000 2012-06-13 14:45:42.000 -86 1
34.815070 -86.232180 70 16 2012-06-13 14:47:12.000 2012-06-13 14:47:12.000 -98 2
34.824540 -86.226198 0 0 2012-06-13 14:51:41.000 2012-06-13 00:13:48.000 -101 9
34.824579 -86.226171 0 0 2012-06-14 00:26:19.000 2012-06-12 00:46:57.000 -99 168
You’ll note the 4th and last rows have 527 and 168 entries, respectively, and they span 2 days. Those entries are from 1 device only, and are from where the device was stopped for several hours in the same place on multiple occasions.
Here’s some zipped csv data: sample
What I Finally Done Did
Some minor modifications to Aaron Bertrand’s supplied query shown below:
    WITH d AS
    (
        SELECT Time
              ,Latitude
              ,Longitude
              ,Velocity
              ,Heading
              ,TimeRN = ROW_NUMBER() OVER (ORDER BY [Time])
        FROM dbo.GPSEntries
        GROUP BY Time, Latitude, Longitude, Velocity, Heading
    ),
    y AS (
        SELECT BeginTime = MIN(Time)
              ,EndTime = MAX(Time)
              ,Latitude = AVG(Latitude)
              ,Longitude = AVG(Longitude)
    --        ,[RowCount] = COUNT(*)
              ,GroupNumber
        FROM (
            SELECT Time
                  ,Latitude
                  ,Longitude
                  ,GroupNumber = (
                      SELECT MIN(d2.TimeRN)
                      FROM d AS d2
                      WHERE d2.TimeRN >= d.TimeRN AND
                            NOT EXISTS (
                                SELECT 1
                                FROM d AS d3 -- Between 250 and 337 feet
                                WHERE ABS(d2.Latitude - d.Latitude) <= .0007 AND
                                      ABS(d2.Longitude - d.Longitude) <= .0007 AND
                                      d2.Velocity = d.Velocity ) )
            FROM d ) AS x
        GROUP BY GroupNumber
    )
    SELECT y.Latitude
          ,y.Longitude
          ,d.Velocity
          ,d.Heading
          ,y.BeginTime
    --    ,y.EndTime
    --    ,y.[RowCount]
    --    ,Duration = CONVERT(time(0),DATEADD(SS,DATEDIFF(SS,y.BeginTime, y.EndTime), '0:00:00'), 108)
    FROM y INNER JOIN d ON y.BeginTime = d.[Time]
    -- FOR STOPS (5 minute):
    -- WHERE DATEDIFF(MI, Y.BeginTime, y.EndTime) + 1 > 5
    ORDER BY y.BeginTime;
Here is some sample data in tempdb:
And my attempt at satisfying the query:
Results: