I’m working on an asset tracking application. I have devices that send an update on their GPS position every ~5 minutes.
Now I need to create a report that shows when an asset started to move, when it stopped and for how long, etc. Basically, I need to GROUP this data.
The problem is that the GPS data is not accurate. If a device is lying in the same spot, it will send different lat/lon values with different accuracy, creating noisy data.
What is the most efficient way to analyze such data? Or maybe there are ways to make it “clean” as I collect it? Any suggestions?
It’s a bit of an open-ended question, but I’d like any ideas you can give me 🙂

Every record defines a centroid (a location and a fuzzy space around it, with the size of the fuzz defined by the accuracy) where the asset could actually be. Hopefully you can ignore altitude, because it is much fuzzier and the reported accuracy typically covers only horizontal accuracy.
Take the first point and assign it to a cluster (area/volume). Take the second point and see if it falls inside the cluster. If so, you can either refine your centroid (average the old and new points) or just discard the second point. Refining the centroid is very tempting, but make sure you do it in a way that doesn’t let very slow asset movement drag the centroid along and go undetected. When your next point falls outside the centroid, start a new centroid and repeat.
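The steps above could be sketched roughly like this in Python. It takes the "discard the point" option, so the centroid stays fixed at the first fix of each stop and cannot drift. The names `Fix`, `Stop`, `haversine_m` and `cluster_fixes` are made up for illustration, and using the sum of the two reported accuracies as the "inside" test is my assumption, not a rule.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


@dataclass
class Fix:
    t: float           # timestamp in seconds
    lat: float         # degrees
    lon: float         # degrees
    accuracy_m: float  # reported horizontal accuracy in metres


@dataclass
class Stop:
    lat: float
    lon: float
    radius_m: float  # fuzz around the centroid (accuracy of its first fix)
    start: float
    end: float
    count: int = 1


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cluster_fixes(fixes):
    """Greedy clustering of time-ordered fixes into consecutive stops."""
    stops = []
    for f in fixes:
        cur = stops[-1] if stops else None
        # Assumption: a fix is "inside" if the two fuzzy circles overlap.
        if cur is not None and haversine_m(cur.lat, cur.lon, f.lat, f.lon) <= cur.radius_m + f.accuracy_m:
            cur.end = f.t      # same place: extend the stop, discard the point
            cur.count += 1
        else:
            stops.append(Stop(f.lat, f.lon, f.accuracy_m, f.t, f.t))
    return stops
```

Each `Stop` then gives you the start time, end time and (as `end - start`) how long the asset sat there; the gaps between consecutive stops are the movement.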
You may find that your reported accuracy is better than the actual accuracy, in which case you can apply a stupidity multiplier or constant to make the centroid fuzzier than the device reports.
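As a tiny illustration of that fudge factor, something like the following would do. The function name and the default values are placeholders I picked, not recommendations; you would tune them against data from a device you know was stationary.

```python
def effective_radius_m(reported_accuracy_m, multiplier=2.0, padding_m=5.0):
    """Inflate the reported accuracy to get the radius actually used for clustering.

    multiplier and padding_m are hypothetical tuning knobs (the "stupidity
    multiplier" and "constant"); set multiplier=1.0 or padding_m=0.0 to use
    only one of them.
    """
    return reported_accuracy_m * multiplier + padding_m
```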
[EDIT]
OP asked how to do this in SQL. Well, I’m no SQL guru. The problem I’m having is that I cannot restrict the outer join to return only the matches that are temporally contiguous, rather than all matches that were ever at that location. So my solution is forced to use a lot of loops outside of SQL.
This goes through the table locations by id. The first query gets the next (first) location’s information, and the second query skips over all ids that clustered to the same location. Rinse and repeat.
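Here is a rough sketch of that kind of loop, written in Python with the standard `sqlite3` module. Everything about the schema is assumed: a `locations` table with `id`, `lat` and `lon` columns, ids increasing with time, and a fixed `radius_m` instead of per-row accuracy. The distance check is done in Python rather than in SQL, which is my simplification.

```python
import math
import sqlite3

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def stops_from_table(conn, radius_m=30.0):
    """Return (first_id, last_id, lat, lon) for each temporally contiguous stop."""
    stops = []
    last_id = 0  # assumes ids are positive
    while True:
        # Query 1: the next location after the previous cluster.
        anchor = conn.execute(
            "SELECT id, lat, lon FROM locations WHERE id > ? ORDER BY id LIMIT 1",
            (last_id,),
        ).fetchone()
        if anchor is None:
            break
        anchor_id, lat, lon = anchor
        # Query 2: walk forward and skip every id that is still at this location.
        last_in = anchor_id
        later = conn.execute(
            "SELECT id, lat, lon FROM locations WHERE id > ? ORDER BY id",
            (anchor_id,),
        )
        for row_id, row_lat, row_lon in later:
            if haversine_m(lat, lon, row_lat, row_lon) > radius_m:
                break  # first row outside the cluster ends it
            last_in = row_id
        stops.append((anchor_id, last_in, lat, lon))
        last_id = last_in  # rinse and repeat from the row after the cluster
    return stops
```

Because the second query stops at the first row outside the radius, an asset that later returns to an earlier spot starts a new stop instead of being merged with the old one.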