So, here it goes: I have a table that is returning timespans for a user ID. Unfortunately, it is returning duplicated timespans for some users (i.e. some users appear more than once in the results, because they have duplicated records in the query).
I want to select just the most recent timespan for every user in the result set. I tried building a query with ORDER BY, but that hasn't proven fruitful. I think I am on the right track with that, but perhaps not.
Anyway, here is a sample of the output that I am trying to winnow down:
User   | Activity 1              | Activity1 ID | Activity 2              | Activity2 ID
User 1 | 01-01-2009 - 12-31-2010 | 100.00       | 03-02-2009 - 05-05-2009 | 500.01
User 1 | 01-06-2009 - 12-31-2010 | 100.01       | 03-02-2009 - 05-05-2009 | 500.01
User 2 | 06-01-2009 - 12-31-2010 | 200.00       | 06-06-2010 - 03-03-2011 | 501.01
What I would like to do is return just the first 'User 1' row (or, more specifically, the tuple with the longest timespan). I am using MS SQL Server (T-SQL), and it doesn't support temporal data structures (yet), but should in 2012.
Any thoughts from the collective?
I believe you’re looking to “partition” your query.
These should help:
The OVER clause.
The ROW_NUMBER function.
Some interesting examples.
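To make the idea concrete, here is a rough sketch of how ROW_NUMBER() with OVER (PARTITION BY ...) could pick one row per user. It is only an illustration: the table name `user_activity` and its column names are made up, the dates from the sample are rewritten in ISO form (reading them as MM-DD-YYYY), and it runs against SQLite through Python so it can be tried without a server. On SQL Server the same shape of query should work, with something like DATEDIFF(day, activity1_start, activity1_end) in place of the julianday() subtraction.

```python
import sqlite3

# In-memory database holding a hypothetical version of the sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_activity (
        user_name       TEXT,
        activity1_start TEXT,  -- ISO dates (assumed MM-DD-YYYY in the sample)
        activity1_end   TEXT,
        activity1_id    TEXT
    )
""")
rows = [
    ("User 1", "2009-01-01", "2010-12-31", "100.00"),
    ("User 1", "2009-01-06", "2010-12-31", "100.01"),
    ("User 2", "2009-06-01", "2010-12-31", "200.00"),
]
conn.executemany("INSERT INTO user_activity VALUES (?, ?, ?, ?)", rows)

# Number the rows within each user, longest timespan first,
# then keep only row number 1 for every user.
query = """
    SELECT user_name, activity1_start, activity1_end, activity1_id
    FROM (
        SELECT ua.*,
               ROW_NUMBER() OVER (
                   PARTITION BY user_name
                   ORDER BY julianday(activity1_end) - julianday(activity1_start) DESC
               ) AS rn
        FROM user_activity AS ua
    ) AS ranked
    WHERE rn = 1
    ORDER BY user_name
"""
longest_per_user = conn.execute(query).fetchall()
for row in longest_per_user:
    print(row)
```

If "most recent" is what you are really after rather than "longest", the only part that should need to change is the ORDER BY inside the OVER clause, for example ordering by the start or end date descending.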