I have a set of login data: a user_id with a timestamp for each login.
A user can log in multiple times, but we need to return only records that are at least an hour apart from one another, starting from the earliest record. The deduping has to happen at the user level (there can be multiple users).
For example:
- user1 2012-03-07 14:24:30.000
- user1 2012-03-07 14:34:30.000
- user1 2012-03-07 15:14:30.000
- user1 2012-03-07 15:20:30.000
- user1 2012-03-07 15:30:30.000
- user1 2012-03-08 09:20:30.000
- user1 2012-03-08 09:50:30.000
- user1 2012-03-08 10:30:30.000
- user2 2012-03-07 15:20:30.000
I would only want to see the following records:
- user1 2012-03-07 14:24:30.000
- user1 2012-03-07 15:30:30.000
- user1 2012-03-08 09:20:30.000
- user1 2012-03-08 10:30:30.000
- user2 2012-03-07 15:20:30.000
========================================================================
Is there a clean way to do this? We could do it recursively, but I was hoping there might be a way to use row_number() with PARTITION BY.
Any help is much appreciated!
In SQL Server 2005 or newer, a recursive CTE can return the LoginAt datetimes, dropping those that fall less than an hour after an already selected LoginAt.
The crucial part is row_number(). Because SQL Server allows neither aggregate functions nor the TOP clause in the recursive part of a CTE, row_number() is the only way (IMO) to order the LoginAt datetimes and keep only the first one.
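A minimal sketch of such a recursive CTE, using the sample data from the question. This is not the code from the original answer: the table name Logins, the column names UserId and LoginAt, and the CTE name Kept are assumptions made for illustration. The anchor picks each user's earliest login; each recursive step joins the last kept login to all logins at least one hour later and keeps only the first of them via row_number().

```sql
-- Hypothetical schema: table and column names are assumptions, not from the thread.
CREATE TABLE Logins (
    UserId  varchar(20) NOT NULL,
    LoginAt datetime    NOT NULL
);

-- Sample rows from the question (INSERT ... SELECT also works on SQL Server 2005).
INSERT INTO Logins (UserId, LoginAt)
SELECT 'user1', '2012-03-07T14:24:30' UNION ALL
SELECT 'user1', '2012-03-07T14:34:30' UNION ALL
SELECT 'user1', '2012-03-07T15:14:30' UNION ALL
SELECT 'user1', '2012-03-07T15:20:30' UNION ALL
SELECT 'user1', '2012-03-07T15:30:30' UNION ALL
SELECT 'user1', '2012-03-08T09:20:30' UNION ALL
SELECT 'user1', '2012-03-08T09:50:30' UNION ALL
SELECT 'user1', '2012-03-08T10:30:30' UNION ALL
SELECT 'user2', '2012-03-07T15:20:30';

WITH Kept AS (
    -- Anchor member: earliest login per user (aggregates are allowed here).
    SELECT UserId, MIN(LoginAt) AS LoginAt
    FROM Logins
    GROUP BY UserId

    UNION ALL

    -- Recursive member: for each user's last kept login, number the logins
    -- that are at least one hour later and keep only the earliest (rn = 1).
    SELECT UserId, LoginAt
    FROM (
        SELECT l.UserId,
               l.LoginAt,
               ROW_NUMBER() OVER (PARTITION BY l.UserId
                                  ORDER BY l.LoginAt) AS rn
        FROM Kept AS k
        JOIN Logins AS l
          ON  l.UserId  = k.UserId
          AND l.LoginAt >= DATEADD(hour, 1, k.LoginAt)
    ) AS candidates
    WHERE rn = 1
)
SELECT UserId, LoginAt
FROM Kept
ORDER BY UserId, LoginAt
OPTION (MAXRECURSION 0);  -- lift the default limit of 100 recursion levels
```

On the sample data this should select 14:24:30, 15:30:30, 09:20:30 and 10:30:30 for user1 plus the single user2 row, matching the list in the question. Each kept login costs one recursion level, so without the MAXRECURSION hint a user with more than 100 kept logins would hit the default limit.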
Sql Fiddle playground is this way.
UPDATE:
Row numbers are applied to each generation (recursion level) individually. Extract from WITH common_table_expression (Transact-SQL):