There is a table with visits data:
uid (INT) | created_at (DATETIME)
I want to find how many days in a row a user has visited our app. So for instance:
SELECT DISTINCT DATE(created_at) AS d FROM visits WHERE uid = 123
will return:
d
------------
2012-04-28
2012-04-29
2012-04-30
2012-05-03
2012-05-04
There are 5 records and two intervals – 3 days (28 – 30 Apr) and 2 days (3 – 4 May).
My question is how to find the maximum number of days that a user has visited the app in a row (3 days in the example). Tried to find a suitable function in the SQL docs, but with no success. Am I missing something?
UPD:
Thank you guys for your answers! Actually, I’m working with vertica analytics database (http://vertica.com/), however this is a very rare solution and only a few people have experience with it. Although it supports SQL-99 standard.
Well, most of solutions work with slight modifications. Finally I created my own version of query:
-- returns starts of the vitit series
SELECT t1.d as s FROM testing t1
LEFT JOIN testing t2 ON DATE(t2.d) = DATE(TIMESTAMPADD('day', -1, t1.d))
WHERE t2.d is null GROUP BY t1.d
s
---------------------
2012-04-28 01:00:00
2012-05-03 01:00:00
-- returns end of the vitit series
SELECT t1.d as f FROM testing t1
LEFT JOIN testing t2 ON DATE(t2.d) = DATE(TIMESTAMPADD('day', 1, t1.d))
WHERE t2.d is null GROUP BY t1.d
f
---------------------
2012-04-30 01:00:00
2012-05-04 01:00:00
So now only what we need to do is to join them somehow, for instance by row index.
SELECT s, f, DATEDIFF(day, s, f) + 1 as seq FROM (
SELECT t1.d as s, ROW_NUMBER() OVER () as o1 FROM testing t1
LEFT JOIN testing t2 ON DATE(t2.d) = DATE(TIMESTAMPADD('day', -1, t1.d))
WHERE t2.d is null GROUP BY t1.d
) tbl1 LEFT JOIN (
SELECT t1.d as f, ROW_NUMBER() OVER () as o2 FROM testing t1
LEFT JOIN testing t2 ON DATE(t2.d) = DATE(TIMESTAMPADD('day', 1, t1.d))
WHERE t2.d is null GROUP BY t1.d
) tbl2 ON o1 = o2
Sample output:
s | f | seq
---------------------+---------------------+-----
2012-04-28 01:00:00 | 2012-04-30 01:00:00 | 3
2012-05-03 01:00:00 | 2012-05-04 01:00:00 | 2
Another approach, the shortest, do a self-join:
Output:
Live test: http://www.sqlfiddle.com/#!1/93789/1
sr = second row, fr = first row ( or perhaps previous row? ツ ). Basically we are doing a back tracking, it’s a simulated lag on database that doesn’t support
LAG(Postgres supports LAG, but the solution is very long, as windowing doesn’t support nested windowing). So in this query, we uses a hybrid approach, simulate LAG via join, then use SUM windowing against it, this produces group numberUPDATE
Forgot to put the final query, the query above illustrate the underpinnings of group numbering, need to morph that into this:
UPDATE
I know why my other solution that uses window function became long, it became long on my attempt to illustrate the logic of group numbering and counting over the group. If I’d cut to the chase like in my MySql approach, that windowing function could be shorter. Having said that, here’s my old windowing function approach, albeit better now:
Live test: http://www.sqlfiddle.com/#!1/93789/21