I have a dataset that looks roughly like this:
X | U | datetime
-------------
1 | 1 | 1/1/12
1 | 2 | 1/1/12
1 | 2 | 1/1/12
1 | 2 | 1/1/12
1 | 4 | 1/1/12
2 | 2 | 2/1/12
2 | 3 | 2/1/12
1 | 3 | 3/1/12
2 | 4 | 3/1/12
3 | 2 | 4/1/12
It is a log of visits. X is the id of the thing visited, and U is the user id.
I need to compute two statistics.
Given a value for X (x):
- "new visitors": count the number of unique users whose first visit to any X was to x.
use cases:
- A user has only visited x once -> counts as 1
- A user has only visited !x once -> counts as 0
- A user has only visited x twice -> counts as 1
- A user has only visited !x twice -> counts as 0
- A user has visited many Xs, where their first visit of any X is x -> counts as 1
- A user has visited many Xs, where their first visit of any X is !x -> counts as 0
Examples from above data:
X | Count
---------
1 | 3
2 | 1
3 | 0
- "returning visitors": count the number of unique users that have visited x more than once OR have visited x once, but have visited another X previously (i.e. visits made after their single visit to x do not count)
Examples from above data:
X | Count
---------
1 | 3
2 | 2
3 | 1
I’m using SQL Server 2008.
Update
This appears to answer Q1, although it’s not very fast 🙁
select x.X, COUNT(1)
from (
    select t1.X
    from @t t1
    group by t1.X, t1.U
    having (select COUNT(1) from @t t2 where t2.U = t1.U and t2.OccurredOn < MIN(t1.OccurredOn)) = 0
) x
group by x.X
Update 2
I think this answers (2):
select t.X, COUNT(1)
from @t t
left join (
    select t.U, MIN(t.OccurredOn) as O
    from @t t
    group by t.U
) x on t.U = x.U and t.OccurredOn <= x.O
where x.U is null
group by t.X
For the first case, you need a sub-query to join to that will filter out all user-thing visits that aren’t the first of their kind. So you’ll have something like
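A minimal sketch of that idea, not necessarily the exact query intended: join the log to a per-user "first visit" sub-query so only each user's earliest visit survives, then count distinct users per X. The table variable `@t` and the `OccurredOn` column follow the queries in the question; the sample rows mirror the question's table, with the dates written as consecutive days (an assumption, since the original date format is ambiguous), and the aliases `f` and `FirstVisit` are made up for illustration.

```sql
-- Sample data mirroring the question's table (dates assumed to be consecutive days).
DECLARE @t TABLE (X int, U int, OccurredOn datetime);

INSERT INTO @t (X, U, OccurredOn) VALUES
    (1, 1, '20120101'), (1, 2, '20120101'), (1, 2, '20120101'),
    (1, 2, '20120101'), (1, 4, '20120101'), (2, 2, '20120102'),
    (2, 3, '20120102'), (1, 3, '20120103'), (2, 4, '20120103'),
    (3, 2, '20120104');

-- "New visitors": keep only rows that match the user's earliest visit,
-- then count each user once per X.
SELECT t.X, COUNT(DISTINCT t.U) AS NewVisitors
FROM @t t
JOIN (
    -- One row per user: the timestamp of their first visit to anything.
    SELECT U, MIN(OccurredOn) AS FirstVisit
    FROM @t
    GROUP BY U
) f ON f.U = t.U
   AND f.FirstVisit = t.OccurredOn
GROUP BY t.X;
```

Two caveats with this sketch: an X with no new visitors (X = 3 in the sample) does not appear in the result at all rather than showing 0, and if a user's earliest timestamp is shared by visits to two different Xs, that user is counted as new for both.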
Edit: I think your solution to the second one is fine, except that the join will be faster if you replace the <= with simply =.
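As a rough sketch of that change, here is the second query with `=` in the join condition. Because the sub-query returns each user's minimum timestamp, a row that fails to match on `=` must be later than that minimum, so the anti-join keeps the same rows as `<=` did. The sample rows and user ids below are hypothetical and use distinct timestamps; `COUNT(DISTINCT t.U)` is swapped in for `COUNT(1)` on the assumption that unique users, not individual visits, are what should be counted.

```sql
-- Hypothetical sample rows with distinct timestamps.
DECLARE @t TABLE (X int, U int, OccurredOn datetime);

INSERT INTO @t (X, U, OccurredOn) VALUES
    (1, 10, '2012-01-01T09:00:00'),  -- user 10's first visit
    (1, 10, '2012-01-01T10:00:00'),  -- user 10 returns to X = 1
    (2, 10, '2012-01-02T09:00:00'),  -- user 10 visits X = 2 later
    (2, 20, '2012-01-02T09:00:00'),  -- user 20's first visit
    (1, 20, '2012-01-03T09:00:00');  -- user 20 visits X = 1 later

-- "Returning visitors": anti-join against each user's first visit,
-- so only visits that are NOT the user's first remain.
SELECT t.X, COUNT(DISTINCT t.U) AS ReturningVisitors
FROM @t t
LEFT JOIN (
    SELECT U, MIN(OccurredOn) AS O
    FROM @t
    GROUP BY U
) x ON t.U = x.U
   AND t.OccurredOn = x.O
WHERE x.U IS NULL
GROUP BY t.X;
```

Note that this relies on timestamps being fine-grained enough to tell a user's visits apart: repeat visits that share the exact timestamp of the first visit are treated as first visits and drop out of the count.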