I have a huge SQL table (more than 1 billion rows) of user transactions.
I’d like to add a binary column that indicates whether or not the current row is more than 40 minutes after the previous row for the same user_id (1 if it is, or if there is no previous row; 0 otherwise).
For instance:
user_id | date
--------+--------------------
1       | 2011-01-01 12:15:00
1       | 2011-01-01 12:00:00
8       | 2011-01-01 15:00:00
8       | 2011-01-01 14:00:00
The result of the query would be:
user_id | date                | new
--------+---------------------+----
1       | 2011-01-01 12:15:00 | 0
1       | 2011-01-01 12:00:00 | 1
8       | 2011-01-01 15:00:00 | 1
8       | 2011-01-01 14:00:00 | 1
I’d like to avoid joining the entire table to itself,
and maybe use a side table or an analytic function (OVER with PARTITION BY) instead.
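A minimal sketch of the analytic-function approach, assuming PostgreSQL. The table name transactions is hypothetical (the question does not name the table), and the column is quoted as "date" only to keep it clearly separate from the type name:

```sql
-- Sketch only: "transactions" is an assumed table name.
-- lag() returns the previous timestamp for the same user_id,
-- or NULL when the row is the user's first one.
SELECT user_id,
       "date",
       CASE
         WHEN "date" - lag("date") OVER (PARTITION BY user_id ORDER BY "date")
              <= interval '40 minutes'
         THEN 0   -- within 40 minutes of the previous row
         ELSE 1   -- more than 40 minutes later, or no previous row
       END AS new
FROM transactions
ORDER BY user_id, "date" DESC;
```

For a user's first row the lag() result is NULL, so the comparison is NULL and the CASE falls through to ELSE, which gives 1 as in the expected output.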
The analytic-function approach assumes that date is a timestamp column despite its name. It’s the only way I can see. An index on (user_id, date) might speed things up, especially on PostgreSQL 9.2, where this could qualify for an index-only scan. But this is still going to scan the whole table (or maybe only the index on 9.2).
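As a rough sketch, such an index could be created as follows. The table name transactions and the index name are assumptions for illustration, not names from the question:

```sql
-- Sketch only: table and index names are hypothetical.
-- The column order (user_id first, then "date") matches the
-- PARTITION BY user_id ORDER BY "date" pattern of a window query.
CREATE INDEX transactions_user_id_date_idx
    ON transactions (user_id, "date");
```

Building this on a table of this size will take a while, and whether the planner actually uses it is best checked with EXPLAIN.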
Btw: it’s not a good idea to name a column with a reserved word (date). Additionally, date is a very poor name from a documentation point of view.
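If renaming is an option, something along these lines would do it in PostgreSQL. Both transactions and the new name transaction_time are purely illustrative:

```sql
-- Sketch only: table name and new column name are hypothetical.
ALTER TABLE transactions
    RENAME COLUMN "date" TO transaction_time;
```

Any queries, views, or application code that refer to the old column name would need to be updated to match.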