I need to answer a question like this:
For each user, what is the most items that user viewed in any 60 second
time frame between START_TIMESTAMP and END_TIMESTAMP?
The 60 second time frame is a sliding window. It’s not just a matter of “items viewed” counts for each whole minute. Also, 60 seconds is just an example; it should work for any number of seconds.
My data is stored like this:
-- Timestamped log of users viewing items
CREATE TABLE user_item_views (
    user_id integer,
    item_id integer,
    timestamp timestamp
);
Doing it for each whole minute is easy enough: format the timestamp to something like YYYY-MM-DD hh:mm, then do a count grouped by that formatted timestamp and the user_id.
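For reference, the whole-minute count could be sketched roughly as below. This is only an illustration under assumptions: it uses Python's built-in sqlite3 module with an in-memory database instead of the real database, stores the timestamp as integer epoch seconds in a column named ts, and the function name views_per_whole_minute is made up for this example.

```python
import sqlite3

def views_per_whole_minute(rows):
    """Count views per user per calendar minute.

    rows: iterable of (user_id, item_id, epoch_seconds) tuples.
    Assumption: timestamps are integer epoch seconds (UTC).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_item_views (user_id INTEGER, item_id INTEGER, ts INTEGER)"
    )
    conn.executemany("INSERT INTO user_item_views VALUES (?, ?, ?)", rows)
    # Format each timestamp as YYYY-MM-DD hh:mm and group by that string.
    query = """
        SELECT user_id,
               strftime('%Y-%m-%d %H:%M', ts, 'unixepoch') AS minute,
               COUNT(*) AS views
        FROM user_item_views
        GROUP BY user_id, minute
        ORDER BY user_id, minute
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

Note that two views at 00:00:59 and 00:01:01 land in different buckets here even though they are only two seconds apart, which is exactly why this is not enough for a sliding window.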
For a sliding window, I have no idea how to approach it.
If this would be easier outside of SQL, I am open to exporting the data to another format, or using another language.
Desired output is something like:
User ID | Max items viewed in N seconds, between START and END
...     | ...
...     | ...
...     | ...
How can I do this?
Here’s how I would do it (beware: untested code, this is just to outline the idea).
You need a helper table with as many rows as there are seconds between START_TIMESTAMP and END_TIMESTAMP. Create that as a temp table before you begin your query. For the sake of the sample, let’s call it every_second. I’m assuming your minimum time resolution is one second.
Then do:
Store that in another temporary table and select the desired maxima from it (this is necessary because of the “select max from group” problem).
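The steps above might look roughly like the following sketch. It is not the original query, just one way to read the idea, and it makes several assumptions: it runs against an in-memory SQLite database through Python's sqlite3 module, timestamps are integer epoch seconds in a column named ts, the helper column sec and the function name max_views_in_window are made up, and a subquery stands in for the second temporary table.

```python
import sqlite3

def max_views_in_window(rows, start_ts, end_ts, window_seconds):
    """Per user, the largest number of views inside any sliding window.

    rows: iterable of (user_id, item_id, epoch_seconds) tuples.
    Assumption: one-second resolution, timestamps as integer epoch seconds.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_item_views (user_id INTEGER, item_id INTEGER, ts INTEGER)"
    )
    conn.executemany("INSERT INTO user_item_views VALUES (?, ?, ?)", rows)

    # Helper table: one row per second between start and end (inclusive).
    conn.execute("CREATE TEMP TABLE every_second (sec INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO every_second VALUES (?)",
        ((s,) for s in range(start_ts, end_ts + 1)),
    )

    # Inner query: for each user and each possible window start, count the
    # views falling in [sec, sec + window_seconds).
    # Outer query: keep the largest count per user.
    query = """
        SELECT user_id, MAX(views) AS max_views
        FROM (
            SELECT v.user_id AS user_id, e.sec AS window_start, COUNT(*) AS views
            FROM every_second AS e
            JOIN user_item_views AS v
              ON v.ts >= e.sec AND v.ts < e.sec + ?
            WHERE v.ts BETWEEN ? AND ?
            GROUP BY v.user_id, e.sec
        )
        GROUP BY user_id
        ORDER BY user_id
    """
    result = conn.execute(query, (window_seconds, start_ts, end_ts)).fetchall()
    conn.close()
    return result
```

Users with no views in the range do not appear in the result. The join does work proportional to the number of seconds times the number of views, so for long ranges this sketch could get slow.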