I was wondering if someone could help me with some SQL for returning the number of unique users logged in a database table over a period of two or more days (let’s use 7 days as a reference).
My log table contains a timestamp (ts) and user_id in each row, representing activity from that user at that time.
The following query returns the Daily Active Users or DAU from this log:
SELECT FLOOR(ts / 86400) AS day, COUNT(DISTINCT user_id) AS dau
FROM log
GROUP BY day ORDER BY day ASC
Now let’s say I would like to add to this single query (or at least retrieve in the most efficient possible fashion) the Weekly Active Users, or total unique users logged for a period of 7 days. However, I don’t want to divide my time in non-overlapping weeks. What I need is to count, for each day, the distinct user_ids seen during that day and the 6 previous days.
For example:
day   users   wau
1     1,2     2
4     1,3     3
7     3,4,5   5
8     5       4   (user_id 2 lost from count)
15    2       2   (user_ids 1,3,4 lost from count)
Thank you for any help you can provide and feel free to ask via comment if you need further clarification.
To get a “Weekly Active User” count (per my understanding of your specification: “for each day, the count of distinct user_ids seen during that day and the previous six days”), you could use a query that joins each day to the per-day user lists for that day and the six days before it. (The same query can also return the “Daily Active User” count.)
(I have not yet run a test of this, but I will later, and I will update this statement if any corrections are needed.)
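A minimal sketch of such a query, assuming two derived tables: d (the distinct days) and u (the distinct day/user pairs). The exact shape is an assumption rather than a tested MySQL statement. It is run here through Python's sqlite3 module so it can be checked against the sample data from the question; SQLite's integer division stands in for MySQL's FLOOR(ts / 86400), and CASE is used in place of MySQL's IF().

```python
import sqlite3

# Sample activity from the question: day number -> user_ids seen that day.
SAMPLE = {1: [1, 2], 4: [1, 3], 7: [3, 4, 5], 8: [5], 15: [2]}

# Assumed shape of the query; in MySQL, ts / 86400 would be FLOOR(ts / 86400).
ROLLING_SQL = """
SELECT d.day,
       COUNT(DISTINCT CASE WHEN u.day = d.day THEN u.user_id END) AS dau,
       COUNT(DISTINCT u.user_id) AS wau
FROM (SELECT DISTINCT ts / 86400 AS day FROM log) AS d
JOIN (SELECT DISTINCT ts / 86400 AS day, user_id FROM log) AS u
  ON u.day <= d.day
 AND u.day > d.day - 7   -- the given day plus the 6 previous days
GROUP BY d.day
ORDER BY d.day
"""


def rolling_counts(sample=SAMPLE):
    """Load the sample into an in-memory log table and return (day, dau, wau) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (ts INTEGER, user_id INTEGER)")
    # One log row per (day, user), stamped one hour into the day.
    rows = [(day * 86400 + 3600, uid) for day, uids in sample.items() for uid in uids]
    conn.executemany("INSERT INTO log VALUES (?, ?)", rows)
    result = conn.execute(ROLLING_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for day, dau, wau in rolling_counts():
        print(day, dau, wau)
```

With a strict 7-day window this sketch gives a wau of 1 for day 15, because day 8 falls outside days 9 to 15; the table in the question lists 2 for that row, which would correspond to a window that still includes day 8.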
This query is joining the list of users for a given day (from the u rowsource) to a set of days from the log table (the d rowsource). Note the literal “7” that appears in the join predicate (the ON clause); that’s what’s getting the user list “matched” to the previous 6 days.

Note that this could also be extended to get the distinct user count over the past 3 days, for example, by adding another expression in the SELECT list.
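The 3-day extension can be sketched as a conditional aggregate over the same join. This is an assumption about how the extra expression might look, not the original statement: the column name users_3d is made up for illustration, the d and u derived tables are the same assumed shapes as before, and SQLite (via Python's sqlite3) stands in for MySQL, with integer division in place of FLOOR(ts / 86400).

```python
import sqlite3

# Sample activity from the question: day number -> user_ids seen that day.
SAMPLE = {1: [1, 2], 4: [1, 3], 7: [3, 4, 5], 8: [5], 15: [2]}

# The join still spans 7 days; the CASE expression narrows one count to 3 days.
MULTI_WINDOW_SQL = """
SELECT d.day,
       COUNT(DISTINCT u.user_id) AS wau,
       COUNT(DISTINCT CASE WHEN u.day > d.day - 3 THEN u.user_id END) AS users_3d
FROM (SELECT DISTINCT ts / 86400 AS day FROM log) AS d
JOIN (SELECT DISTINCT ts / 86400 AS day, user_id FROM log) AS u
  ON u.day <= d.day
 AND u.day > d.day - 7
GROUP BY d.day
ORDER BY d.day
"""


def multi_window_counts(sample=SAMPLE):
    """Return (day, wau, users_3d) rows for the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (ts INTEGER, user_id INTEGER)")
    rows = [(day * 86400 + 3600, uid) for day, uids in sample.items() for uid in uids]
    conn.executemany("INSERT INTO log VALUES (?, ?)", rows)
    result = conn.execute(MULTI_WINDOW_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in multi_window_counts():
        print(row)
```

The shorter window has to fit inside the range allowed by the ON clause; a CASE condition wider than the join range would silently count only the rows the join lets through.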
That literal “7” could be increased to get a larger range. The literal 3 in such an expression could likewise be changed to get any number of days; we just need to be sure we’ve got enough previous-day rows (from d) joined to each row from u.

PERFORMANCE NOTE: Due to the inline views (or derived tables, as MySQL calls them), this query may not be very fast, since the resultsets for those inline views have to be materialized into intermediate MyISAM tables.
The inline view aliased as u may not be optimal; it might be faster to join directly to the log table. I was thinking in terms of getting a unique list of users for a given day, which is what the query in the inline view got me. It was just easier for me to conceptualize what was going on. And I was thinking that if you had hundreds of rows for the same user on a given day, the inline view would weed out a whole bunch of the duplicates before we did the join to the other days.

A WHERE clause to limit the number of days we are returning would be best added inside the u and d inline views. (The d inline view would need to include an extra earlier 6 days.)

On another note, if the ts column is a TIMESTAMP datatype, I would be more inclined to use a DATE(ts) expression to extract the date portion. But that would return a DATE datatype in the resultset, rather than an integer, which would be different from the resultset you specified.
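The WHERE-clause suggestion above can be sketched as follows. This is a hedged illustration under assumptions: the function name windowed_wau and the :first / :last parameters are hypothetical, and SQLite (via Python's sqlite3) stands in for MySQL. In this sketch the look-back rows happen to come from the per-day user list, so that is the view whose lower bound is moved back six days; in a query arranged the other way round, the extra days go in whichever view supplies the earlier days.

```python
import sqlite3

# Sample activity from the question: day number -> user_ids seen that day.
SAMPLE = {1: [1, 2], 4: [1, 3], 7: [3, 4, 5], 8: [5], 15: [2]}

# Both derived tables are filtered on raw ts; the view that supplies the
# look-back rows starts 6 days before the first reported day.
WINDOWED_SQL = """
SELECT d.day, COUNT(DISTINCT u.user_id) AS wau
FROM (SELECT DISTINCT ts / 86400 AS day FROM log
      WHERE ts >= :first * 86400 AND ts < (:last + 1) * 86400) AS d
JOIN (SELECT DISTINCT ts / 86400 AS day, user_id FROM log
      WHERE ts >= (:first - 6) * 86400 AND ts < (:last + 1) * 86400) AS u
  ON u.day <= d.day
 AND u.day > d.day - 7
GROUP BY d.day
ORDER BY d.day
"""


def windowed_wau(first_day, last_day, sample=SAMPLE):
    """Return (day, wau) rows for report days first_day..last_day inclusive."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (ts INTEGER, user_id INTEGER)")
    rows = [(day * 86400 + 3600, uid) for day, uids in sample.items() for uid in uids]
    conn.executemany("INSERT INTO log VALUES (?, ?)", rows)
    result = conn.execute(WINDOWED_SQL, {"first": first_day, "last": last_day}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(windowed_wau(7, 8))
```

Filtering on the raw ts column rather than on the computed day is a deliberate choice here: a range condition on the bare column is the form an index on ts could serve, whereas a condition on an expression generally cannot use it.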