I need to query for each minute the total count of rows up to that minute.
The best I could achieve so far doesn’t do the trick. It returns count per minute, not the total count up to each minute:
SELECT COUNT(id) AS count
, EXTRACT(hour from "when") AS hour
, EXTRACT(minute from "when") AS minute
FROM mytable
GROUP BY hour, minute
Return only minutes with activity
Shortest
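A minimal sketch of the short variant, assuming the table `mytable` and the column `"when"` from the question, and assuming `"when"` is a `timestamp` (or `timestamptz`). The output alias `running_ct` is a made-up name for illustration:

```sql
-- Running count of rows up to and including each minute that has activity.
-- Assumes "when" is a timestamp column; running_ct is a hypothetical alias.
SELECT DISTINCT
       date_trunc('minute', "when") AS minute
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   mytable
ORDER  BY 1;
```

All rows of the same minute are peers in the window's `ORDER BY`, so they carry the same running count and `DISTINCT` folds them into one row per minute.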
- Use `date_trunc()`, it returns exactly what you need.
- Don't include `id` in the query, since you want to `GROUP BY` minute slices.
- `count()` is typically used as a plain aggregate function. Appending an `OVER` clause makes it a window function. Omit `PARTITION BY` in the window definition: you want a running count over all rows. By default, that counts from the first row to the last peer of the current row as defined by `ORDER BY`. The manual says the default framing option is `RANGE UNBOUNDED PRECEDING`, which, with `ORDER BY`, sets the frame to all rows from the partition start up through the current row's last `ORDER BY` peer. And that happens to be exactly what you need.
- Use `count(*)` rather than `count(id)`. It better fits your question ("count of rows"). It is generally slightly faster than `count(id)`. And, while we might assume that `id` is `NOT NULL`, it has not been specified in the question, so `count(id)` is wrong, strictly speaking, because NULL values are not counted with `count(id)`.
- You can't `GROUP BY` minute slices at the same query level. Aggregate functions are applied before window functions, so the window function `count(*)` would only see 1 row per minute this way.
- You can, however, `SELECT DISTINCT`, because `DISTINCT` is applied after window functions.
- `ORDER BY 1` is just shorthand for `ORDER BY date_trunc('minute', "when")` here. `1` is a positional reference to the 1st expression in the `SELECT` list.
- Use `to_char()` if you need to format the result, for example `to_char(date_trunc('minute', "when"), 'YYYY-MM-DD HH24:MI')`.

Fastest
Much like the above, but:
- I use a subquery to aggregate and count rows per minute. This way we get 1 row per minute without `DISTINCT` in the outer `SELECT`.
- Use `sum()` as window aggregate function now to add up the counts from the subquery.

I found this to be substantially faster with many rows per minute.
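The subquery approach described above could be sketched as follows. It again assumes `mytable` and a timestamp column `"when"`; the names `minute_ct`, `running_ct` and `sub` are placeholders chosen for illustration:

```sql
-- Step 1 (subquery): one row per active minute with its own row count.
-- Step 2 (outer query): running sum over those per-minute counts.
SELECT minute
     , sum(minute_ct) OVER (ORDER BY minute) AS running_ct
FROM  (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   mytable
   GROUP  BY 1
   ) sub
ORDER  BY 1;
```

Because the window function only sees one pre-aggregated row per minute, there are no duplicates to remove afterwards.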
Include minutes without activity
Shortest
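For the case that should also return empty minutes, a minimal sketch might look like this. It assumes `mytable` with a timestamp column `"when"`; the aliases `m`, `c` and `running_ct` are hypothetical names:

```sql
-- m: one row for every minute between the first and the last event.
-- c: one row per event, truncated to the minute.
-- count(c.minute) ignores the NULLs produced by the LEFT JOIN for empty minutes.
SELECT DISTINCT
       minute
     , count(c.minute) OVER (ORDER BY minute) AS running_ct
FROM  (
   SELECT generate_series(date_trunc('minute', min("when"))
                        , max("when")
                        , interval '1 min')
   FROM   mytable
   ) m(minute)
LEFT   JOIN (SELECT date_trunc('minute', "when") FROM mytable) c(minute) USING (minute)
ORDER  BY 1;
```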
@GabiMe asked in a comment how to get one row for every `minute` in the time frame, including those where no event occurred (no row in base table).

- Generate a row for every minute in the time frame between the first and the last event with `generate_series()`, here directly based on aggregated values from the subquery.
- `LEFT JOIN` to all timestamps truncated to the minute and count. `NULL` values (where no row exists) do not add to the running count.

Fastest
With CTE:
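A sketch of what such a CTE variant could look like, under the same assumptions as before (`mytable`, timestamp column `"when"`); `cte`, `minute_ct`, `m` and `running_ct` are illustrative names:

```sql
-- cte: per-minute counts for minutes that have at least one row.
WITH cte AS (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   mytable
   GROUP  BY 1
   )
-- m: full minute series between first and last active minute,
-- LEFT JOINed to the counts so that empty minutes are kept.
SELECT m.minute
     , COALESCE(sum(cte.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM  (
   SELECT generate_series(min(minute), max(minute), interval '1 min')
   FROM   cte
   ) m(minute)
LEFT   JOIN cte USING (minute)
ORDER  BY 1;
```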
- Again, aggregate and count rows per minute in the first step; this removes the need for a later `DISTINCT`.
- Unlike `count()`, `sum()` can return `NULL`. Default to `0` with `COALESCE`.

With many rows and an index on `"when"`, this version with a subquery was fastest among a couple of variants I tested with Postgres 9.1 – 9.4:
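A sketch of such a subquery version, assuming `mytable` and a timestamp column `"when"`. The aliases `m`, `c`, `minute_ct` and `running_ct` are placeholders, and whether it is actually the fastest variant depends on data distribution and Postgres version:

```sql
-- m: minute series derived from min/max of "when" (can use an index on "when").
-- c: per-minute counts, aggregated once in a subquery.
SELECT m.minute
     , COALESCE(sum(c.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM  (
   SELECT generate_series(date_trunc('minute', min("when"))
                        , max("when")
                        , interval '1 min')
   FROM   mytable
   ) m(minute)
LEFT   JOIN (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   mytable
   GROUP  BY 1
   ) c USING (minute)
ORDER  BY 1;
```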