I’m trying to put together a query that will retrieve the statistics of a user (profit/loss) as a cumulative result, over a period of time.
Here’s the query I have so far:
SELECT p.name, e.date,
sum(sp.payout) OVER (ORDER BY e.date)
- sum(s.buyin) OVER (ORDER BY e.date) AS "Profit/Loss"
FROM result r
JOIN game g ON r.game_id = g.game_id
JOIN event e ON g.event_id = e.event_id
JOIN structure s ON g.structure_id = s.structure_id
JOIN structure_payout sp ON g.structure_id = sp.structure_id
AND r.position = sp.position
JOIN player p ON r.player_id = p.player_id
WHERE p.player_id = 17
GROUP BY p.name, e.date, e.event_id, sp.payout, s.buyin
ORDER BY p.name, e.date ASC
The query runs, but the result is slightly incorrect. The reason is that an event can have multiple games (with different sp.payout values), so the query returns multiple rows if a user has two results in an event with different payouts (e.g. there are 4 games per event, and a user gets £20 from one and £40 from another).
The obvious solution would be to amend the GROUP BY to:
GROUP BY p.name, e.date, e.event_id
However, Postgres complains about this, as it doesn’t appear to recognize that sp.payout and s.buyin are inside an aggregate function. I get the error:
column "sp.payout" must appear in the GROUP BY clause or be used in an aggregate function
I’m running PostgreSQL 9.1 on an Ubuntu Linux server.
Am I missing something, or could this be a genuine defect in Postgres?
You are not, in fact, using aggregate functions. You are using window functions. That’s why PostgreSQL demands that sp.payout and s.buyin be included in the GROUP BY clause.
By appending an OVER clause, the aggregate function sum() is turned into a window function, which aggregates values per partition while keeping all rows.
You can combine window functions and aggregate functions. Aggregations are applied first. I did not understand from your description how you want to handle multiple payouts / buyins per event. As a guess, I calculate a sum of them per event. Now I can remove sp.payout and s.buyin from the GROUP BY clause and get one row per player and event:
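A sketch of that query, reconstructed from the description in this answer and not tested against the asker’s schema. It assumes event_id is the primary key of event. p.name is left out of the SELECT list here, because selecting it would also require p.player_id (the primary key of player) in GROUP BY.

```sql
-- Sketch only: one row per event for a single player, with a running profit/loss.
-- Assumption: event.event_id is the PRIMARY KEY of event, so e.date may be
-- selected and sorted on while grouping by e.event_id alone (PostgreSQL 9.1+).
SELECT e.event_id
     , e.date
     , sum(sum(sp.payout)) OVER w            -- outer sum: window, inner sum: aggregate
     - sum(sum(s.buyin))   OVER w AS "Profit/Loss"
FROM   player p
JOIN   result r            ON r.player_id = p.player_id
JOIN   game g              ON g.game_id = r.game_id
JOIN   event e             ON e.event_id = g.event_id
JOIN   structure s         ON s.structure_id = g.structure_id
JOIN   structure_payout sp ON sp.structure_id = g.structure_id
                          AND sp.position = r.position
WHERE  p.player_id = 17
GROUP  BY e.event_id                          -- collapses all games of an event into one row
WINDOW w AS (ORDER BY e.date, e.event_id)     -- running total in date order, ties broken by event_id
ORDER  BY e.date, e.event_id;
```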
In sum(sum(sp.payout)) OVER w, the outer sum() is a window function and the inner sum() is an aggregate function.
This assumes that p.player_id and e.event_id are the PRIMARY KEY of their respective tables.
I added e.event_id to the ORDER BY of the WINDOW clause to arrive at a deterministic sort order. (There could be multiple events on the same date.) I also included event_id in the result to distinguish multiple events per day.
While the query restricts to a single player (WHERE p.player_id = 17), we don’t need to add p.name or p.player_id to GROUP BY and ORDER BY. If one of the joins multiplied rows unduly, the resulting sum would be incorrect (partly or completely multiplied). Grouping by p.name could not repair the query then.
I also removed e.date from the GROUP BY clause. Since PostgreSQL 9.1, grouping by the primary key e.event_id covers all columns of the event table.
If you change the query to return multiple players at once, adapt:
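A sketch of the multi-player variant, again reconstructed from the surrounding description and not a tested query. The player ids in the WHERE clause are made up for illustration, and the PARTITION BY is my assumption about how to keep each player’s running total separate.

```sql
-- Sketch only: running profit/loss per player across several players.
-- Assumption: the ids in the IN list are hypothetical placeholders.
SELECT p.name
     , p.player_id
     , e.date
     , e.event_id
     , sum(sum(sp.payout)) OVER w
     - sum(sum(s.buyin))   OVER w AS "Profit/Loss"
FROM   player p
JOIN   result r            ON r.player_id = p.player_id
JOIN   game g              ON g.game_id = r.game_id
JOIN   event e             ON e.event_id = g.event_id
JOIN   structure s         ON s.structure_id = g.structure_id
JOIN   structure_payout sp ON sp.structure_id = g.structure_id
                          AND sp.position = r.position
WHERE  p.player_id IN (17, 18)                -- hypothetical: more than one player
GROUP  BY p.name, p.player_id, e.date, e.event_id
WINDOW w AS (PARTITION BY p.name, p.player_id -- restart the running total for each player
             ORDER BY e.date, e.event_id)
ORDER  BY p.name, p.player_id, e.date, e.event_id;
```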
Unless p.name is defined unique (?), additionally group and order by player_id to get correct results in a deterministic sort order.
I only kept e.date and p.name in GROUP BY to have an identical sort order in all clauses, hoping for a performance benefit. Otherwise, you can remove the columns there. (Similarly for just e.date in the first query.)
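To see the window-over-aggregate pattern in isolation, here is a minimal, self-contained example on made-up values (no tables needed). The column names and numbers are purely illustrative.

```sql
-- Toy data (made up): event 1 has two payouts, event 2 has one.
SELECT event_id
     , sum(payout)                               AS event_total    -- plain aggregate, per group
     , sum(sum(payout)) OVER (ORDER BY event_id) AS running_total  -- window function over the aggregate
FROM  (VALUES (1, 20), (1, 40), (2, 10)) AS t(event_id, payout)
GROUP  BY event_id
ORDER  BY event_id;
-- event_id | event_total | running_total
--        1 |          60 |            60
--        2 |          10 |            70
```

The GROUP BY collapses the two payouts of event 1 into one row first; the window function then accumulates over those grouped rows.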