I’ll try to give this as generically as I can so it’s reusable.
I am running a site with a fairly large MySQL database which has grown to need some summary/rollup tables initialized. For the example’s sake, let’s say it’s soccer statistics. Since I handle multiple soccer leagues in the same database, many of them play games of different lengths – for instance, indoor soccer leagues play four quarters while most outdoor leagues play halves.
I have three tables important to this exercise. I’ve redacted all of the fields that I don’t consider significant to the answer I’m looking for.
GAME
`game`.id
`game`.home_team_id
`game`.away_team_id
`game`.number_of_periods
GOAL
// Records for each goal scored in the game
`goal`.id
`goal`.game_id
`goal`.team_id
`goal`.period_number
`goal`.player_id
`goal`.assist_player_id
PERIOD_SUMMARY
`period`.id
`period`.game_id
`period`.team_id
`period`.number
`period`.goals_scored
Ultimately I should have records for EVERY period played in the period summary table, regardless of whether or not a goal was scored. This table only needs to be initialized once, as it’s fairly easy to add the appropriate zero-filled records via a trigger on game creation, and to have triggers fire on insert/update requests to keep the period_summary table current.
It is also fairly easy for me to group all of the goals and initialize the period summary table with the SUM(). What I am having a bit of trouble with is figuring out an efficient way to “fill” any periods that don’t have a goal scored with a 0.
What I am trying to figure out is if it’s easier/more efficient to:
- Write the trigger and prefill the entire period_summary table with 0-filled values, then run the query I already know to update the appropriate records for periods in which goals were scored.
- Use some other method (perhaps a temporary stored procedure?) that will only 0-fill records where there is not a match in the goals table.
You already have a placeholder. The “placeholder for unknown data” in SQL is null.
You don’t need to pre-fill anything: either you have a row with some columns having an unknown value (null), or you have no row at all, so that doing an outer join will get a row that is all null. Either way, the attribute data (essentially, non-id fields) will be null.
And the sum() aggregate will ignore nulls.
So let’s say that you do have a row for a game (since it’s pre-scheduled), but no corresponding rows for its periods (since they have not yet been played). Then you do an outer join from game to period (outer, so that you include both games with, and games without, period data):
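The query itself is not shown here, so what follows is only a minimal sketch of such an outer join. It assumes the table and column names from the question's field list (`game`, and `period` for the period summary rows); the alias `total_goals` is made up for illustration.

```sql
-- Sketch only: table and column names are taken from the question's field list.
-- LEFT OUTER JOIN keeps every game, including games with no period rows yet.
SELECT game.id,
       SUM(period.goals_scored) AS total_goals  -- NULL when the game has no period rows
FROM game
LEFT OUTER JOIN period
       ON period.game_id = game.id
GROUP BY game.id;
```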
This shows you the total goals (for both teams) by game; for games with no periods, you get back null (which means, in SQL, “we don’t (yet) know”).
This query shows you only the total goals for completed games and games in progress (games for which at least one period has been played):
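Again the query is not shown here; one plausible form, under the same naming assumptions (`game` and `period` tables, illustrative alias `total_goals`), is simply the inner-join version of the same aggregate:

```sql
-- Sketch only: an inner join drops games that have no period rows at all,
-- so only completed games and games in progress appear in the result.
SELECT game.id,
       SUM(period.goals_scored) AS total_goals
FROM game
INNER JOIN period
        ON period.game_id = game.id
GROUP BY game.id;
```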
This view filters out incomplete games (assuming you always add early periods before later ones):
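The view definition is not shown here. A possible sketch, assuming a game counts as complete once a period row exists whose `number` equals the game's `number_of_periods`; the view name `completed_game` is hypothetical:

```sql
-- Sketch only: "completed_game" is a hypothetical view name.
-- A game is treated as complete when its final period has been recorded,
-- which relies on early periods always being inserted before later ones.
CREATE VIEW completed_game AS
SELECT game.*
FROM game
WHERE EXISTS (
    SELECT 1
    FROM period
    WHERE period.game_id = game.id
      AND period.number  = game.number_of_periods
);
```

Because the period table holds one row per team per period, this check passes as soon as either team's final-period row exists; a stricter version could require both.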
Using that view, we can then sum only completed games:
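As a sketch of that last step, assuming a view named `completed_game` (a hypothetical name) that exposes only the games whose final period has been recorded, along with the `period` table and columns from the question:

```sql
-- Sketch only: "completed_game" is a hypothetical view of finished games.
-- Joining through the view restricts the totals to completed games.
SELECT completed_game.id,
       SUM(period.goals_scored) AS total_goals
FROM completed_game
INNER JOIN period
        ON period.game_id = completed_game.id
GROUP BY completed_game.id;
```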
So, no need to pre-fill, no need for a trigger, most importantly, no need to add false data (claiming zero goals when in fact the period has not yet been played), no need to update with correct data. Just insert the period when you have data for it.