I have a database table containing time-periods and amounts. Think of them as contracts with a duration and a price per day:
start | end | amount_per_day
2013-01-01 | 2013-01-31 | 100
2013-02-01 | 2013-06-30 | 200
2013-01-01 | 2013-06-30 | 100
2013-05-01 | 2013-05-15 | 50
2013-05-16 | 2013-05-31 | 50
I would like to make a query that will display the totals for each period, i.e.:
From 2013-01-01 to 2013-01-31, the first and third contracts are active, so the total amount per day is 200. From 2013-02-01 to 2013-04-30, the second and third rows are active, so the total is 300. From 2013-05-01 to 2013-05-15, the second, third and fourth rows are active, so the total is 350. From 2013-05-16 to 2013-05-31, the second, third and fifth rows are active, so the total is again 350. Finally, from 2013-06-01 to 2013-06-30, only the second and third rows are active, so the total is back to 300.
start | end | total_amount_per_day
2013-01-01 | 2013-01-31 | 200
2013-02-01 | 2013-04-30 | 300
2013-05-01 | 2013-05-31 | 350
2013-06-01 | 2013-06-30 | 300
(It is not necessary to detect that the intervals 2013-05-01 -> 2013-05-15 and 2013-05-16 -> 2013-05-31 have the same totals and merge them, but it would be nice).
I would prefer a portable solution, but if that is not possible, a SQL Server-specific one will work too.
I can make small changes to the structure of the table, so if it would simplify the query to, e.g., store the time periods with an exclusive end date (so the first period would be start = 2013-01-01, end = 2013-02-01), feel free to suggest that.
I’ll start with the full query and then break it down and explain it. This is SQL Server specific, but with minor tweaks it could be adapted to any DBMS that supports analytic (window) functions.
The Data CTE is just your sample data.
The Numbers CTE is just a sequence of numbers from 0 to 2047 (if your start and end dates are more than 2047 days apart, this will fail and will need adapting slightly).
The next CTE, DailyData, simply uses the numbers to expand your ranges into their individual dates, so each range row becomes one row per day it covers.
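As a rough sketch of the numbers and expansion steps, here is a version that runs on SQLite through Python's sqlite3 module rather than on SQL Server, so it can be tried without a server. The table name contracts, the column names start_date, end_date and amount_per_day, and the recursive numbers CTE are my own assumptions for illustration; on SQL Server the date arithmetic would use DATEADD and DATEDIFF instead of SQLite's date and julianday.

```python
import sqlite3

# In-memory database holding the sample contracts from the question.
# Table and column names are assumptions made for this sketch.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE contracts (start_date TEXT, end_date TEXT, amount_per_day INTEGER)"
)
con.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?)",
    [
        ("2013-01-01", "2013-01-31", 100),
        ("2013-02-01", "2013-06-30", 200),
        ("2013-01-01", "2013-06-30", 100),
        ("2013-05-01", "2013-05-15", 50),
        ("2013-05-16", "2013-05-31", 50),
    ],
)

DAILY_SQL = """
WITH RECURSIVE numbers(n) AS (
    -- 0 .. 2047, the same span as the Numbers sequence described above
    SELECT 0
    UNION ALL
    SELECT n + 1 FROM numbers WHERE n < 2047
)
SELECT date(c.start_date, '+' || n.n || ' days') AS day,
       SUM(c.amount_per_day)                     AS total_per_day
FROM contracts AS c
JOIN numbers   AS n
  -- one row per day offset inside the contract, end date inclusive
  ON n.n <= julianday(c.end_date) - julianday(c.start_date)
GROUP BY day
ORDER BY day
"""

# One (day, total_per_day) row for every date covered by at least one contract.
daily = con.execute(DAILY_SQL).fetchall()
```

Each contract is joined to as many numbers as it has days, so summing per day gives the combined amount of all contracts active on that date.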
Then it is just a case of grouping the data by the amount per day with the help of the ROW_NUMBER function to find when it changes and define ranges of similar amounts per day, then getting the MIN and MAX date for each range.
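A minimal sketch of that grouping step, again assuming SQLite via Python's sqlite3 module and the hypothetical contracts table and column names used for illustration (this is not the SQL Server query itself). Within one total, consecutive days have consecutive row numbers, so the date minus the row number stays constant for the whole run; grouping by the total and that constant yields one row per range.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE contracts (start_date TEXT, end_date TEXT, amount_per_day INTEGER)"
)
con.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?)",
    [
        ("2013-01-01", "2013-01-31", 100),
        ("2013-02-01", "2013-06-30", 200),
        ("2013-01-01", "2013-06-30", 100),
        ("2013-05-01", "2013-05-15", 50),
        ("2013-05-16", "2013-05-31", 50),
    ],
)

RANGES_SQL = """
WITH RECURSIVE numbers(n) AS (
    SELECT 0
    UNION ALL
    SELECT n + 1 FROM numbers WHERE n < 2047
),
daily AS (
    -- expand every contract into its individual days and sum per day
    SELECT date(c.start_date, '+' || n.n || ' days') AS day,
           SUM(c.amount_per_day)                     AS total_per_day
    FROM contracts AS c
    JOIN numbers   AS n
      ON n.n <= julianday(c.end_date) - julianday(c.start_date)
    GROUP BY day
),
numbered AS (
    -- number the days separately for each distinct total
    SELECT day,
           total_per_day,
           ROW_NUMBER() OVER (PARTITION BY total_per_day ORDER BY day) AS rn
    FROM daily
)
SELECT MIN(day) AS start_date,
       MAX(day) AS end_date,
       total_per_day
FROM numbered
-- day minus rn is constant inside a run of consecutive days with one total
GROUP BY total_per_day, date(day, '-' || rn || ' days')
ORDER BY start_date
"""

ranges = con.execute(RANGES_SQL).fetchall()
for row in ranges:
    print(row)
```

On the sample data this also merges the two adjacent May periods, because every day from 2013-05-01 to 2013-05-31 has the same total of 350.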
I always struggle to explain or demonstrate the exact workings of this method of grouping ranges. If it doesn’t make sense, it is perhaps easiest to see for yourself by running SELECT * FROM DailyData at the end to view the raw, unaggregated data.
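To make the mechanism visible on something small, here is a toy illustration, assuming SQLite via Python's sqlite3 module. The six-day daily table below is made up for the demonstration and is not taken from the sample data. It lists each day with its row number within its total and the resulting grouping key (the date minus the row number).

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical daily totals: 200 for two days, 300 for two days, 200 again.
con.execute("CREATE TABLE daily (day TEXT, total INTEGER)")
con.executemany(
    "INSERT INTO daily VALUES (?, ?)",
    [
        ("2013-01-01", 200),
        ("2013-01-02", 200),
        ("2013-01-03", 300),
        ("2013-01-04", 300),
        ("2013-01-05", 200),
        ("2013-01-06", 200),
    ],
)

KEY_SQL = """
SELECT day,
       total,
       rn,
       date(day, '-' || rn || ' days') AS island  -- constant within one run
FROM (
    SELECT day,
           total,
           ROW_NUMBER() OVER (PARTITION BY total ORDER BY day) AS rn
    FROM daily
)
ORDER BY day
"""

rows = con.execute(KEY_SQL).fetchall()
for row in rows:
    print(row)
# ('2013-01-01', 200, 1, '2012-12-31')
# ('2013-01-02', 200, 2, '2012-12-31')
# ('2013-01-03', 300, 1, '2013-01-02')
# ('2013-01-04', 300, 2, '2013-01-02')
# ('2013-01-05', 200, 3, '2013-01-02')
# ('2013-01-06', 200, 4, '2013-01-02')
```

The two runs of 200 get different keys because the gap between them makes the date jump while the row number keeps counting. The second run of 200 happens to share its key with the run of 300, which is why the final GROUP BY needs the total as well as the key.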