I’m looking for a fast way to create cumulative totals over a large SQL Server 2008 data set, partitioned by a particular column, potentially by using a multiple assignment variable solution. As a very basic example, I’d like to create the “cumulative_total” column below:
user_id | month | total | cumulative_total
1 | 1 | 2.0 | 2.0
1 | 2 | 1.0 | 3.0
1 | 3 | 3.5 | 6.5
2 | 1 | 0.5 | 0.5
2 | 2 | 1.5 | 2.0
2 | 3 | 2.0 | 4.0
We have traditionally done this with correlated subqueries, but over large amounts of data (200,000+ rows and several different categories of running total) this isn’t giving us ideal performance.
I recently read about using multiple assignment variables for cumulative summing here:
http://sqlblog.com/blogs/paul_nielsen/archive/2007/12/06/cumulative-totals-screencast.aspx
In the example in that blog the cumulative variable solution looks like this:
UPDATE my_table
SET @CumulativeTotal=cumulative_total=@CumulativeTotal+ISNULL(total, 0)
This solution seems brilliantly fast for summing for a single user in the above example (user 1 or user 2). However, I need to effectively partition by user – give me the cumulative total by user by month.
Does anyone know of a way of extending the multiple assignment variable concept to solve this, or any other ideas other than correlated subqueries or cursors?
Many thanks for any tips.
Your options in SQL Server 2008 are reasonably limited: you can either do something based on the method above (which is called a ‘quirky update’) or you can do something in the CLR.
Personally I would go with the CLR because it’s guaranteed to work, while the quirky update syntax isn’t something that’s formally supported (so might break in future versions).
The variation on quirky update syntax you’re looking for would be something like:
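As a minimal sketch of such a variation (not necessarily the exact code intended above), a second variable can remember the previous row's user so the running total resets whenever the user changes. The variable name @PrevUser, the decimal(18, 2) type, and the clustered index on (user_id, month) are assumptions for illustration. The technique relies on undocumented behaviour (rows being updated in clustered index order, and assignments being evaluated left to right), so check the results against a known-good query before trusting it.

```sql
-- Assumption: my_table has a clustered index on (user_id, month).
-- The update order is not guaranteed by SQL Server; TABLOCKX and MAXDOP 1
-- are the usual precautions taken to keep a single, serial, ordered scan.
DECLARE @CumulativeTotal decimal(18, 2) = 0,  -- assumed type for total
        @PrevUser int = NULL;                 -- hypothetical helper variable

UPDATE t
SET @CumulativeTotal = cumulative_total =
        CASE WHEN t.user_id = @PrevUser
             THEN @CumulativeTotal + ISNULL(t.total, 0)  -- same user: keep summing
             ELSE ISNULL(t.total, 0)                     -- new user: restart the total
        END,
    @PrevUser = t.user_id                                -- remember user for the next row
FROM my_table AS t WITH (TABLOCKX)
OPTION (MAXDOP 1);
```

On the first row @PrevUser is NULL, so the comparison is not true and the ELSE branch starts the total from that row's value.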
It’s worth noting that SQL Server 2012 introduces RANGE support to windowing functions, so this is expressible in a way that is the most efficient, while being 100% supported.
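For reference, a sketch of how the running total might be written on SQL Server 2012 or later, using the table and column names from the question. It uses a ROWS frame rather than RANGE, which is an assumption on my part: when (user_id, month) is unique the two give the same result, and ROWS is generally the cheaper of the two.

```sql
-- Requires SQL Server 2012 or later (ORDER BY and framing in aggregate windows).
SELECT user_id,
       month,
       total,
       SUM(ISNULL(total, 0)) OVER (
           PARTITION BY user_id                               -- restart the total per user
           ORDER BY month                                     -- accumulate in month order
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW   -- running frame
       ) AS cumulative_total
FROM my_table
ORDER BY user_id, month;
```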