I have to show a running total alongside the total column in my application, so I have used the following queries to find the running total, and both work as I need. In one I used a LEFT JOIN with GROUP BY, and in the other I used a subquery.
My question is: which one is faster when my data grows by thousands of rows daily, and which is better if the data stays within 1,000 or 2,000 rows? Is there any other method that is faster than these two?
declare @tmp table(ind int identity(1,1),col1 int)
insert into @tmp
select 2
union
select 4
union
select 7
union
select 5
union
select 8
union
select 10
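-- Query 1: LEFT JOIN with GROUP BY
SELECT t1.col1, sum(t2.col1) AS running_total
FROM @tmp AS t1 LEFT JOIN @tmp AS t2 ON t1.ind >= t2.ind
GROUP BY t1.ind, t1.col1
-- Query 2: correlated subquery
SELECT t1.col1, (SELECT sum(col1) FROM @tmp AS t2 WHERE t2.ind <= t1.ind) AS running_total
FROM @tmp AS t1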
A great resource on calculating running totals in SQL Server is this document by Itzik Ben Gan that was submitted to the SQL Server Team as part of his campaign to have the OVER clause extended further from its initial SQL Server 2005 implementation. In it he shows how, once you get into tens of thousands of rows, cursors outperform set based solutions. SQL Server 2012 did indeed extend the OVER clause, making this sort of query much easier. As you are on SQL Server 2005, however, this is not available to you.
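For readers on SQL Server 2012 or later, the extended OVER clause mentioned above might be used roughly as follows. This is only a sketch: the table variable @t, its explicit ind values and the sample data are illustrative assumptions, and it will not run on SQL Server 2005.

```sql
-- Illustrative sample data (hypothetical table variable, explicit ordering column).
DECLARE @t TABLE (ind INT PRIMARY KEY, col1 INT);

INSERT INTO @t (ind, col1)
SELECT 1, 2 UNION ALL
SELECT 2, 4 UNION ALL
SELECT 3, 7 UNION ALL
SELECT 4, 5 UNION ALL
SELECT 5, 8 UNION ALL
SELECT 6, 10;

-- Windowed SUM with an ordered frame (requires SQL Server 2012+).
-- ROWS UNBOUNDED PRECEDING means "from the first row up to the current row".
SELECT ind,
       col1,
       SUM(CAST(col1 AS BIGINT)) OVER (ORDER BY ind
                                       ROWS UNBOUNDED PRECEDING) AS RunningTotal
FROM @t
ORDER BY ind;
```

For the sample rows above the running totals come out as 2, 6, 13, 18, 26, 36.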
Adam Machanic shows here how the CLR can be used to improve on the performance of standard TSQL cursors.
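For comparison, a plain T-SQL cursor of the kind referred to above could look roughly like this on SQL Server 2005. It is a minimal sketch and not the code from either article; the table variables @t and @result and the sample data are assumptions made for illustration.

```sql
-- Illustrative sample data (hypothetical table variable).
DECLARE @t TABLE (ind INT PRIMARY KEY, col1 INT);

INSERT INTO @t (ind, col1)
SELECT 1, 2 UNION ALL
SELECT 2, 4 UNION ALL
SELECT 3, 7 UNION ALL
SELECT 4, 5 UNION ALL
SELECT 5, 8 UNION ALL
SELECT 6, 10;

-- Holds each source row together with its running total.
DECLARE @result TABLE (ind INT PRIMARY KEY, col1 INT, RunningTotal BIGINT);

DECLARE @ind INT, @col1 INT, @total BIGINT;
SET @total = 0;

-- Read the rows once, in ind order, carrying the total forward.
DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT ind, col1 FROM @t ORDER BY ind;

OPEN c;
FETCH NEXT FROM c INTO @ind, @col1;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @total = @total + @col1;
    INSERT INTO @result (ind, col1, RunningTotal)
    VALUES (@ind, @col1, @total);
    FETCH NEXT FROM c INTO @ind, @col1;
END

CLOSE c;
DEALLOCATE c;

SELECT ind, col1, RunningTotal FROM @result ORDER BY ind;
```

Each source row is visited exactly once, so the work grows linearly with the row count, at the cost of per-row cursor overhead.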
For this table definition
I created tables with both 2,000 and 10,000 rows in a database with ALLOW_SNAPSHOT_ISOLATION ON and in one with this setting off (the reason being that my initial results were in a DB with the setting on, which led to a puzzling aspect of the results).
The clustered indexes for all tables just had 1 root page. The number of leaf pages for each is shown below.
I tested the following cases (Links show execution plans)
The reason for including the additional CTE option was to provide a CTE solution that would still work if the ind column was not guaranteed sequential.
All of the queries had a CAST(col1 AS BIGINT) added in order to avoid overflow errors at runtime. Additionally, for all of them I assigned the results to variables as above in order to eliminate time spent sending back results from consideration.
Results
Both the correlated subquery and the GROUP BY version use “triangular” nested loop joins driven by a clustered index scan on the RunningTotals table (T1) and, for each row returned by that scan, seeking back into the table (T2), self joining on T2.ind<=T1.ind.
This means that the same rows get processed repeatedly. When the T1.ind=1000 row is processed the self join retrieves and sums all rows with an ind <= 1000, then for the next row where T1.ind=1001 the same 1000 rows are retrieved again and summed along with one additional row, and so on.
The total number of such operations for a 2,000 row table is 2,001,000, for 10k rows 50,005,000, or more generally (n² + n) / 2, which clearly grows quadratically.
In the 2,000 row case the main difference between the GROUP BY and the subquery versions is that the former has the stream aggregate after the join and so has three columns feeding into it (T1.ind, T2.col1, T2.col1) and a GROUP BY property of T1.ind, whereas the latter is calculated as a scalar aggregate, with the stream aggregate before the join, only has T2.col1 feeding into it and has no GROUP BY property set at all. This simpler arrangement can be seen to have a measurable benefit in terms of reduced CPU time.
For the 10,000 row case there is an additional difference in the subquery plan. It adds an eager spool which copies all the ind, cast(col1 as bigint) values into tempdb. When snapshot isolation is on, this works out more compact than the clustered index structure and the net effect is to reduce the number of reads by about 25% (as the base table preserves quite a lot of empty space for versioning info). When this option is off, it works out less compact (presumably due to the bigint vs int difference) and more reads result. This reduces the gap between the subquery and GROUP BY versions, but the subquery still wins.
The clear winner, however, was the recursive CTE. For the “no gaps” version, logical reads from the base table are now 2 x (n + 1), reflecting the n index seeks into the 2 level index to retrieve all of the rows plus the additional one at the end that returns nothing and terminates the recursion. That still meant 20,002 reads to process a 22 page table, however!
Logical work table reads for the recursive CTE version are very high. It seems to work out at 6 worktable reads per source row. These come from the index spool that stores the output of the previous row and is then read from again in the next iteration (good explanation of this by Umachandar Jayachandran here). Despite the high number, this is still the best performer.
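A minimal sketch of what a “no gaps” recursive CTE might look like, assuming ind starts at 1 and is strictly sequential. The table variable @t and its sample data are illustrative assumptions, not the tables used for the timings, and this is not necessarily the exact query that was tested.

```sql
-- Illustrative sample data (hypothetical table variable, sequential ind).
DECLARE @t TABLE (ind INT PRIMARY KEY, col1 INT);

INSERT INTO @t (ind, col1)
SELECT 1, 2 UNION ALL
SELECT 2, 4 UNION ALL
SELECT 3, 7 UNION ALL
SELECT 4, 5 UNION ALL
SELECT 5, 8 UNION ALL
SELECT 6, 10;

WITH RecursiveCTE AS
(
    -- Anchor: the first row seeds the total.
    SELECT ind, col1, CAST(col1 AS BIGINT) AS Total
    FROM @t
    WHERE ind = 1

    UNION ALL

    -- Recursive step: one index seek for the next ind,
    -- adding its value to the total carried from the previous row.
    SELECT T.ind, T.col1, R.Total + T.col1
    FROM RecursiveCTE AS R
    JOIN @t AS T ON T.ind = R.ind + 1
)
SELECT ind, col1, Total
FROM RecursiveCTE
ORDER BY ind
OPTION (MAXRECURSION 0);  -- lift the default limit of 100 recursion levels
```

For the sample rows the totals are 2, 6, 13, 18, 26, 36. The final seek for ind = 7 finds nothing and ends the recursion, which corresponds to the extra “+ 1” seek in the read count described above.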