I need to write a query that will group a large number of records by periods of time from Year to Hour.
My initial approach has been to decide the periods procedurally in C#, iterate through each and run the SQL to get the data for that period, building up the dataset as I go.
SELECT Sum(someValues) FROM table1 WHERE deliveryDate BETWEEN @fromDate AND @ toDate
I’ve subsequently discovered I can group the records using Year(), Month() Day(), and datepart(week, date) and datepart(hh, date).
SELECT Sum(someValues) FROM table1 GROUP BY Year(deliveryDate), Month(deliveryDate), Day(deliveryDate)
My concern is that using datepart in a group by will lead to worse performance than running the query multiple times for a set period of time due to not being able to use the index on the datetime field as efficiently; any thoughts as to whether this is true?
Thanks.
As with anything performance related Measure
Checking the query plan up for the second approach will tell you any obvious problems in advance (a full table scan when you know one is not needed) but there is no substitute for measuring. In SQL performance testing that measurement should be done with appropriate sizes of test data.
Since this is a complex case, you are not simply comparing two different ways to do a single query but comparing a single query approach against a iterative one, aspects of your environment may play a major role in the actual performance.
Specifically