Recently, in my colleague's code, I saw an SQL query that used GROUP BY with lots of columns. Most of these columns didn't need to be grouped on for the query's logic. She did this to avoid the following error:
Column ‘some_col’ is invalid in the select list because it is
not contained in either an aggregate function or the GROUP BY clause.
I was wondering how expensive GROUP BY is, and whether it is OK to use such statements. If it is expensive, then I'd better optimize her query, since I'm now working on that piece of code.
It is hard to tell for sure without seeing the specific query, but I used to achieve surprising performance gains (at least in SQL Server 2000) by minimizing the number of columns included in GROUP BY and resolving those columns afterwards with a join on the inner query. To be more specific: let's assume you have the classic OrderDetails (OrderID, ProductID, Quantity, Price) and Products (ProductID, ProductName) tables. Changing a query that joins the two tables and groups by both ProductID and ProductName into one that groups by ProductID alone in an inner query, and then joins that result back to Products to pick up ProductName, would give me a performance gain despite the two joins to the same table, because grouping on an indexed integer column was faster than grouping on the text-valued, unsorted product name.
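A minimal sketch of the two query shapes described above. The exact queries, the `Total` aggregate, and the sample rows are assumptions made for illustration, and SQLite (driven from Python) stands in for SQL Server only so the example is runnable; it demonstrates that both forms return the same rows, not the performance difference, which depends on the engine, indexes, and data.

```python
import sqlite3

# In-memory database with the two tables named in the text.
# The sample rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE OrderDetails (OrderID INTEGER, ProductID INTEGER,
                           Quantity INTEGER, Price REAL);
INSERT INTO Products VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO OrderDetails VALUES (10, 1, 2, 5.0), (11, 1, 1, 5.0), (11, 2, 3, 2.0);
""")

# Shape 1: ProductName is in GROUP BY only to satisfy the select-list rule.
wide_group_by = """
SELECT p.ProductID, p.ProductName, SUM(od.Quantity * od.Price) AS Total
FROM OrderDetails od
JOIN Products p ON p.ProductID = od.ProductID
GROUP BY p.ProductID, p.ProductName
ORDER BY p.ProductID
"""

# Shape 2: group on the integer key alone in an inner query, then join
# Products a second time to resolve the name.
narrow_group_by = """
SELECT t.ProductID, p2.ProductName, t.Total
FROM (
    SELECT p.ProductID, SUM(od.Quantity * od.Price) AS Total
    FROM OrderDetails od
    JOIN Products p ON p.ProductID = od.ProductID
    GROUP BY p.ProductID
) AS t
JOIN Products p2 ON p2.ProductID = t.ProductID
ORDER BY t.ProductID
"""

wide = conn.execute(wide_group_by).fetchall()
narrow = conn.execute(narrow_group_by).fetchall()
print(wide == narrow)  # both shapes return the same rows
```

Whether the second shape is actually faster on a given server is something to check against the real execution plans rather than assume.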