Whenever we use an aggregate function in SQL (MIN, MAX, AVG, etc.), we must always GROUP BY all non-aggregated columns. For instance:
SELECT storeid, storename, SUM(revenue), COUNT(*)
FROM Sales
GROUP BY storeid, storename
It becomes even more intrusive when we use a function or other calculation in our SELECT statement, as this must also be copied to the GROUP BY clause.
SELECT (2 * (x + y)) / z + 1, MyFunction(x, y), SUM(z)
FROM AnotherTable
GROUP BY (2 * (x + y)) / z + 1, MyFunction(x, y)
If we ever change the SELECT statement, we must remember to make the same change to our GROUP BY clause.
So is the GROUP BY clause redundant?
- If this is indeed the case, then why is there a GROUP BY clause in SQL at all?
- If this is not the case, then what extra functionality does GROUP BY give us?
That every non-aggregated column must appear in the GROUP BY clause is not true in general. MySQL, for example, doesn’t require it, and the SQL standard doesn’t say so either.
Nor is it true in general that an expression in the SELECT list has to be copied into the GROUP BY clause. MySQL (and perhaps other databases too) allows column aliases to be used in the GROUP BY clause.
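As a rough sketch of grouping by alias, here is the second query from the question rewritten that way. The table, columns and MyFunction come from the question; the aliases calc, fn and total_z are made up for illustration. MySQL accepts this form, but it is not standard SQL and some databases (SQL Server, for example) reject it:

```sql
-- calc, fn and total_z are hypothetical aliases chosen for this example.
SELECT (2 * (x + y)) / z + 1 AS calc,
       MyFunction(x, y)      AS fn,
       SUM(z)                AS total_z
FROM AnotherTable
-- Group by the aliases instead of repeating the expressions.
GROUP BY calc, fn;
```

With this form, a change to either expression only has to be made in one place.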
The only way of specifying what to group by is to use a GROUP BY clause. You cannot necessarily deduce it from the columns mentioned in the SELECT. In fact, you don’t even have to select all the columns mentioned in the GROUP BY.
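For example, assuming the Sales table from the question, a query along these lines groups by columns that never appear in the SELECT list:

```sql
-- One row per (storeid, storename) pair, but only the aggregates are returned.
SELECT SUM(revenue), COUNT(*)
FROM Sales
GROUP BY storeid, storename;
```

Nothing in this SELECT list hints at the grouping. Without the GROUP BY clause the same SELECT list would collapse the whole table into a single row, so the clause carries information that cannot be inferred.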