This has bugged me for a long time.
99% of the time, the GROUP BY clause is an exact copy of the SELECT clause, minus the aggregate functions (MAX, SUM, etc.).
This breaks the Don’t Repeat Yourself principle.
When would the GROUP BY clause ever be anything other than an exact copy of the SELECT clause minus the aggregate functions?
Edit: I realise that some implementations allow you to have different fields in the GROUP BY than in the SELECT (hence 99%, not 100%), but surely that’s a very minor exception?
Can someone explain what is supposed to be returned if you use different fields?
Thanks.
I tend to agree with you. This is one of many cases where SQL should have slightly smarter defaults to save us all some typing. For example, imagine if a query could legally end with GROUP BY *, where ‘*’ meant ‘all the non-aggregate fields in the SELECT list’. If everybody knew that’s how it worked, there would be no confusion. You could substitute a specific list of fields if you wanted to do something tricky, but the splat would mean ‘all of them’ (which in this context means all the eligible ones).
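To make the repetition concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, column names, and data (payments, client_name, invoice_no, amount) are made up for illustration. It runs the query the way SQL requires it today, with the non-aggregate columns listed once in SELECT and again in GROUP BY.

```python
import sqlite3

# Hypothetical table and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments (client_name TEXT, invoice_no INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [("Acme", 1, 100.0), ("Acme", 1, 50.0), ("Acme", 2, 75.0), ("Bolt", 3, 20.0)],
)

# What SQL requires today: the non-aggregate columns appear in SELECT
# and are repeated verbatim in GROUP BY.
rows = conn.execute(
    """
    SELECT client_name, invoice_no, SUM(amount)
    FROM payments
    GROUP BY client_name, invoice_no
    ORDER BY client_name, invoice_no
    """
).fetchall()

for row in rows:
    print(row)

# The shorthand imagined above would shrink the GROUP BY line to "GROUP BY *".
# That is only the idea under discussion, not syntax this sketch relies on.
```

With the shorthand, adding or removing a column in the SELECT list would no longer mean editing the GROUP BY list to match.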
Granted, ‘*’ means something different here than in the SELECT clause, so maybe a different character would work better.
There are a few other areas like that where SQL just isn’t as eloquent as it could be. But at this point, it’s probably too entrenched to make many big changes like that.