In the article Why Arel?, the author poses the problem:
Suppose we have a users table and a photos table and we want to select all user data and a *count* of the photos they have created.
His proposed solution (with a line break added) is
SELECT users.*, photos_aggregation.cnt
FROM users
LEFT OUTER JOIN (SELECT user_id, count(*) as cnt FROM photos GROUP BY user_id)
AS photos_aggregation
ON photos_aggregation.user_id = users.id
When I attempted to write such a query, I came up with
select users.*, if(count(photos.id) = 0, null, count(photos.id)) as cnt
from users
left join photos on photos.user_id = users.id
group by users.id
(The if() in the column list is just to get it to behave the same when a user has no photos.)
The author of the article goes on to say
Only advanced SQL programmers know how to write this (I’ve often asked this question in job interviews I’ve never once seen anybody get it right). And it shouldn’t be hard!
I don’t consider myself an “advanced SQL programmer”, so I assume I’m missing something subtle. What am I missing?
In most DBMSs (MySQL and Postgres are exceptions) the version in your question would be invalid.
You would need to write the query which does not use the derived table as
MySQL allows you to select non aggregated items that are not in the
group bylist but this is only safe if they are functionally dependant on the column(s) in thegroup by.Whilst the
group bylist is more verbose without the derived table I would expect most optimisers to be able to transform one to the other anyway. Certainly in SQL Server if it sees you are grouping by the PK and some other columns it doesn’t actually do group by comparisons on those other columns.Some discussion about this MySQL behaviour vs standard SQL is in Debunking GROUP BY myths