I have the following tables:
create temporary table Items (item_id int, item_name varchar(10));
create temporary table ItemRating (item_id int, rating int);
With the following data:
insert into Items (item_id, item_name) values (1,'Item 1'),(2,'Item 2'),(3,'Item 3'),(4,'Item 4'),(5,'Item 5');
insert into ItemRating values (1,9),(1,6),(3,10);
And I run the following query:
select i.item_id, i.item_name, avg(ir.rating) from Items i left join ItemRating ir ON ir.item_id = i.item_id group by ir.item_id;
This is the result I get:
+---------+-----------+----------------+
| item_id | item_name | avg(ir.rating) |
+---------+-----------+----------------+
|       2 | Item 2    |           NULL |
|       1 | Item 1    |         7.5000 |
|       3 | Item 3    |        10.0000 |
+---------+-----------+----------------+
Now, I fully understand that the query is wrong; what I want is to group by i.item_id. But I don’t understand the behavior. Why does MySQL display item_id 2 in the results, but not 4 or 5? I would actually expect to see only items 1 & 3, because they’re the only ones with a corresponding record in ItemRating.
So, can anyone explain to me what MySQL is doing here?
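Here’s what’s going on. Consider the query piece by piece and what MySQL is processing as it goes.
First, you’re selecting from Items (select i.item_id, i.item_name, avg(ir.rating) from Items i):
+---------+-----------+
| item_id | item_name |
+---------+-----------+
|       1 | Item 1    |
|       2 | Item 2    |
|       3 | Item 3    |
|       4 | Item 4    |
|       5 | Item 5    |
+---------+-----------+
Then you are left joining the ratings (left join ItemRating ir ON ir.item_id = i.item_id):
+---------+-----------+--------+------------+
| item_id | item_name | rating | ir.item_id |
+---------+-----------+--------+------------+
|       1 | Item 1    |      9 |          1 |
|       1 | Item 1    |      6 |          1 |
|       2 | Item 2    |   NULL |       NULL |
|       3 | Item 3    |     10 |          3 |
|       4 | Item 4    |   NULL |       NULL |
|       5 | Item 5    |   NULL |       NULL |
+---------+-----------+--------+------------+
Note that Item 1 appears in two rows after the join, because that’s how JOIN is defined to work: it returns one row for every join condition match (and the LEFT basically means “return every row in the first table at least once even if there are no join condition matches on that row”).
Finally, you are grouping by the rating table's item_id (group by ir.item_id). This will return one row for each unique ir.item_id. There are three unique ir.item_ids (as you can see in the last column there): 1, NULL, and 3. For each of these, it returns one row and averages the rating.
So, for 1 we have:
+---------+-----------+--------+------------+
| item_id | item_name | rating | ir.item_id |
+---------+-----------+--------+------------+
|       1 | Item 1    |      9 |          1 |
|       1 | Item 1    |      6 |          1 |
+---------+-----------+--------+------------+
Which collapses into:
+---------+-----------+----------------+
| item_id | item_name | avg(ir.rating) |
+---------+-----------+----------------+
|       1 | Item 1    |         7.5000 |
+---------+-----------+----------------+
For NULL we have:
+---------+-----------+--------+------------+
| item_id | item_name | rating | ir.item_id |
+---------+-----------+--------+------------+
|       2 | Item 2    |   NULL |       NULL |
|       4 | Item 4    |   NULL |       NULL |
|       5 | Item 5    |   NULL |       NULL |
+---------+-----------+--------+------------+
Which collapses into:
+---------+-----------+----------------+
| item_id | item_name | avg(ir.rating) |
+---------+-----------+----------------+
|       2 | Item 2    |           NULL |
+---------+-----------+----------------+
For 3 we have:
+---------+-----------+--------+------------+
| item_id | item_name | rating | ir.item_id |
+---------+-----------+--------+------------+
|       3 | Item 3    |     10 |          3 |
+---------+-----------+--------+------------+
Which collapses into:
+---------+-----------+----------------+
| item_id | item_name | avg(ir.rating) |
+---------+-----------+----------------+
|       3 | Item 3    |        10.0000 |
+---------+-----------+----------------+
Combining the three collapsed results gives:
+---------+-----------+----------------+
| item_id | item_name | avg(ir.rating) |
+---------+-----------+----------------+
|       2 | Item 2    |           NULL |
|       1 | Item 1    |         7.5000 |
|       3 | Item 3    |        10.0000 |
+---------+-----------+----------------+
Which is what you got.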
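If you want to look at the intermediate rows yourself, one option is to run the join without the GROUP BY and without the aggregate. This is only a sketch against the tables from the question; the alias ir_item_id and the ORDER BY are my own additions for readability and are not part of the original query.

```sql
-- Show every row produced by the LEFT JOIN before any grouping happens.
-- ir_item_id is an illustrative alias so both item_id columns stay distinguishable.
select i.item_id,
       i.item_name,
       ir.rating,
       ir.item_id as ir_item_id
from Items i
left join ItemRating ir on ir.item_id = i.item_id
order by i.item_id;
```

Items without a rating should come back with NULL in both ir columns, and those NULLs are what the GROUP BY later treats as a single group.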
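The one tricky part is the way the NULL rows collapsed. Recall, these were the null rows:
+---------+-----------+--------+------------+
| item_id | item_name | rating | ir.item_id |
+---------+-----------+--------+------------+
|       2 | Item 2    |   NULL |       NULL |
|       4 | Item 4    |   NULL |       NULL |
|       5 | Item 5    |   NULL |       NULL |
+---------+-----------+--------+------------+
When you do a group by, most database systems will not even let you select columns that are not part of the group, and MySQL itself rejects such queries when the ONLY_FULL_GROUP_BY SQL mode is enabled (the default since MySQL 5.7.5). With that mode off, MySQL is an exception. Since you are only grouping on ir.item_id, that’s the only column most systems would let you select without an aggregate, because there is no clear way to collapse three rows in a non-aggregate way. What MySQL does is pick the value from one row of the group (which row is not guaranteed; here it happened to be the first one it encountered) and use the values in that row as the collapsed value. So (2,4,5) => (2) and (Item 2, Item 4, Item 5) => Item 2 and (NULL, NULL, NULL) => NULL. That’s why you only see row 2 (you are actually seeing three collapsed rows that look like row 2).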
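For contrast, here is a sketch of the query the question says was intended, grouping on the Items side of the join. Adding i.item_name to the GROUP BY is my own choice, made so that every selected column is either grouped or aggregated; the alias avg_rating is also illustrative.

```sql
-- Group on the Items columns so each item forms its own group,
-- whether or not it has any matching rows in ItemRating.
select i.item_id,
       i.item_name,
       avg(ir.rating) as avg_rating
from Items i
left join ItemRating ir on ir.item_id = i.item_id
group by i.item_id, i.item_name;
```

With the data from the question this should give one row per item, with a NULL average for items 2, 4 and 5, since AVG over a group with no non-NULL ratings is NULL.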
To really see this in action and drive the point home, consider this query:
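One way to write such a query, assuming it simply wraps the two non-aggregated columns of the original in GROUP_CONCAT (the column aliases are illustrative and not from the original post):

```sql
-- Same join and same GROUP BY as the original query, but every
-- selected column is now an aggregate, so nothing is silently dropped.
select group_concat(i.item_id)   as item_ids,
       group_concat(i.item_name) as item_names,
       avg(ir.rating)            as avg_rating
from Items i
left join ItemRating ir on ir.item_id = i.item_id
group by ir.item_id;
```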
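This is just like your original query except all three selected columns now have group aggregate functions. I am using GROUP_CONCAT, which just concatenates strings to form the collapsed version (a query in which every selected column is aggregated like this would also be valid in other SQL systems, although the concatenation function there has a different name, such as STRING_AGG or LISTAGG). That returns this: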