The MySQL documentation states in section 11.5.3 that, despite what the SQL standard may say, it’s just fine to use columns in the SELECT clause that aren’t in the GROUP BY clause, so long as they are functionally dependent on the grouped key.
MySQL extends the use of GROUP BY so that you can use nonaggregated columns or calculations in the select list that do not appear in the GROUP BY clause. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. For example, you need not group on customer.name in the following query:

SELECT order.custid, customer.name, MAX(payments) FROM order, customer WHERE order.custid = customer.custid GROUP BY order.custid;

In standard SQL, you would have to add customer.name to the GROUP BY clause. In MySQL, the name is redundant.
Sounds reasonable. However, although I can select those columns, doing so seems to have an adverse effect on performance.
EXPLAIN SELECT o.id FROM objects o GROUP BY o.id;
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | o | range | NULL | PRIMARY | 3 | NULL | 5262 | Using index for group-by |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
(I realize that this query is pretty silly; it’s just the simplest version of a more complex query that has the same issue.) When I select only the primary key ID that I group by, MySQL uses the primary key index. However, when I include other columns, it does not.
EXPLAIN SELECT o.id, o.name FROM objects o GROUP BY o.id;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| 1 | SIMPLE | o | ALL | NULL | NULL | NULL | NULL | 5261 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
That use of filesort instead of the index really sets me back. I’m currently looking to select * from this table, so I would like to avoid having to repeat all columns in the GROUP BY clause and index them. Is there any way to get MySQL to use the primary key index, as I expect it to?
Since it doesn’t look like there’s a simple answer, I’m going with a cheap solution for the moment.
What I would do would be something like the following:
However, according to how it gets EXPLAINed, the MySQL optimizer views the subquery as being dependent, which is always a really, really nasty performance killer. I think that’s a bug in the query optimizer brought about by the fact that it’s the same table, even though it’s aliased. As such, I’ll be using one query to fetch the IDs, and putting them IN the second query that fetches o.*. It gets reasonable performance, and isn’t too painful.

This question is still open to answers with cleaner solutions that perform as well, if not better 🙂
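
For reference, here is a minimal sketch of the two-query workaround described above. It assumes the simplified objects table from the EXPLAIN examples; the ID values in the second query are placeholders standing in for whatever the first query returns, not real data.

```sql
-- Step 1: fetch only the grouped IDs. In the first EXPLAIN above, this form
-- was resolved with the primary key ("Using index for group-by").
SELECT o.id
FROM objects o
GROUP BY o.id;

-- Step 2: fetch the full rows for those IDs. The literal list (1, 2, 3) is a
-- placeholder; the application fills it in from the result of step 1.
SELECT o.*
FROM objects o
WHERE o.id IN (1, 2, 3);
```

Since the second query filters on a literal list instead of a subquery, there is nothing for the optimizer to treat as dependent. The cost is an extra round trip and an IN list that grows with the number of IDs.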