I have a table emp with following structure and data:
name dept salary
----- ----- -----
Jack a 2
Jill a 1
Tom b 2
Fred b 1
When I execute the following SQL:
SELECT * FROM emp GROUP BY dept
I get the following result:
name dept salary
----- ----- -----
Jill a 1
Fred b 1
On what basis did the server decide return Jill and Fred and exclude Jack and Tom?
I am running this query in MySQL.
Note 1: I know the query doesn’t make sense on its own. I am trying to debug a problem with a ‘GROUP BY’ scenario. I am trying to understand the default behavior for this purpose.
Note 2: I am used to writing the SELECT clause same as the GROUP BY clause (minus the aggregate fields). When I came across the behavior described above, I started wondering if I can rely on this for scenarios such as:
select the rows from emp table where the salary is the lowest/highest in the dept.
E.g.: The SQL statements like this works on MySQL:
SELECT A.*, MIN(A.salary) AS min_salary FROM emp AS A GROUP BY A.dept
I didn’t find any material describing why such SQL works, more importantly if I can rely on such behavior consistently. If this is a reliable behavior then I can avoid queries like:
SELECT A.* FROM emp AS A WHERE A.salary = (
SELECT MAX(B.salary) FROM emp B WHERE B.dept = A.dept)
Read MySQL documentation on this particular point.
In a nutshell, MySQL allows omitting some columns from the GROUP BY, for performance purposes, however this works only if the omitted columns all have the same value (within a grouping), otherwise, the value returned by the query are indeed indeterminate, as properly guessed by others in this post. To be sure adding an ORDER BY clause would not re-introduce any form of deterministic behavior.
Although not at the core of the issue, this example shows how using * rather than an explicit enumeration of desired columns is often a bad idea.
Excerpt from MySQL 5.0 documentation: