I am somewhat confused about how the group by command works in mysql.
Suppose I have a table:
mysql> select recordID, IPAddress, date, httpMethod from Log_Analysis_Records_dalhousieShort;
+----------+-----------------+---------------------+-------------------------------------------------+
| recordID | IPAddress | date | httpMethod |
+----------+-----------------+---------------------+-------------------------------------------------+
| 1 | 64.68.88.22 | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0 |
| 2 | 64.68.88.166 | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0 |
| 3 | 129.173.177.214 | 2003-07-09 00:01:23 | GET / HTTP/1.1 |
| 4 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /include/fcs_style.css HTTP/1.1 |
| 5 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /include/main_page.css HTTP/1.1 |
| 6 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /images/bigportaltopbanner.gif HTTP/1.1 |
| 7 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /images/right_1.jpg HTTP/1.1 |
| 8 | 64.68.88.165 | 2003-07-09 00:02:43 | GET /studentservices/responsible.shtml HTTP/1.0 |
| 9 | 64.68.88.165 | 2003-07-09 00:02:44 | GET /news/sports/basketball.shtml HTTP/1.0 |
| 10 | 64.68.88.34 | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0 |
| 11 | 129.173.159.98 | 2003-07-09 00:03:46 | GET / HTTP/1.1 |
| 12 | 129.173.159.98 | 2003-07-09 00:03:46 | GET /include/fcs_style.css HTTP/1.1 |
| 13 | 129.173.159.98 | 2003-07-09 00:03:46 | GET /include/main_page.css HTTP/1.1 |
| 14 | 129.173.159.98 | 2003-07-09 00:03:48 | GET /images/bigportaltopbanner.gif HTTP/1.1 |
| 15 | 129.173.159.98 | 2003-07-09 00:03:48 | GET /images/left_1g.jpg HTTP/1.1 |
| 16 | 129.173.159.98 | 2003-07-09 00:03:48 | GET /images/webcam.gif HTTP/1.1 |
+----------+-----------------+---------------------+-------------------------------------------------+
When I am execute this statement how does it choose which recordID to include since there are a range of recordIDs that would be correct? Does it just choose the first one that matches?
mysql> select recordID, IPAddress, date, httpMethod from Log_Analysis_Records_dalhousieShort GROUP BY IPADDRESS;
+----------+-----------------+---------------------+-------------------------------------------------+
| recordID | IPAddress | date | httpMethod |
+----------+-----------------+---------------------+-------------------------------------------------+
| 11 | 129.173.159.98 | 2003-07-09 00:03:46 | GET / HTTP/1.1 |
| 3 | 129.173.177.214 | 2003-07-09 00:01:23 | GET / HTTP/1.1 |
| 8 | 64.68.88.165 | 2003-07-09 00:02:43 | GET /studentservices/responsible.shtml HTTP/1.0 |
| 2 | 64.68.88.166 | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0 |
| 1 | 64.68.88.22 | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0 |
| 10 | 64.68.88.34 | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0 |
+----------+-----------------+---------------------+-------------------------------------------------+
6 rows in set (0.00 sec)
For this table the max(date) and min(date) values seem logical to me but I am confused about how the recordID and httpMethod where chosen.
Is it safe use two aggregate functions in one command?
mysql> select recordID, IPAddress, min(date), max(date), httpMethod from Log_Analysis_Records_dalhousieShort GROUP BY IPADDRESS;
+----------+-----------------+---------------------+---------------------+-------------------------------------------------+
| recordID | IPAddress | min(date) | max(date) | httpMethod |
+----------+-----------------+---------------------+---------------------+-------------------------------------------------+
| 11 | 129.173.159.98 | 2003-07-09 00:03:46 | 2003-07-09 00:03:48 | GET / HTTP/1.1 |
| 3 | 129.173.177.214 | 2003-07-09 00:01:23 | 2003-07-09 00:01:23 | GET / HTTP/1.1 |
| 8 | 64.68.88.165 | 2003-07-09 00:02:43 | 2003-07-09 00:02:44 | GET /studentservices/responsible.shtml HTTP/1.0 |
| 2 | 64.68.88.166 | 2003-07-09 00:00:55 | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0 |
| 1 | 64.68.88.22 | 2003-07-09 00:00:21 | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0 |
| 10 | 64.68.88.34 | 2003-07-09 00:02:46 | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0 |
+----------+-----------------+---------------------+---------------------+-------------------------------------------------+
6 rows in set (0.00 sec)
Usually use of GROUP BY while listing a field in the select expression without an aggregate function is invalid SQL and should throw an error.
MySQL, however, allows this and simply chooses one value randomly. Try to avoid it, because it is confusing.
To disallow this, you can say at runtime:
SET sql_mode := CONCAT('ONLY_FULL_GROUP_BY,',@@sql_mode);or use the configuration value and/or command line option
sql-mode.Yes, listing two aggregate functions is completely valid.