Today I noticed one SQL query in my mysql-slow.log that took extremely long to run.
I would like to ask some SQL experts how to correctly write and run this query.
Idea behind the SQL:
Return all emails that are not in the mailchimp table, drawing from 2 tables (users and subscribers), and return only DISTINCT values (users and subscribers emails might duplicate). Also include city and language in the results.
As you can see, Query_time is monstrously long and Rows_examined is absurdly high; the 2 tables combined should only hold around 20k rows.
Query_time: 113.216544 Lock_time: 0.000180 Rows_sent: 43 Rows_examined: 208280841
SELECT * FROM
( SELECT u.email AS email, u.city, u.language FROM users AS u
LEFT JOIN mailchimp AS m ON u.email = m.email WHERE m.email IS NULL GROUP BY u.email
UNION SELECT s.email AS email, s.city, s.language FROM subscribers AS s
LEFT JOIN mailchimp AS m ON s.email = m.email WHERE m.email IS NULL GROUP BY s.email )
AS sync GROUP BY sync.email ORDER BY sync.email ASC;
EXPLAIN for the query:
+----+--------------+------------+------+---------------+------+---------+------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+------+---------------+------+---------+------+-------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 23 | Using temporary; Using filesort |
| 2 | DERIVED | u | ALL | NULL | NULL | NULL | NULL | 10482 | Using temporary; Using filesort |
| 2 | DERIVED | m | ALL | NULL | NULL | NULL | NULL | 11411 | Using where; Not exists |
| 3 | UNION | s | ALL | NULL | NULL | NULL | NULL | 2709 | Using temporary; Using filesort |
| 3 | UNION | m | ALL | NULL | NULL | NULL | NULL | 11411 | Using where; Not exists |
| NULL | UNION RESULT | <union2,3> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+------------+------+---------------+------+---------+------+-------+---------------------------------+
6 rows in set (2 min 1.65 sec)
I guess you have no indexes on the three tables. Add an index on the email field in all 3 tables (users, subscribers and mailchimp) and run the query, and the EXPLAIN, again.
Your query could be rewritten by removing the two inner GROUP BY clauses and turning UNION into UNION ALL, or by turning the LEFT JOIN / IS NULL check into NOT EXISTS, which is sometimes faster.
In any case, add indexes to the email fields!
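The suggestions above could look roughly like the sketch below. It assumes MySQL and that email is an ordinary column in each table. The index names are made up for illustration, and the two queries are reconstructions of the described rewrites, not something tested against the real data. The EXPLAIN shows type ALL for mailchimp, so every users row triggers a full scan of it (about 10482 × 11411, roughly 120 million row checks for that branch alone), which is what the indexes are meant to avoid.

```sql
-- 1) Indexes on email in all three tables.
--    Index names are hypothetical; pick whatever fits your naming scheme.
ALTER TABLE users       ADD INDEX idx_users_email (email);
ALTER TABLE subscribers ADD INDEX idx_subscribers_email (email);
ALTER TABLE mailchimp   ADD INDEX idx_mailchimp_email (email);

-- 2) Variant A: drop the two inner GROUP BY clauses and use UNION ALL.
--    Duplicates are still collapsed once by the outer GROUP BY.
--    Like the original, this relies on MySQL accepting non-aggregated
--    columns with GROUP BY; if ONLY_FULL_GROUP_BY is enabled, wrap
--    city and language in ANY_VALUE() (MySQL 5.7+).
SELECT sync.email, sync.city, sync.language
FROM (
    SELECT u.email, u.city, u.language
    FROM users AS u
    LEFT JOIN mailchimp AS m ON u.email = m.email
    WHERE m.email IS NULL
    UNION ALL
    SELECT s.email, s.city, s.language
    FROM subscribers AS s
    LEFT JOIN mailchimp AS m ON s.email = m.email
    WHERE m.email IS NULL
) AS sync
GROUP BY sync.email
ORDER BY sync.email ASC;

-- 3) Variant B: the same anti-join expressed with NOT EXISTS.
SELECT sync.email, sync.city, sync.language
FROM (
    SELECT u.email, u.city, u.language
    FROM users AS u
    WHERE NOT EXISTS (SELECT 1 FROM mailchimp AS m WHERE m.email = u.email)
    UNION ALL
    SELECT s.email, s.city, s.language
    FROM subscribers AS s
    WHERE NOT EXISTS (SELECT 1 FROM mailchimp AS m WHERE m.email = s.email)
) AS sync
GROUP BY sync.email
ORDER BY sync.email ASC;
```

After adding the indexes, re-run EXPLAIN on whichever variant you keep; the mailchimp rows should show a key being used instead of type ALL. Which variant is faster depends on the MySQL version and the data, so it is worth timing both.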