I’m having trouble optimizing this query:
SELECT a.id
FROM a
JOIN b ON a.id=b.id
LEFT JOIN c ON a.id=c.id
WHERE
(b.c1='12345' OR c.c1='12345')
AND (a.c2=0 OR b.c3=1)
AND a.c4='active'
GROUP BY a.id;
The query takes 7s, whereas it takes 0s when only one of b or c is JOINed. The EXPLAIN:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: a
type: ref
possible_keys: PRIMARY(id),c4,c2
key: c4
key_len: 1
ref: const
rows: 80775
Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: c
type: ref
possible_keys: id_c1_unique,id
key: id_c1
key_len: 4
ref: database.a.id
rows: 1
Extra: Using index
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: b
type: ref
possible_keys: id_c1_unique,id,c1,c3
key: id
key_len: 4
ref: database.a.id
rows: 2
Extra: Using where
There is always exactly 1 matching row from b, and at most one matching row from c. It would go much faster if MySQL started by getting the b and c rows that match the c1 literal and then joined a based on id, but it starts with a instead.
Details:
- MyISAM
- All columns have indexes (_unique are UNIQUE)
- All columns are NOT NULL
What I’ve tried:
- Changing the order of the JOINs
- Moving the WHERE conditions to the ON clauses
- Subselects for b.c1 and c.c1 (WHERE b.id=(SELECT b.id FROM b WHERE c1='12345'))
- USE INDEX for b and c
I understand I could do this using two SELECTs with a UNION but I need to avoid that if at all possible because of how the query is being generated.
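For reference, the UNION form would look roughly like the following. This is a minimal sketch derived from the query above, not something the application generates; it assumes UNION's duplicate removal can stand in for the GROUP BY.

```sql
-- Branch 1: rows where b matches the c1 literal (c is not needed here).
SELECT a.id
FROM a
JOIN b ON a.id = b.id
WHERE b.c1 = '12345'
  AND (a.c2 = 0 OR b.c3 = 1)
  AND a.c4 = 'active'
UNION
-- Branch 2: rows where c matches the c1 literal.
-- c.c1 = '12345' can never be true for a NULL-extended row,
-- so a plain JOIN gives the same rows as the LEFT JOIN would.
SELECT a.id
FROM a
JOIN b ON a.id = b.id
JOIN c ON a.id = c.id
WHERE c.c1 = '12345'
  AND (a.c2 = 0 OR b.c3 = 1)
  AND a.c4 = 'active';
```

Each branch has a single c1 condition on one table, which is the shape that runs in 0s when only one of b or c is joined.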
Edit: Adding CREATE TABLEs with the relevant columns:
CREATE TABLE `a` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`c2` tinyint(1) NOT NULL,
`c4` enum('active','pending','closed') NOT NULL,
PRIMARY KEY (`id`),
KEY `c2` (`c2`),
KEY `c4` (`c4`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE `b` (
`b_id` int(11) NOT NULL AUTO_INCREMENT,
`id` int(11) NOT NULL DEFAULT '0',
`c1` int(11) NOT NULL,
`c3` tinyint(1) NOT NULL,
PRIMARY KEY (`b_id`),
UNIQUE KEY `id_c1_unique` (`id`,`c1`),
KEY `c1` (`c1`),
KEY `c3` (`c3`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE `c` (
`c_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`id` int(11) NOT NULL,
`c1` int(11) NOT NULL,
PRIMARY KEY (`c_id`),
UNIQUE KEY `id_c1_unique` (`id`,`c1`),
KEY `id` (`id`),
KEY `c1` (`c1`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
OP answering here.
What I’ve determined is that MySQL reading the less efficient table first is inherent to all LEFT JOINs where the less efficient table is on the left side. According to LEFT JOIN and RIGHT JOIN Optimization in the MySQL manual, a LEFT JOIN forces the table read order: the left-hand table is read before the right-hand table.
So a query that LEFT JOINs c to a will always read a first, even when the query plan shows that reading c is more efficient. Switching the tables, so that a is LEFT JOINed to c, causes MySQL to read from c first.
In my case both queries return the same results. Apparently there is something conceptual that I’m missing that requires the left side table to always be read first when doing a LEFT JOIN. It seems to me the right side table could just as easily be read first and MySQL could still generate the same results (for certain queries, not necessarily for all LEFT JOINs). If that were possible, though, that optimization probably would have been added long ago, so I guess I’m just missing the concept.
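As a rough illustration of the two table orders, here is a simplified sketch that uses only a and c and leaves out the WHERE conditions from my real query. Running each statement through EXPLAIN shows which table is listed first.

```sql
-- a is on the left side of the LEFT JOIN, so it is read before c.
EXPLAIN
SELECT a.id
FROM a
LEFT JOIN c ON a.id = c.id;

-- c is on the left side of the LEFT JOIN, so it is read before a.
EXPLAIN
SELECT a.id
FROM c
LEFT JOIN a ON a.id = c.id;
```

Note that these two statements are not equivalent in general: the first keeps every row of a, the second keeps every row of c. Swapping the order is only safe when the remaining conditions make both forms return the same rows.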
In the end switching the order of the tables wasn’t a good solution for me. I ended up merging b and c into a single table, which simplified the application and should have been done to begin with. With a single table I can do a JOIN instead of a LEFT JOIN, avoiding the issue altogether.
Another possible solution might be creating a view that incorporates both tables, thereby giving a single view to JOIN from. I didn’t test that though.
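A sketch of what the view idea could look like is below. It is untested, the view name bc and the column aliases b_c1 and c_c1 are made up for illustration, and I can’t say whether it changes the plan, since the view still contains a LEFT JOIN internally.

```sql
-- Hypothetical view exposing one row per b row, with the matching c row (if any).
CREATE VIEW bc AS
SELECT b.id,
       b.c1 AS b_c1,
       b.c3,
       c.c1 AS c_c1   -- NULL when c has no matching row
FROM b
LEFT JOIN c ON b.id = c.id;

-- The original query rewritten against the view with a plain JOIN.
SELECT a.id
FROM a
JOIN bc ON a.id = bc.id
WHERE (bc.b_c1 = '12345' OR bc.c_c1 = '12345')
  AND (a.c2 = 0 OR bc.c3 = 1)
  AND a.c4 = 'active'
GROUP BY a.id;
```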
TL;DR: Change the order of the tables to put the most efficient first (if the result set is the same regardless of the order). Or merge b and c into a single table. Or possibly create a view that combines b and c.