I have a query:
select SQL_NO_CACHE id from users
where id>1 and id <1000
and id in ( select owner_id from comments and content_type='Some_string');
(note that it is short of an actual large query used for my sphinx indexing, representing the problem)
This query is taking about 3.5 seconds(modifying range from id = 1..5000 makes it about 15 secs).
users table has about 35000 entries and comments table has about 8000 entries.
Explain on above query:
explain select SQL_NO_CACHE id from users
where id>1 and id <1000
and id in ( select distinct owner_id from d360_core_comments);
| id | select_type | table | type | possible_keys
| key | key_len | ref | rows | Extra |
| 1 | PRIMARY | users | range | PRIMARY
| PRIMARY | 4 | NULL | 1992 | Using where; Using index || 2 | DEPENDENT SUBQUERY | d360_core_comments | ALL | NULL | NULL | NULL | NULL | 6901 | Using where; Using temporary |
where the individual subquery(select owner_id from d360_core_comments where content_type='Community20::Topic';) here is taking almost 0.0 seconds.
However if I add index on owner_id,content_type, (note the order here)
create index tmp_user on d360_core_comments (owner_id,content_type);
My subquery runs as is in ~0.0 seconds with NO index used:
mysql> explain select owner_id from d360_core_comments where
content_type=’Community20::Topic’;| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | d360_core_comments | ALL | NULL | NULL
| NULL | NULL | 6901 | Using where |
However now my main query (select SQL_NO_CACHE id from users where id>1 and id <1000 and id in ( select owner_id from d360_core_comments where content_type='Community20::Topic');)
now runs in ~0 seconds with following explain:
mysql> explain select SQL_NO_CACHE id from users where id>1 and id
<1000 and id in ( select owner_id from d360_core_comments where
content_type=’Community20::Topic’);| id | select_type | table | type | possible_keys | key |
key_len | ref | rows | Extra |
| 1 | PRIMARY
| users | range | PRIMARY | PRIMARY | 4 | NULL | 1992 | Using where; Using index || 2 | DEPENDENT SUBQUERY |
d360_core_comments | index_subquery | tmp_user | tmp_user | 5 | func | 34 | Using where |
So the main questions I have are:
- If the index defined on the table used in my subquery is not getting used in my actual subquery then how it is optimizing the query here?
- And why in the first place the first query was taking so much time when the actual subquery and main query independently are much faster?
If you read this section of the MySQL Reference Manual: Optimizing Subqueries with
EXISTSStrategy, you’ll see that the query optimizer transforms your subquery condition from:into:
This is why a index on
(owner_id, content_type)is not useful when the subquery is tested as standalone query, but it is useful when considering the transformed subquery.