I was trying to optimize a NOT IN clause in MySQL and somehow ended up with the following queries:
SELECT @i:=(SELECT correct_option_word_id FROM sent_question WHERE msisdn='abc');
SELECT * FROM word WHERE @i IS NULL OR word_id NOT IN (@i);
There is no relationship between the sent_question table and the word table. Also, I cannot place an index on correct_option_word_id.
Can somebody please explain whether this method will actually optimize the query?
UPDATE: As mentioned here, the NOT IN and LEFT JOIN/IS NULL methods are almost equally efficient. That’s why I don’t want to use the LEFT JOIN/IS NULL method.
UPDATE 2:
Explain results for original query:
EXPLAIN SELECT * FROM word WHERE word_id NOT IN (SELECT correct_option_word_id FROM sent_question WHERE msisdn='abc');
+----+--------------------+---------------+------+-------------------------+-------------------------+---------+-------+------+-------------+
| id | select_type        | table         | type | possible_keys           | key                     | key_len | ref   | rows | Extra       |
+----+--------------------+---------------+------+-------------------------+-------------------------+---------+-------+------+-------------+
|  1 | PRIMARY            | word          | ALL  | NULL                    | NULL                    | NULL    | NULL  |   10 | Using where |
|  2 | DEPENDENT SUBQUERY | sent_question | ref  | fk_question_subscriber1 | fk_question_subscriber1 | 48      | const |    1 | Using where |
+----+--------------------+---------------+------+-------------------------+-------------------------+---------+-------+------+-------------+
You’re right that the NOT IN and LEFT JOIN/IS NULL methods are equally efficient. Unfortunately, there is no faster option, only slower ones (NOT EXISTS).
Here’s your query, simplified:
SELECT * FROM word WHERE word_id NOT IN (SELECT correct_option_word_id FROM sent_question WHERE msisdn='abc');
Conceptually, MySQL resolves the subquery into a result set for the NOT IN clause and scans through all of the rows in word, checking for each row whether word_id is in that list. (The DEPENDENT SUBQUERY in your EXPLAIN output indicates that the optimizer actually re-runs the subquery as an index lookup for each row of word rather than building the list once; either way, every row of word is visited.)
Unfortunately for this case, indexes are inclusive, not exclusive: they help find rows that match a value, not rows that don’t, so they don’t help with NOT queries. A covering index on word could potentially still be used to avoid accessing the actual table and provide some IO benefits, but it won’t be used in the traditional “lookup” sense. However, since you are returning all columns of the word table, it may not be viable to have such a large index.
The most important index here is the one on sent_question.msisdn, which serves the subquery. Ensure that you have that index defined. A multi-column “covering” index on (msisdn, correct_option_word_id) would be best.
If you share your design, we can probably offer some design solutions for optimization.
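A minimal sketch of the multi-column index suggested above, assuming the restriction you mentioned on indexing correct_option_word_id can be relaxed for a composite index. The index name idx_msisdn_word is made up for illustration; the table and column names are taken from your query.

```sql
-- Hypothetical index name; adjust to your own naming scheme.
-- msisdn leads so that WHERE msisdn = 'abc' can use the index;
-- correct_option_word_id follows so the subquery can be answered
-- from the index alone, without reading the table rows.
ALTER TABLE sent_question
  ADD INDEX idx_msisdn_word (msisdn, correct_option_word_id);

-- Re-check the plan afterwards.
EXPLAIN
SELECT *
FROM word
WHERE word_id NOT IN (
    SELECT correct_option_word_id
    FROM sent_question
    WHERE msisdn = 'abc'
);
```

If the optimizer picks the new index, I’d expect the sent_question row of the EXPLAIN output to list it under key and to mention “Using index” in the Extra column. The word table will still show type ALL, since the full scan over word cannot be avoided for a NOT IN query.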