I have a MySQL 5.1 InnoDB table (customers) with the following structure:
int record_id (PRIMARY KEY)
int user_id (ALLOW NULL)
varchar[11] postcode (ALLOW NULL)
varchar[30] region (ALLOW NULL)
..
..
..
There are roughly 7 million rows in the table. Currently, the table is being queried like this:
SELECT * FROM customers WHERE user_id IN (32343, 45676, 12345, 98765, 66010, ...
in the actual query, currently over 560 user_ids are in the IN clause. With several million records in the table, this query is slow!
There are secondary indexes on table, the first of which being on user_id itself, which I thought would help.
I know that SELECT(*) is A Bad Thing and this will be expanded to the full list of fields required. However, the fields not listed above are more ints and doubles. There are another 50 of those being returned, but they are needed for the report.
I imagine there’s a much better way to access the data for the user_ids, but I can’t think how to do it. My initial reaction is to remove the ALLOW NULL on the user_id field, as I understand NULL handling slows down queries?
I’d be very grateful if you could point me in a more efficient direction than using the IN ( ) method.
EDIT
Ran EXPLAIN, which said:
select_type = SIMPLE
table = customers
type = range
possible_keys = userid_idx
key = userid_idx
key_len = 5
ref = (NULL)
rows = 637640
Extra = Using where
does that help?
First, check if there is an index on
USER_IDand make sure it’s used.You can do it with running
EXPLAIN.Second, create a temporary table and use it in a
JOIN:Third, how may rows does your query return?
If it returns almost all rows, then it just will be slow, since it will have to pump all these millions over the connection channel, to begin with.
NULLwill not slow your query down, since theINcondition only satisfies non-NULLvalues which are indexed.Update:
The index is used, the plan is fine except that it returns more than half a million rows.
Do you really need to put all these
638,000rows into the report?Hope its not printed: bad for rainforests, global warming and stuff.
Speaking seriously, you seem to need either aggregation or pagination on your query.