I have a SQL query that takes a very long time to run on MySQL (several minutes). It runs against a table with over 100 million rows, so I’m not surprised it’s slow. In theory, though, it should be possible to speed it up, as I really only want the rows from the large table (let’s call it A) that have a matching reference in another table, B.
So my query is:
SELECT id FROM A, B WHERE A.ref = B.ref;
(A has over 100 million rows; B has just a few thousand).
I’ve added INDEXes:
ALTER TABLE A ADD INDEX (ref);
ALTER TABLE B ADD INDEX (ref);
But it’s still very slow (several minutes — I’d be happy with one minute).
Unfortunately, I’m stuck with MySQL 4.1.22, so I can’t use views.
I’d rather not copy all of the relevant rows from A into a separate, smaller table, as the rows that I need will change from time to time. On the other hand, at the moment that’s the only solution I can think of.
Any suggestions welcome!
EDIT: Here’s the output of running EXPLAIN on my query:
+----+-------------+-------+------+---------------+-------+---------+----------------+-------+-------------+
| id | select_type | table | type | possible_keys | key   | key_len | ref            | rows  | Extra       |
+----+-------------+-------+------+---------------+-------+---------+----------------+-------+-------------+
|  1 | SIMPLE      | B     | ALL  | B_ref,ref     | NULL  | NULL    | NULL           | 16718 | Using where |
|  1 | SIMPLE      | A     | ref  | A_REF,ref     | A_ref | 4       | DATABASE.B.ref |  5655 |             |
+----+-------------+-------+------+---------------+-------+---------+----------------+-------+-------------+
(In redacting my original query example, I chose ‘ref’ as my column name, which happens to be the same as one of the join types in the EXPLAIN output, but hopefully that’s not too confusing…)
The query optimizer is probably already doing the best that it can, but in the unlikely event that it’s reading the giant table (A) first, you can explicitly tell it to read B first using the STRAIGHT_JOIN syntax:
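As a rough sketch of how that could look for the redacted query in the question (assuming id is a column of A, which the question does not state, and using the SELECT STRAIGHT_JOIN modifier so that tables are joined in the order they appear in the FROM clause):

```sql
-- Force MySQL to read the small table (B) first, then probe A through its index on ref.
-- STRAIGHT_JOIN makes the optimizer join tables in the order listed in FROM.
-- Assumption: id belongs to A; it is qualified here in case B also has an id column.
SELECT STRAIGHT_JOIN A.id
FROM B, A
WHERE A.ref = B.ref;
```

Running EXPLAIN on this version should list B on the first row. The plan posted in the question already reads B first, so this may not change the timing much.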