I have a table like this:
CREATE TABLE `values` (
id int(10) auto_increment NOT NULL,
molecule_id int(5) NOT NULL,
descriptor_id int(5) NOT NULL,
T double DEFAULT NULL,
value double NOT NULL,
PRIMARY KEY (id),
KEY index1 (molecule_id, T),
KEY index2 (descriptor_id, T)
) ENGINE=InnoDB;
Rows of the table are many combinations of 3000 descriptor_ids, 600 molecule_ids and 3500 Ts with random double values (about 2 billion rows).
I was under the impression that for a query like
SELECT T, value FROM `values` WHERE molecule_id = X AND descriptor_id = Y
MySQL would use both keys and then intersect the results. But running EXPLAIN EXTENDED on this query tells me it uses only index2, having chosen between index1 and index2.
molecule_id = X hits about 1/600 of the table.
descriptor_id = Y hits either a very small part of the table (like 0.001%) or about 1/700, depending on Y.
It seems like intersecting would be faster than just using index2 and then scanning the more than 2.5 million rows it matches. Even if the 3000 descriptor_ids were evenly distributed, that would still leave 800,000 rows to scan on average.
What am I missing?
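One way to dig into this is to look at the plan together with the optimizer's index merge settings. The following is only a minimal sketch: the literal ids 42 and 7 are placeholders standing in for X and Y, the backticks around `values` are there because VALUES is a reserved word in MySQL, and it assumes a server version that supports the optimizer_switch variable.

```sql
-- Show the chosen plan: possible_keys lists the candidate indexes,
-- key shows the one the optimizer actually picked.
EXPLAIN
SELECT T, value
FROM `values`
WHERE molecule_id = 42      -- placeholder for X
  AND descriptor_id = 7;    -- placeholder for Y

-- Check whether index merge intersection is enabled at all
-- (look for index_merge=on and index_merge_intersection=on).
SELECT @@optimizer_switch;
```

If an intersection were being used, the type column of the EXPLAIN output would read index_merge and Extra would mention Using intersect(...).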
spencer7593 has it right. An index_merge only occurs in range situations. If your AND were an OR, it would trigger an index_merge. However, since it is an AND, why not make a multi-column index on both molecule_id and descriptor_id? That will get you better results, and faster. If descriptor_id is more selective (as you mentioned), do this:
ALTER TABLE `values` ADD INDEX descriptor_molecule (descriptor_id, molecule_id, T, value)
As long as your query has both columns in the WHERE clause with an AND condition, it will use this index. In this case, I would actually drop your index2, since if the query only has the descriptor_id column in the WHERE clause, it can still use the descriptor_molecule index through its leftmost prefix. Plus, indexing all 4 columns will create a covering index for the query you mentioned and thus speed up your query by quite a bit.
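The suggestion above can be sketched as one statement plus a check of the resulting plan. This is an illustration under assumptions, not a tested migration: the literal ids are placeholders, `values` is backtick-quoted because VALUES is a reserved word in MySQL, and on a table of this size the ALTER is likely to run for a long time.

```sql
-- Add the composite covering index and drop the now-redundant index2
-- in a single ALTER.
ALTER TABLE `values`
  ADD INDEX descriptor_molecule (descriptor_id, molecule_id, T, value),
  DROP INDEX index2;

-- Re-check the plan for the original query (42 and 7 are placeholder ids).
EXPLAIN
SELECT T, value
FROM `values`
WHERE molecule_id = 42
  AND descriptor_id = 7;
```

If the index is picked up as intended, the key column should show descriptor_molecule and Extra should include Using index, which indicates that the query is answered from the index alone without touching the table rows.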