I have the following problem with my MySQL 5.5 database. I am pretty new to this, so it might be very obvious what's wrong, but I just can't seem to figure it out:
Two tables:
Table 1
CREATE TABLE `sequence_matches` (
`Sample_ID` INT(6) NOT NULL,
`Sequence_Match_ID` INT(8) NOT NULL,
`Start` INT(6) NULL DEFAULT NULL,
`End` INT(6) NULL DEFAULT NULL,
`Coverage` DOUBLE(5,2) NULL DEFAULT NULL,
`Frag_String` VARCHAR(255) NULL DEFAULT NULL,
`rms_mass_error_prod` DOUBLE(10,4) NULL DEFAULT NULL,
`rms_rt_error_prod` DOUBLE(10,4) NULL DEFAULT NULL,
PRIMARY KEY (`Sample_ID`, `Sequence_Match_ID`)
)
and
Table 2
CREATE TABLE `peptide_identifications` (
`Sample_ID` INT(6) NOT NULL,
`Peptide_identification_ID` INT(8) NOT NULL,
`Mass_error` DOUBLE(10,4) NULL DEFAULT NULL,
`Mass_error_ppm` DOUBLE(10,4) NULL DEFAULT NULL,
`Score` DOUBLE(10,4) NULL DEFAULT NULL,
`Type` VARCHAR(45) NULL DEFAULT NULL,
`global_pept_ID` INT(8) NOT NULL,
PRIMARY KEY (`Sample_ID`, `Peptide_identification_ID`),
INDEX `Index` (`global_pept_ID`)
)
Each of them contains ~15 million rows.
Now, I want all rows from Table 2 where global_pept_ID = 27443, and then I want to use their Peptide_identification_ID values to fetch all information from Table 1 where Sequence_Match_ID = Peptide_identification_ID.
I tried the following statement:
SELECT * from sequence_matches
JOIN (
SELECT peptide_identification_id
FROM peptide_identifications
WHERE global_pept_id = 27443
) as tmp_pept
ON sequence_match_id = peptide_identification_id;
Here is the EXPLAIN output for that query:
Now this query is very, very slow (I never actually let it finish; I stopped it after ~10 min), and I imagine it's because no index is used for the second table, although both IDs are part of a primary key and thus should be indexed, right?
The inner SELECT takes ~3 sec and returns ~3k rows when run on its own. So I think the problem is that it makes 3,000 * 15 million comparisons, because every row is checked in Table 2.
But how do I fix this?
Any help appreciated.
-voiD

This is slightly different from the other solutions. Consider the primary criterion you are filtering on first: the peptide identifications for a given global peptide ID. Make sure the table has an index on any such criterion you query against (which it does here, on global_pept_ID). However, if you find you will be querying on more than one WHERE condition against the same table, try to have a single index that covers BOTH criteria.
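As a rough sketch of what an index covering two criteria could look like: the example below assumes, purely for illustration, that queries often filter on both global_pept_ID and Type. Nothing in the question says they do, and the index name idx_global_pept_type is made up.

```sql
-- Hypothetical: only worthwhile if queries regularly filter on both columns,
-- e.g. WHERE global_pept_ID = ? AND `Type` = ?
-- The index name is illustrative and not part of the original schema.
ALTER TABLE peptide_identifications
  ADD INDEX idx_global_pept_type (global_pept_ID, `Type`);
```

The column used in every such query (here global_pept_ID) goes first, so the index still serves queries that filter on that column alone.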
Then, add a JOIN to the other table on the PK/FK relationship to get those records.
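A minimal sketch of that approach, written as a plain join instead of a derived table. It uses the same join condition as the query in the question; the aliases pi and sm are just shorthand.

```sql
-- Filter peptide_identifications first (this can use the index on
-- global_pept_ID), then join to sequence_matches on the matching ID.
SELECT sm.*, pi.Peptide_identification_ID
FROM peptide_identifications AS pi
JOIN sequence_matches AS sm
  ON sm.Sequence_Match_ID = pi.Peptide_identification_ID
WHERE pi.global_pept_ID = 27443;
```

The question does not say whether matching rows must also share the same Sample_ID. If they must, adding AND sm.Sample_ID = pi.Sample_ID to the ON clause should let MySQL use the existing primary key of sequence_matches for the lookup.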
A lack of proper indexes can severely hurt a query's performance. Your sequence_matches table should have an index on just (Sequence_Match_ID) to help the optimizer. Having that column in the second position of the primary key (after Sample_ID) will not help as you expect, because a composite index can only be used for lookups that constrain its leading column.
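Something along these lines should add that index. The name idx_sequence_match_id is only a suggestion, and on a table with ~15 million rows the ALTER TABLE may take a while to run.

```sql
-- Secondary index on Sequence_Match_ID alone, so lookups by that column
-- no longer depend on the (Sample_ID, Sequence_Match_ID) primary key.
ALTER TABLE sequence_matches
  ADD INDEX idx_sequence_match_id (Sequence_Match_ID);

-- Re-check the plan afterwards; the `key` column for sequence_matches
-- shows which index, if any, the optimizer picked.
EXPLAIN
SELECT sm.*
FROM peptide_identifications AS pi
JOIN sequence_matches AS sm
  ON sm.Sequence_Match_ID = pi.Peptide_identification_ID
WHERE pi.global_pept_ID = 27443;
```

If the plan still shows a full scan (type ALL) on sequence_matches, the index is not being used and the join condition is worth a second look.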