I’m trying to filter a relationship table down to the subset of rows where two conditions are met (i.e., I want all of the ids of the entries whose color_ids are 1 or 2). It’s a beefy table, so I’m trying to optimize as much as possible.
I was wondering if anyone could explain my finding in this case:
Why is
SELECT DISTINCT a.id
FROM RelationshipTable as a
JOIN RelationshipTable as b ON b.id = a.id
WHERE a.color_id = 1
AND b.color_id = 2;
faster than
SELECT DISTINCT id
FROM RelationshipTable
WHERE color_id = 1
OR color_id = 2;
in MySQL 4.1?
The two are not the same query and should not give the same result set. The first query returns only the ids that meet both conditions: for the same id there is a record with a color_id of 1 and a record with a color_id of 2. The second query returns the ids that have both color ids as well as the ids that have only one or the other. Of course, since you are asking for a different field to be returned, you might not see this. And the second query is somewhat silly anyway, as it can be expressed as:
And never hit a table at all. That would make it super fast.
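To make the difference between the two result sets concrete, here is a minimal sketch that runs both queries from the question against a tiny table. It is only an illustration: the sample rows are made up, and it uses SQLite through Python's standard sqlite3 module as a stand-in for MySQL 4.1, so it shows which ids come back, not how fast either query runs.

```python
import sqlite3

# Hypothetical sample data as (id, color_id) pairs; not taken from the real table.
rows = [
    (1, 1), (1, 2),  # id 1 has both colors
    (2, 1),          # id 2 has only color 1
    (3, 2),          # id 3 has only color 2
    (4, 3),          # id 4 has neither color
]

# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RelationshipTable (id INTEGER, color_id INTEGER)")
conn.executemany("INSERT INTO RelationshipTable VALUES (?, ?)", rows)

# First query from the question: ids that have a row for color 1 AND a row for color 2.
join_query = """
    SELECT DISTINCT a.id
    FROM RelationshipTable AS a
    JOIN RelationshipTable AS b ON b.id = a.id
    WHERE a.color_id = 1
      AND b.color_id = 2
"""

# Second query from the question: ids that have a row for color 1 OR color 2.
or_query = """
    SELECT DISTINCT id
    FROM RelationshipTable
    WHERE color_id = 1
       OR color_id = 2
"""

both_colors = sorted(r[0] for r in conn.execute(join_query))
either_color = sorted(r[0] for r in conn.execute(or_query))

print("self-join (1 AND 2):", both_colors)   # [1]
print("OR filter (1 OR 2): ", either_color)  # [1, 2, 3]
```

With this data the self-join returns only id 1, while the OR version also returns ids 2 and 3, so a timing comparison between the two is not comparing equivalent work.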