I recently read that with InnoDB tables, putting an index on (something, primary_key) is redundant, as the primary key is automatically appended to all secondary indexes.
So to decrease my index size, I copied my table, removed the redundant primary key column from the index, and ran some test queries. I’m finding that it does not behave the same as my original table with the “redundant” primary key.
Explain tells me it is doing an intersect:
Using intersect(idx_faver_idx_id,PRIMARY);
Below is the query. If I remove “AND Favorite.id < 25103182” then it works as expected and uses the correct index (idx_faver_idx_id).
SELECT `Item`.`id`, `Item`.`cached_image`, `Item`.`submitter_id`, `Item`.`source_title`, `Item`.`source_url`, `Item`.`source_image`, `Item`.`nudity`, `Item`.`tags`, `Item`.`width`, `Item`.`height`, `Item`.`tumblr_id`, `Item`.`tumblr_reblog_key`, `Item`.`fave_count`, `Favorite`.`id`, `Favorite`.`created`
FROM `favorites2` AS `Favorite`
LEFT JOIN `items` AS `Item`
ON (`Favorite`.`notice_id` = `Item`.`id`)
WHERE `faver_profile_id` = 1
AND `Favorite`.`removed` = 0
AND `Item`.`removed` = '0'
AND `Favorite`.`id` < 25103182
ORDER BY `Favorite`.`id` desc
LIMIT 26
An InnoDB secondary index leaf node includes the primary key values, but if you want to do a range query on the ID value, then it needs the non-leaf nodes of the index to include the primary key values.
If you only select the ID in your select-list, then it’s redundant to add the primary key to the index definition. For example, suppose the table has one index named s on a single column and a second index on that same column plus the ID.
Either index would make a query that selects only the ID an index-only query. InnoDB prefers the more compact index, s. It can still be an index-only query because the leaf nodes of the index provide the ID value.
But in the case when you also have an inequality or range condition on ID, it would get more benefit from an index that includes ID values in the non-leaf nodes as well. It can take advantage of the fact that ID values are sorted in the B-tree.
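The two cases described above can be sketched roughly as follows. The table name foo, its column types, the index name s_id, and the literal values are assumptions made up for illustration; only the idea of one index on the column alone and one on the column plus the ID comes from the discussion. Which index the optimizer actually picks depends on the MySQL version and the table statistics, so treat the EXPLAIN statements as something to run and inspect rather than as a guaranteed outcome.

```sql
-- Hypothetical table: names and types are illustrative assumptions.
CREATE TABLE foo (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  s  VARCHAR(10) NOT NULL,
  KEY s (s),          -- compact index: the column only
  KEY s_id (s, id)    -- same column with the primary key listed explicitly
) ENGINE=InnoDB;

-- Case 1: only the ID is selected, with an equality condition on s.
-- Either index can satisfy this without reading the table rows.
EXPLAIN SELECT id FROM foo WHERE s = 'bar';

-- Case 2: an additional range condition (and ordering) on the ID.
-- This is the situation where the explicit (s, id) index may help.
EXPLAIN SELECT id FROM foo
WHERE s = 'bar' AND id < 1000
ORDER BY id DESC
LIMIT 26;
```

Comparing the key and Extra columns of the two EXPLAIN results on real data is a reasonable way to see whether the wider index changes the plan.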
PS: Please don’t use the term “clustered” when describing a compound index, because clustered means something different with respect to indexes. A clustered index alters the storage of table data to match the order of the index. InnoDB primary keys are always a clustered index, in that the row of data is stored in the leaf node of the primary key index.
Re your comment: Keep in mind that a “range” query against the primary index can be superior to a “ref” query against a secondary index.
When your query uses a secondary index, it basically has to make two tree traversals per row: first to search the secondary index to get to a leaf node where it finds the primary key value, then second to use that primary key value to search the primary (clustered) index to get the rest of the columns.
It could be less expensive overall for your query to do a range query against the primary index, so it finds a small enough subset of rows and then applies your other conditions to the columns it finds. It isn’t using the secondary index, but it’s still a win because it only had to do one tree traversal per row.
I say “could be” not to use weasel words, but because the better choice really depends on how many rows are matched by either condition. Usually the optimizer is pretty good at making this evaluation, so it’s unnecessary to use FORCE INDEX to override its behavior.
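If you want to see how the alternatives compare on your own data, one option is to run EXPLAIN on the same query with and without index hints. The sketch below is a trimmed-down version of the query from the question (the join to items is dropped for brevity, which is an assumption about what is worth testing); the table, column, and index names are taken from the question. The hints are meant for comparing plans while investigating, not as a recommendation to keep them in production queries.

```sql
-- Plan the optimizer chooses on its own.
EXPLAIN
SELECT `Favorite`.`id`, `Favorite`.`created`
FROM `favorites2` AS `Favorite`
WHERE `faver_profile_id` = 1
  AND `Favorite`.`removed` = 0
  AND `Favorite`.`id` < 25103182
ORDER BY `Favorite`.`id` DESC
LIMIT 26;

-- Same query, restricted to a range scan on the primary (clustered) index.
EXPLAIN
SELECT `Favorite`.`id`, `Favorite`.`created`
FROM `favorites2` AS `Favorite` FORCE INDEX (PRIMARY)
WHERE `faver_profile_id` = 1
  AND `Favorite`.`removed` = 0
  AND `Favorite`.`id` < 25103182
ORDER BY `Favorite`.`id` DESC
LIMIT 26;

-- Same query, restricted to the secondary index.
EXPLAIN
SELECT `Favorite`.`id`, `Favorite`.`created`
FROM `favorites2` AS `Favorite` FORCE INDEX (idx_faver_idx_id)
WHERE `faver_profile_id` = 1
  AND `Favorite`.`removed` = 0
  AND `Favorite`.`id` < 25103182
ORDER BY `Favorite`.`id` DESC
LIMIT 26;
```

The rows estimate in each plan gives a rough idea of how many rows each access path expects to examine, which is the quantity the trade-off above depends on.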