I’m trying to figure out how to optimize a very slow query in MySQL (I didn’t design this):
SELECT COUNT(*) FROM change_event me WHERE change_event_id > '1212281603783391'; +----------+ | COUNT(*) | +----------+ | 3224022 | +----------+ 1 row in set (1 min 0.16 sec)
Comparing that to a full count:
select count(*) from change_event; +----------+ | count(*) | +----------+ | 6069102 | +----------+ 1 row in set (4.21 sec)
The explain statement doesn’t help me here:
explain SELECT COUNT(*) FROM change_event me WHERE change_event_id > '1212281603783391'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: me type: range possible_keys: PRIMARY key: PRIMARY key_len: 8 ref: NULL rows: 4120213 Extra: Using where; Using index 1 row in set (0.00 sec)
OK, it still thinks it needs roughly 4 million entries to count, but I could count lines in a file faster than that! I don’t understand why MySQL is taking this long.
Here’s the table definition:
CREATE TABLE `change_event` ( `change_event_id` bigint(20) NOT NULL default '0', `timestamp` datetime NOT NULL, `change_type` enum('create','update','delete','noop') default NULL, `changed_object_type` enum('Brand','Broadcast','Episode','OnDemand') NOT NULL, `changed_object_id` varchar(255) default NULL, `changed_object_modified` datetime NOT NULL default '1000-01-01 00:00:00', `modified` datetime NOT NULL default '1000-01-01 00:00:00', `created` datetime NOT NULL default '1000-01-01 00:00:00', `pid` char(15) default NULL, `episode_pid` char(15) default NULL, `import_id` int(11) NOT NULL, `status` enum('success','failure') NOT NULL, `xml_diff` text, `node_digest` char(32) default NULL, PRIMARY KEY (`change_event_id`), KEY `idx_change_events_changed_object_id` (`changed_object_id`), KEY `idx_change_events_episode_pid` (`episode_pid`), KEY `fk_import_id` (`import_id`), KEY `idx_change_event_timestamp_ce_id` (`timestamp`,`change_event_id`), KEY `idx_change_event_status` (`status`), CONSTRAINT `fk_change_event_import` FOREIGN KEY (`import_id`) REFERENCES `import` (`import_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8
Version:
$ mysql --version mysql Ver 14.12 Distrib 5.0.37, for pc-solaris2.8 (i386) using readline 5.0
Is there something obvious I’m missing? (Yes, I’ve already tried ‘SELECT COUNT(change_event_id)’, but there’s no performance difference).
InnoDB uses clustered primary keys, so the primary key is stored along with the row in the data pages, not in separate index pages. In order to do a range scan you still have to scan through all of the potentially wide rows in data pages; note that this table contains a TEXT column.
Two things I would try:
optimize table. This will ensure that the data pages are physically stored in sorted order. This could conceivably speed up a range scan on a clustered primary key.(you also probably want to make the change_event_id column bigint unsigned if it’s incrementing from zero)