I have a query that takes too long on a table of 250,000 rows, and I need to speed it up:
create table occurrence (
    occurrence_id int(11) primary key auto_increment,
    client_id varchar(16) not null,
    occurrence_cod varchar(50) not null,
    entry_date datetime not null,
    zone varchar(8) null default null
);
insert into occurrence (client_id, occurrence_cod, entry_date, zone)
values
    ('1116', 'E401', '2011-03-28 18:44', '004'),
    ('1116', 'R401', '2011-03-28 17:44', '004'),
    ('1116', 'E401', '2011-03-28 16:44', '004'),
    ('1338', 'R401', '2011-03-28 14:32', '001');
select client_id, occurrence_cod, entry_date, zone
from occurrence o
where occurrence_cod = 'E401'
  and entry_date = (
      select max(entry_date)
      from occurrence
      where client_id = o.client_id
  );
+-----------+----------------+---------------------+------+
| client_id | occurrence_cod | entry_date          | zone |
+-----------+----------------+---------------------+------+
| 1116      | E401           | 2011-03-28 18:44:00 | 004  |
+-----------+----------------+---------------------+------+
1 row in set (0.00 sec)
The table structure comes from a commercial application and cannot be altered.
What would be the best index(es) to optimize it? Or a better query?
EDIT:
The query should return each client's last occurrence, but only when that last occurrence has the code E401.
The ideal indexes for such a query would be:
Nevertheless, those indexes can be simplified if the data have certain characteristics. This saves disk space and also reduces the time spent maintaining the indexes when data change (insert/delete/update).
If there is rarely more than one "occurrence" record for each [client_id], then index #1 can be only [client_id].
In the same way, if there is rarely more than one "occurrence" record for each [occurrence_cod], then index #1 can be only [occurrence_cod].
It may be more useful to turn index #2 into [entry_date] + [occurrence_cod]. Because MySQL can use the leftmost column(s) of a composite index on their own, this lets the same index serve criteria that are only on [entry_date].
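As a rough sketch of the simplifications described above, the statements below create the single-column variant of index #1 and the reordered variant of index #2, then use EXPLAIN to see whether MySQL picks them up. The index names (ix_occurrence_client, ix_occurrence_date_cod) and the date range are made up for illustration. Which variant actually pays off depends on how the real data are distributed, so treat this as something to try and measure, not as a definitive recommendation.

```sql
-- Hypothetical index names; only the column lists come from the text above.

-- Simplified index #1: client_id alone, for the case where a client
-- rarely has more than one row.
create index ix_occurrence_client on occurrence (client_id);

-- Reordered index #2: entry_date first, then occurrence_cod.
create index ix_occurrence_date_cod on occurrence (entry_date, occurrence_cod);

-- A filter on entry_date alone can use the leftmost column of the
-- composite index (the date range here is an arbitrary example).
explain
select client_id, occurrence_cod, entry_date, zone
from occurrence
where entry_date >= '2011-03-28 00:00:00'
  and entry_date <  '2011-03-29 00:00:00';

-- Re-check the plan of the original query once the indexes exist.
explain
select client_id, occurrence_cod, entry_date, zone
from occurrence o
where occurrence_cod = 'E401'
  and entry_date = (
      select max(entry_date)
      from occurrence
      where client_id = o.client_id
  );
```

On the four-row sample table the optimizer may well ignore the indexes, so the EXPLAIN output is only meaningful against the real 250,000-row table.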