I’m running a MySQL 5.5 server with a large table (about 10 million rows). The table is a kind of log, and its primary key spans two columns:
id (INTEGER)
date (DATETIME)
The application that connects to this database sends a query along these lines:
SELECT * FROM bigtable
INNER JOIN other_table
ON ....
WHERE UNIX_TIMESTAMP(date) BETWEEN #somevalue# AND #somevalue2#;
I found that this query takes a very long time to execute. I know that some functions can prevent MySQL from using indexes and force a full table scan instead.
The question:
Is there a performance hit from applying UNIX_TIMESTAMP to a primary-key column, as shown above, instead of writing “… WHERE date BETWEEN '2012-01-01 00:00:00' AND '2012-02-01 00:00:00'”?
The query:
SELECT r.f_registro, r.latitud, r.longitud, r.velocidad, r.status,
       r.odometro, r.heading, r.sensor, a.nombre
FROM registros r
INNER JOIN activos a ON a.id_tracker = r.id_tracker
WHERE a.id_activo = 2366
  AND r.satelites > '3'
  AND UNIX_TIMESTAMP(r.f_registro) BETWEEN 1342159200 AND 1342760400
ORDER BY r.f_registro
It takes several seconds or even minutes to execute!
Running EXPLAIN returns:
id  select_type  table  type   possible_keys         key        key_len  ref    rows  Extra
1   SIMPLE       a      const  PRIMARY               PRIMARY    4        const  1     Using filesort
1   SIMPLE       r      range  id_tracker,satelites  satelites  4        NULL   1     Using index condition; Using where
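You are correct: applying a function to the date column prevents MySQL from using an index on that column. The index stores the raw column values, not UNIX_TIMESTAMP(date), so the optimizer cannot range-search it and has to evaluate the function for every candidate row.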
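The effect is easy to see in a query planner. The sketch below uses Python's built-in sqlite3 module, with SQLite standing in for MySQL (an assumption: the two planners differ in detail, but neither can range-search a plain index through a function of the column). The table is a hypothetical, cut-down copy of registros, strftime('%s', ...) plays the role of UNIX_TIMESTAMP(), and the DATETIME strings are the question's bounds converted assuming UTC.

```python
import sqlite3

# Hypothetical, cut-down copy of the registros table, recreated in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE registros ("
    "id_tracker INTEGER, f_registro TEXT, satelites INTEGER, latitud REAL)"
)
conn.execute("CREATE INDEX idx_f_registro ON registros (f_registro)")


def plan(sql: str, params: tuple = ()) -> str:
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last field of every plan row is the human-readable step description.
    return " | ".join(row[-1] for row in rows)


# Function applied to the column (SQLite's stand-in for UNIX_TIMESTAMP):
# the index holds raw f_registro values, so it cannot be searched.
wrapped = plan(
    "SELECT * FROM registros "
    "WHERE CAST(strftime('%s', f_registro) AS INTEGER) BETWEEN ? AND ?",
    (1342159200, 1342760400),
)

# Bare column compared with two constants: a range search on the index.
bare = plan(
    "SELECT * FROM registros WHERE f_registro BETWEEN ? AND ?",
    ("2012-07-13 06:00:00", "2012-07-20 05:00:00"),
)

print("with function:", wrapped)  # a full SCAN of registros
print("bare column:  ", bare)     # a SEARCH using idx_f_registro
```

On the MySQL server itself, running EXPLAIN on both forms of the query gives the equivalent comparison.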
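Instead, convert the two bounds into DATETIME constants and compare the bare column with BETWEEN, for example WHERE r.f_registro BETWEEN FROM_UNIXTIME(1342159200) AND FROM_UNIXTIME(1342760400). Here FROM_UNIXTIME() is applied only to constants, so MySQL evaluates it once and an index on f_registro, if there is one, can still be used for a range scan.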
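The two constants can also be computed in the application before the query is sent. A minimal Python sketch, assuming the MySQL session time zone is UTC: UNIX_TIMESTAMP() interprets a DATETIME in the session time zone, so whatever zone is passed here has to match it. The helper name to_mysql_datetime is hypothetical.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def to_mysql_datetime(epoch_seconds: int, tz: tzinfo = timezone.utc) -> str:
    """Format a Unix timestamp as a 'YYYY-MM-DD HH:MM:SS' DATETIME literal.

    `tz` has to match the MySQL session time zone, because that is the zone
    UNIX_TIMESTAMP() assumes when it converts a DATETIME value.
    """
    return datetime.fromtimestamp(epoch_seconds, tz=tz).strftime("%Y-%m-%d %H:%M:%S")


# Bounds from the question, assuming the session time zone is UTC.
start = to_mysql_datetime(1342159200)
end = to_mysql_datetime(1342760400)
print(start, end)  # 2012-07-13 06:00:00 2012-07-20 05:00:00

# The same start bound for a hypothetical server running at a fixed UTC-6.
utc_minus_6 = timezone(timedelta(hours=-6))
print(to_mysql_datetime(1342159200, utc_minus_6))  # 2012-07-13 00:00:00
```

For a zone with daylight saving time, pass a zoneinfo.ZoneInfo instead of a fixed offset. In real code, send start and end as bound query parameters rather than splicing them into the SQL text.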
Also note that you haven’t said there is an index on the date column by itself. MySQL can only use a composite index starting from its leftmost column, so the primary key on (id, date) cannot be used for a query that filters only on date.
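The leftmost-prefix rule can be checked the same way. This sketch again uses SQLite through Python's sqlite3 as a stand-in for MySQL, on a hypothetical log table whose only index is the (id, date) primary key. It compares a date-only filter, a filter on id plus date, and a date-only filter after adding a separate index on date (the name idx_log_date is hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical log table: the composite primary key (id, date) is its only index.
conn.execute(
    "CREATE TABLE log (id INTEGER, date TEXT, payload TEXT, PRIMARY KEY (id, date))"
)


def plan(sql: str, params: tuple = ()) -> str:
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


bounds = ("2012-01-01 00:00:00", "2012-02-01 00:00:00")

# date is the second key column, so a date-only filter cannot use the key.
date_only = plan("SELECT * FROM log WHERE date BETWEEN ? AND ?", bounds)

# Constraining the leftmost column (id) as well makes the key usable.
id_and_date = plan(
    "SELECT * FROM log WHERE id = ? AND date BETWEEN ? AND ?", (42,) + bounds
)

# An index whose first column is date serves the date-only filter.
conn.execute("CREATE INDEX idx_log_date ON log (date)")
date_only_indexed = plan("SELECT * FROM log WHERE date BETWEEN ? AND ?", bounds)

for label, detail in [
    ("date only, PK index", date_only),
    ("id + date, PK index", id_and_date),
    ("date only, date index", date_only_indexed),
]:
    print(f"{label}: {detail}")
```

On MySQL the counterpart would be a secondary index whose first column is date, for example CREATE INDEX idx_date ON bigtable (date), with the index name again hypothetical.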
The ON clause (elided in your first example) may matter just as much for the query’s performance, so check whether the join can use an index as well.
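In the full query the ON clause is a.id_tracker = r.id_tracker, and a.id_activo = 2366 pins activos to a single row. One option worth testing, once the date filter is written against the bare column, is a composite index on registros (id_tracker, f_registro), so that the join equality and the date range can share one index. The sketch below checks that idea with SQLite's planner via Python's sqlite3 as a stand-in for MySQL. The schema is a hypothetical subset of the real tables and idx_registros_tracker_fecha is a hypothetical name; whether such an index beats the existing id_tracker index on the real data is something only EXPLAIN and timings on the MySQL server can tell.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activos (
        id_activo INTEGER PRIMARY KEY,
        id_tracker INTEGER,
        nombre TEXT
    );
    CREATE TABLE registros (
        id_tracker INTEGER,
        f_registro TEXT,
        satelites INTEGER,
        latitud REAL,
        longitud REAL
    );
    -- Hypothetical index: join column first, then the date-range column.
    CREATE INDEX idx_registros_tracker_fecha ON registros (id_tracker, f_registro);
""")

# The question's query, with the date filter on the bare column.
query = """
    SELECT r.f_registro, r.latitud, r.longitud, a.nombre
    FROM registros r
    INNER JOIN activos a ON a.id_tracker = r.id_tracker
    WHERE a.id_activo = ?
      AND r.satelites > 3
      AND r.f_registro BETWEEN ? AND ?
    ORDER BY r.f_registro
"""
params = (2366, "2012-07-13 06:00:00", "2012-07-20 05:00:00")

# Expect activos looked up by its primary key, then registros searched
# through the composite index on both id_tracker and the f_registro range.
details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, params)]
for detail in details:
    print(detail)
```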