I wrote a Perl script that runs some SQL queries against a table with more than 140,000 rows and growing.
I want to compare dates and fetch some rows, but I noticed that changing just one SQL query gives me very different execution times.
Take a look at the following test results; each run performs 100 $sql queries.
The only line I change in the script between different executions is the $sql line.
I ran the tests many times and I always get similar results, so I guess that it is not related to caching issues.
my $sql = "SELECT `mem_used`, `swap_used`, `mem_total`
FROM `$config{db}{data_table}`
WHERE `host_id` = $host_id
AND date >= '$date'
AND TIMESTAMPDIFF( MINUTE , `date`, '$date' ) <= $interval;"; # VERY SLOW
$ time ./data_smoothing.pl
real 1m28.818s
user 1m6.516s
sys 0m0.256s
my $sql = "SELECT `mem_used`, `swap_used`, `mem_total`
FROM `$config{db}{data_table}`
WHERE `host_id` = $host_id
AND date >= '$date'
AND (UNIX_TIMESTAMP(`date`) - UNIX_TIMESTAMP('$date')) <= ($interval * 60);"; #SLOW
$ time ./data_smoothing.pl
real 0m10.005s
user 0m0.108s
sys 0m0.028s
my $sql = "SELECT `mem_used`, `swap_used`, `mem_total`
FROM `$config{db}{data_table}`
WHERE `host_id` = $host_id
AND (`date` BETWEEN '$date'
AND DATE_ADD('$date', INTERVAL $interval MINUTE));"; #FAST
$ time ./data_smoothing.pl
real 0m0.190s
user 0m0.084s
sys 0m0.016s
How the table is created (taken from a mysqldump):
CREATE TABLE `data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`host_id` smallint(6) NOT NULL,
`date` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`mem_total` double(10,3) DEFAULT NULL,
`mem_used` double(10,3) DEFAULT NULL,
`swap_total` double(10,3) DEFAULT NULL,
`swap_used` double(10,3) DEFAULT NULL,
`CPU_count` smallint(6) DEFAULT NULL,
`load_avg_1` float DEFAULT NULL,
`load_avg_5` float DEFAULT NULL,
`load_avg_15` float DEFAULT NULL,
`uptime` double(10,3) DEFAULT NULL,
`cpuIdlingTime` double(10,3) DEFAULT NULL,
`rxBytesTotal` bigint(20) DEFAULT NULL,
`txBytesTotal` bigint(20) DEFAULT NULL,
`rxPacketsTotal` bigint(20) DEFAULT NULL,
`txPacketsTotal` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id`,`host_id`),
KEY `fk_data_hosts` (`host_id`),
KEY `date_memtot_hosts` (`date`,`mem_total`,`host_id`),
CONSTRAINT `fk_data_hosts` FOREIGN KEY (`host_id`) REFERENCES `hosts` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=145300 DEFAULT CHARSET=utf8;
The last one is fastest because your comparison lends itself well to indexing. The others, not so much.
See, when you call a function (or do just about anything else) with your column’s value before you test it, you make it nearly impossible to use an index to quickly find matching rows. The engine has to basically go through the whole table, grabbing a date, doing some math with it, and then checking whether the condition is true.
Meanwhile, if you just say BETWEEN this_value AND that_value, MySQL doesn’t have to do much at all: it can consult the index and just find the two endpoints of the range, which is much faster.
The call to DATE_ADD('$date', INTERVAL $interval MINUTE) doesn’t have much effect on the run time, because MySQL is generally smart enough to cache values it knows won’t change, so it doesn’t have to calculate them again each time.
As for the reason for the difference between the first two, I couldn’t tell you. Perhaps TIMESTAMPDIFF is just that slow. Perhaps the conversion and math are much simpler with timestamps, particularly considering UNIX_TIMESTAMP('$date') doesn’t need recalculating each time. But all that’s really just guessing.
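If you want to check the index explanation rather than take my word for it, one way is to put EXPLAIN in front of the slow and the fast query and compare the plans. This is only a sketch: the table name `data` comes from your dump, while the literal values (host 1, a start date of '2012-06-01 00:00:00', a 30-minute window) are placeholders I made up in place of $host_id, $date and $interval.

```sql
-- Placeholder values (assumptions): host_id 1, start '2012-06-01 00:00:00', 30 minutes.

-- Column wrapped in a function: the upper limit has to be checked row by row,
-- so only the `date` >= ... part can narrow the search.
EXPLAIN
SELECT `mem_used`, `swap_used`, `mem_total`
FROM `data`
WHERE `host_id` = 1
  AND `date` >= '2012-06-01 00:00:00'
  AND (UNIX_TIMESTAMP(`date`) - UNIX_TIMESTAMP('2012-06-01 00:00:00')) <= (30 * 60);

-- Bare column compared against constants: both ends of the range
-- can be looked up in an index on `date`.
EXPLAIN
SELECT `mem_used`, `swap_used`, `mem_total`
FROM `data`
WHERE `host_id` = 1
  AND `date` BETWEEN '2012-06-01 00:00:00'
                 AND DATE_ADD('2012-06-01 00:00:00', INTERVAL 30 MINUTE);
```

The columns worth comparing in the output are `type`, `key` and `rows`. Which index the optimizer actually picks depends on your data, so it may well choose `fk_data_hosts` instead of `date_memtot_hosts` for either query; the estimated row count is the more telling number.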