I’ve got a very large table (~100Million Records) in MySQL that contains information about files. One of the pieces of information is the modified date of each file.
I need to write a query that will count the number of files that fit into specified date ranges. To do that I made a small table that specifies these ranges (all in days) and looks like this:
DateRanges
range_id range_name range_start range_end
1 0-90 0 90
2 91-180 91 180
3 181-365 181 365
4 366-1095 366 1095
5 1096+ 1096 999999999
And wrote a query that looks like this:
SELECT r.range_name, sum(IF((DATEDIFF(CURDATE(),t.file_last_access) > r.range_start and DATEDIFF(CURDATE(),t.file_last_access) < r.range_end),1,0)) as FileCount
FROM `DateRanges` r, `HugeFileTable` t
GROUP BY r.range_name
However, quite predictably, this query takes forever to run. I think that is because I am asking MySQL to go through the HugeFileTable 5 times, each time performing the DATEDIFF() calculation on each file.
What I want to do instead is to go through the HugeFileTable record by record only once, and for each file increment the count in the appropriate range_name running total. I can’t figure out how to do that….
Can anyone help out with this?
Thanks.
EDIT: MySQL Version: 5.0.45, Tables are MyISAM
EDIT2: Here’s the descibe that was asked for in the comments
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE r ALL NULL NULL NULL NULL 5 Using temporary; Using filesort
1 SIMPLE t ALL NULL NULL NULL NULL 96506321
First, create an index on
HugeFileTable.file_last_access.Then try the following query:
Here’s the
EXPLAINplan that I got when I tried this query on MySQL 5.0.75 (edited down for brevity):It’s still not going to perform very well. By using
GROUP BY, the query incurs a temporary table, which may be expensive. Not much you can do about that.But at least this query eliminates the Cartesian product that you had in your original query.
update: Here’s another query that uses a correlated subquery but I have eliminated the
GROUP BY.The
EXPLAINplan shows no temporary table or filesort (at least with the trivial amount of rows I have in my test tables):Try this query on your data set and see if it performs better.