I have the following query:
SELECT IF(dissolution_date IS NULL,
YEAR(CURDATE()) - YEAR(incorporation_date),
YEAR(dissolution_date) - YEAR(incorporation_date)) as length,
COUNT(DISTINCT(id_company)) as count
FROM company
WHERE incorporation_date IS NOT NULL
GROUP BY length
ORDER BY length ASC
Given that I have the dissolution date (or a replacement for it) and the incorporation date, it seems redundant to add a column to the table that stores the difference between the two dates (especially as, if a company hasn’t dissolved, the dissolution date would need updating every year).
The EXPLAIN output is as follows:
id | select_type | table   | type | possible_keys      | key  | key_len | ref  | rows    | Extra
---|-------------|---------|------|--------------------|------|---------|------|---------|----------------------------
1  | SIMPLE      | company | ALL  | incorporation_date | NULL | NULL    | NULL | 9128995 | Using where; Using filesort
and currently the query takes tens of seconds, whereas similar queries on non-calculated columns take milliseconds, which is what I’d like to achieve here.
Is it possible to group and order by length without using filesort and without adding the length column to the original table? Or should I just add the length column, in which case what would be the best way of updating the dissolution_date every year?
What you are referring to is called a derived attribute in database theory. You generally do not want to store such an attribute: although reading it is fast, the stored value goes stale as soon as the data it depends on changes (here, the current date). So we don’t create a column for such an attribute but rather calculate it when needed.
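One way to calculate the value when needed without repeating the expression is a view. The following is a minimal sketch, assuming the column names from the question; the view name company_length is hypothetical, and COALESCE is used only as a shorter equivalent of the IF in the original query.

```sql
-- Hypothetical view name; the columns are those used in the question.
CREATE VIEW company_length AS
SELECT id_company,
       -- For companies that have not dissolved, fall back to today's date.
       YEAR(COALESCE(dissolution_date, CURDATE())) - YEAR(incorporation_date) AS length
FROM company
WHERE incorporation_date IS NOT NULL;

-- The original report, written against the view.
SELECT length, COUNT(DISTINCT id_company) AS count
FROM company_length
GROUP BY length
ORDER BY length ASC;
```

A view only keeps the derived value out of the table; the expression is still evaluated for every row at query time, so it is not expected to make the query faster by itself.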
The filesort itself is nothing unusual; it is what the ORDER BY clause is doing. Since length is a computed value, MySQL has to sort the results itself.
Assuming that your query is correct, I would suggest creating a B-tree index on the company table with the search key (dissolution_date, incorporation_date), since both columns are used heavily in your query.
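A sketch of that index, assuming a MySQL table where B-tree is the default index type (as with InnoDB and MyISAM); the index name is hypothetical. Whether the optimizer actually uses it depends on the MySQL version and the table definition, so it is worth re-running EXPLAIN afterwards.

```sql
-- Hypothetical index name; column order follows the suggestion above.
CREATE INDEX idx_company_dissolution_incorporation
    ON company (dissolution_date, incorporation_date);

-- Re-check the plan: the aim is for "key" to show the new index
-- instead of NULL, meaning the index is read rather than the full rows.
EXPLAIN
SELECT IF(dissolution_date IS NULL,
          YEAR(CURDATE()) - YEAR(incorporation_date),
          YEAR(dissolution_date) - YEAR(incorporation_date)) AS length,
       COUNT(DISTINCT id_company) AS count
FROM company
WHERE incorporation_date IS NOT NULL
GROUP BY length
ORDER BY length ASC;
```

If id_company is the primary key of an InnoDB table (an assumption, since the table definition is not shown), the secondary index also contains it, so the query could be answered from the index alone. The grouping on the computed length may still require a temporary table or a sort.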
Could you also post the structure of the company table?