I have a SUPER slow query, which I posted here: http://pastebin.com/E5sdRi7e. When I did an EXPLAIN, I got the following:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|----|-------------|-------|------|---------------|-----|---------|-----|------|-------|
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 5 | Using filesort |
| 2 | DERIVED | Workflow | ALL | PRIMARY | NULL | NULL | NULL | 9 | Using temporary; Using filesort |
| 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | NULL | 141 | Using where; Using join buffer |
| 2 | DERIVED | DataSource | ALL | PRIMARY | NULL | NULL | NULL | 1310 | Using where; Using join buffer |
| 2 | DERIVED | <derived4> | ALL | NULL | NULL | NULL | NULL | 1310 | Using where; Using join buffer |
| 2 | DERIVED | User | eq_ref | PRIMARY | PRIMARY | 4 | LatestDataSourceActivityLog.UserId | 1 | |
| 4 | DERIVED | t1 | ALL | NULL | NULL | NULL | NULL | 5400 | Using where; Using temporary; Using filesort |
| 5 | DEPENDENT SUBQUERY | t2 | ref | DataSourceId | DataSourceId | 4 | companyname_db.t1.DataSourceId | 4 | |
| 3 | DERIVED | DataSource | range | PRIMARY | PRIMARY | 4 | NULL | 142 | Using where |
What does the above table tell me? Does it help me identify which fields should be indexed?
Any help is greatly appreciated.
Query
SELECT WrappedData.*
FROM (SELECT ParentLeafNodeDataSource.Id,
             LatestDataSourceActivityLog.UserId,
             DataSource.Status AS StatusCode,
             ( CASE
                 WHEN User.Name IS NULL THEN 'CompanyName'
                 ELSE User.Name
               END ) AS `Username`,
             Workflow.Name AS WorkflowName,
             LatestDataSourceActivityLog.Timestamp
      FROM DataSource,
           Workflow,
           (SELECT *
            FROM DataSource
            WHERE DataSource.Id IN ( 0, 1, 2, 3,
                                     4, 5, 6, 7,
                                     8, 9, 10, 11,
                                     12, 13, 16, 21,
                                     22, 23, 24, 25,
                                     26, 27, 28, 29,
                                     30, 31, 32, 33,
                                     34, 35, 36, 37,
                                     38, 39, 40, 41,
                                     42, 43, 44, 45,
                                     46, 47, 48, 49,
                                     50, 51, 52, 53,
                                     54, 55, 56, 57,
                                     58, 59, 60, 61,
                                     62, 63, 64, 65,
                                     66, 67, 68, 69,
                                     70, 71, 72, 73,
                                     74, 75, 76, 77,
                                     78, 79, 80, 81,
                                     83, 84, 85, 86,
                                     87, 88, 89, 90,
                                     91, 92, 93, 94,
                                     95, 96, 97, 98,
                                     99, 100, 101, 102,
                                     103, 104, 105, 106,
                                     107, 108, 109, 110,
                                     111, 112, 113, 114,
                                     115, 116, 117, 118,
                                     119, 120, 142, 1293,
                                     1294, 1295, 1296, 1297,
                                     1298, 1299, 143, 1300,
                                     1301, 1302, 1303, 1304,
                                     1305, 1306, 144, 146,
                                     145, 1307, 1308, 1309,
                                     1310, 147, 149, 148,
                                     150, 151 )) AS ParentLeafNodeDataSource,
           (SELECT t1.*
            FROM DataSourceActivityLog AS t1
            WHERE Timestamp = (SELECT Max(t2.Timestamp)
                               FROM DataSourceActivityLog AS t2
                               WHERE t1.DataSourceId = t2.DataSourceId)
            GROUP BY t1.DataSourceId) AS LatestDataSourceActivityLog
           LEFT JOIN User
                  ON User.Id = LatestDataSourceActivityLog.UserId
      WHERE ParentLeafNodeDataSource.Status = '203'
             OR ParentLeafNodeDataSource.Status = '204'
                AND Workflow.Id = ParentLeafNodeDataSource.WorkflowId
                AND LatestDataSourceActivityLog.DataSourceId = ParentLeafNodeDataSource.Id
                AND DataSource.Id = LatestDataSourceActivityLog.DataSourceId
                AND LatestDataSourceActivityLog.UserId = 1
      GROUP BY ParentLeafNodeDataSource.Id) AS WrappedData
ORDER BY WrappedData.`Timestamp` DESC
It’s difficult to say conclusively, but here are a couple of refactoring suggestions.
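On performance, the first thing to look at is the GROUP functions. The LatestDataSourceActivityLog subquery runs a correlated MAX(Timestamp) lookup that may be re-evaluated for each of the roughly 5400 rows of t1 (the DEPENDENT SUBQUERY row in your EXPLAIN output). It can be rewritten, for example as a self-join, which can eliminate the use of MAX entirely.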
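As a minimal sketch of one MAX-free rewrite (there are others, and this one is untested against your data), the latest log row per DataSourceId can be found with a self anti-join: join each row to any newer row for the same DataSourceId and keep only the rows that have none. Table and column names are taken from your query; the sketch assumes Timestamp values are unique within a DataSourceId.

```sql
-- Sketch: latest DataSourceActivityLog row per DataSourceId without MAX().
-- Assumption: Timestamp is unique within each DataSourceId; if two rows tie
-- for the latest Timestamp, both are returned.
SELECT t1.*
FROM DataSourceActivityLog AS t1
LEFT JOIN DataSourceActivityLog AS t2
       ON t2.DataSourceId = t1.DataSourceId
      AND t2.`Timestamp` > t1.`Timestamp`   -- t2 is any newer row for the same source
WHERE t2.DataSourceId IS NULL;              -- keep only rows that have no newer row
```

Whether this beats the correlated subquery depends on the available indexes, so compare the EXPLAIN output of both versions before switching.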
Probably not a big performance issue, but for the Username column you can use IFNULL or COALESCE instead of the CASE expression.
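For illustration, here is the same fallback written with COALESCE, reduced to a small standalone query (the ActivityLog alias is only for this sketch; table and column names come from your query). IFNULL(User.Name, 'CompanyName') would behave the same way in MySQL.

```sql
-- COALESCE returns its first non-NULL argument, so this matches
-- CASE WHEN User.Name IS NULL THEN 'CompanyName' ELSE User.Name END.
SELECT ActivityLog.DataSourceId,
       COALESCE(User.Name, 'CompanyName') AS `Username`
FROM DataSourceActivityLog AS ActivityLog
LEFT JOIN User
       ON User.Id = ActivityLog.UserId;
```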
As for indexes, they speed up SELECTs by making lookups cheaper, but they slow down write operations, since every index has to be updated as well. If your application isn’t write-heavy, you should index commonly searched columns, particularly in large tables.
In this query, it looks like you’d benefit from adding an index on DataSourceId, but I can’t test whether there’s any gain. The primary keys will already be indexed.
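One caveat: your EXPLAIN output shows t2 already using a key named DataSourceId, so a single-column index appears to exist on DataSourceActivityLog. A composite index on (DataSourceId, Timestamp) might help more, because the latest Timestamp for a given DataSourceId could then be read from the index alone. The following is an untested sketch, and the index name idx_datasource_timestamp is made up.

```sql
-- Hypothetical index name. Column order matters: DataSourceId comes first
-- so that equality lookups on it can use the index, with Timestamp sorted
-- inside each DataSourceId.
CREATE INDEX idx_datasource_timestamp
    ON DataSourceActivityLog (DataSourceId, `Timestamp`);

-- Re-run EXPLAIN afterwards and compare the key and rows columns with the
-- earlier output. The value 1 is just an example DataSourceId.
EXPLAIN
SELECT MAX(t2.`Timestamp`)
FROM DataSourceActivityLog AS t2
WHERE t2.DataSourceId = 1;
```

If the plan for the full query does not change, the index only adds write overhead and can be dropped again.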