I have the following MySQL query:
SELECT value_1, (
SELECT value_4
FROM table_1
WHERE value_3 < value_1
ORDER BY value_3 DESC
LIMIT 1
)
AS result_value
FROM table_2
WHERE value_1 BETWEEN 1325372400000 AND 1328050800000
ORDER BY value_1
Which returns 32 results and takes 6.6 seconds to run. The idea is to get the entries from table_1 where value_3 is “closest” to value_1 in table_2.
The two queries the query is comprised of, namely
SELECT value_1
AS result_value
FROM table_2
WHERE value_1 BETWEEN 1325372400000 AND 1328050800000
ORDER BY value_1
and (for example)
SELECT value_4
FROM table_1
WHERE value_3 < 1328050800000
ORDER BY value_3 DESC
LIMIT 1
each take 0.03 seconds to run. Computing the accumulated time for 32 result sets, the compound query should take no longer than 1 second, or probably even less (because I/O overhead for individual queries is not accounted for). Yet at 6.6 seconds, it takes much longer.
Why is that, how can I optimize it? Or is there another/superior way to reach my goal?
Update:
Table definitions:
table_1 (MyISAM, 700000 entries):
'id', 'int(10) unsigned', 'NO', 'PRI', NULL, 'auto_increment'
'value_3', 'bigint(20) unsigned', 'NO', 'UNI', '0', ''
'value_4', 'bigint(20) unsigned', 'NO', '', '0', ''
table_2 (MyISAM, 4000 entries):
'value_1', 'bigint(20) unsigned', 'NO', 'PRI', NULL, ''
EXPLAIN [query]:
'1', 'PRIMARY', 'table_2', 'range', 'PRIMARY,value_3_UNIQUE', 'PRIMARY', '8', NULL, '32', 'Using where; Using index'
'2', 'DEPENDENT SUBQUERY', 'table_1', 'index', 'value_3,value_3_value_4', 'value_3', '8', NULL, '1', 'Using where'
Use a bit of mysql kung fu:
The "kung fu" is that with mysql when the columns you’re not grouping by are not aggregated (eg
SUM()etc), you get the first row encountered for each group. If you use this technique on an ordered result set, you can get the value you’re after.Not only is this easier to code, but you’ll notice that just one pass is made over the tables (not one query per row as you tried). It should perform well.
Edit:
There has been conjecture by some commenters that this special form of
GROUP BYis "indeterminate" and/or "not officially supported" etc. The documentation states that the row selected in "indeterminate", however I have never seen nor heard of mysql selecting any row other than the first encountered and using anorder byon the inner select is used and relied upon in countless production queries.FWIW, I am happy to recommend this approach as being "reliable" and production-worthy.