How can you join between a table with a sparse number of dates and another table with an exhaustive number of dates such that the gaps between the sparse dates take the values of the previous sparse date?
Illustrative example:
PRICE table (sparse dates):

date        itemid  price
2008-12-04  1       $1
2008-12-11  1       $3
2008-12-15  1       $7

VOLUME table (exhaustive dates):

date        itemid  volume_amt
2008-12-04  1       12345
2008-12-05  1       23456
2008-12-08  1       34567
2008-12-09  1       ...
2008-12-10  1
2008-12-11  1
2008-12-12  1
2008-12-15  1
2008-12-16  1
2008-12-17  1
2008-12-18  1
Desired result:
date        price  volume_amt
2008-12-04  $1     12345
2008-12-05  $1     23456
2008-12-08  $1     34567
2008-12-09  $1     ...
2008-12-10  $1
2008-12-11  $3
2008-12-12  $3
2008-12-15  $7
2008-12-16  $7
2008-12-17  $7
2008-12-18  $7
Update:
A couple of people have suggested a correlated subquery that accomplishes the desired result. (Correlated subquery = a subquery that contains a reference to the outer query.)
This will work; however, I should have noted that the platform I’m using is MySQL, for which correlated subqueries are poorly optimized. Any way to do it without using a correlated subquery?
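For reference, the correlated-subquery approach mentioned above might look roughly like the sketch below. It is not the exact query anyone posted: the SQL is written from the description, the table and column names are taken from the example tables, and Python's built-in sqlite3 module stands in for MySQL only so the sketch can be run as-is. Volume amounts that the example leaves blank are stored as NULL.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (an assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE price  (date TEXT, itemid INTEGER, price INTEGER);
CREATE TABLE volume (date TEXT, itemid INTEGER, volume_amt INTEGER);
""")
conn.executemany("INSERT INTO price VALUES (?, ?, ?)", [
    ("2008-12-04", 1, 1), ("2008-12-11", 1, 3), ("2008-12-15", 1, 7),
])
# Amounts not given in the question are left as NULL (None).
conn.executemany("INSERT INTO volume VALUES (?, ?, ?)", [
    ("2008-12-04", 1, 12345), ("2008-12-05", 1, 23456), ("2008-12-08", 1, 34567),
    ("2008-12-09", 1, None), ("2008-12-10", 1, None), ("2008-12-11", 1, None),
    ("2008-12-12", 1, None), ("2008-12-15", 1, None), ("2008-12-16", 1, None),
    ("2008-12-17", 1, None), ("2008-12-18", 1, None),
])

# The inner SELECT refers to v (the outer row), which is what makes it correlated:
# for each volume row, pick the latest price dated on or before that row's date.
CORRELATED_SQL = """
SELECT v.date,
       (SELECT p.price
          FROM price p
         WHERE p.itemid = v.itemid
           AND p.date <= v.date
         ORDER BY p.date DESC
         LIMIT 1) AS price,
       v.volume_amt
  FROM volume v
 ORDER BY v.date
"""

rows_subquery = conn.execute(CORRELATED_SQL).fetchall()
for row in rows_subquery:
    print(row)
```

The inner SELECT is evaluated once per volume row, which is the per-row cost I would like to avoid on MySQL.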
This isn’t as simple as a single LEFT OUTER JOIN to the sparse table, because you want the NULLs left by the outer join to be filled with the most recent price.
The approach is to match each Volume row to every Price row for the same item dated on or before it, and then use a second outer join to keep only the most recent of those prices.
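A join of that shape can be sketched as follows. This is a reconstruction from the description rather than the exact query that was tested: the aliases `p1` and `p2` are illustrative, the column names follow the question's tables (`itemid`, `date`), and Python's sqlite3 module is used in place of MySQL purely so the example runs on its own. Volume amounts not given in the question are stored as NULL.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (an assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE price  (date TEXT, itemid INTEGER, price INTEGER);
CREATE TABLE volume (date TEXT, itemid INTEGER, volume_amt INTEGER);
""")
conn.executemany("INSERT INTO price VALUES (?, ?, ?)", [
    ("2008-12-04", 1, 1), ("2008-12-11", 1, 3), ("2008-12-15", 1, 7),
])
# Amounts not given in the question are left as NULL (None).
conn.executemany("INSERT INTO volume VALUES (?, ?, ?)", [
    ("2008-12-04", 1, 12345), ("2008-12-05", 1, 23456), ("2008-12-08", 1, 34567),
    ("2008-12-09", 1, None), ("2008-12-10", 1, None), ("2008-12-11", 1, None),
    ("2008-12-12", 1, None), ("2008-12-15", 1, None), ("2008-12-16", 1, None),
    ("2008-12-17", 1, None), ("2008-12-18", 1, None),
])

# p1: every price for the item dated on or before the volume date.
# p2: any price that is later than p1 but still on or before the volume date.
# Keeping only rows where no such p2 exists leaves p1 as the most recent price.
JOIN_SQL = """
SELECT v.date, p1.price, v.volume_amt
  FROM volume v
  JOIN price p1
    ON p1.itemid = v.itemid
   AND p1.date <= v.date
  LEFT OUTER JOIN price p2
    ON p2.itemid = v.itemid
   AND p2.date <= v.date
   AND p2.date > p1.date
 WHERE p2.itemid IS NULL
 ORDER BY v.date
"""

rows_join = conn.execute(JOIN_SQL).fetchall()
for row in rows_join:
    print(row)
```

Note that a Volume row dated before the first Price row for its item has no `p1` match and drops out of the result; changing the first join to a LEFT OUTER JOIN would keep such rows with a NULL price.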
I tested this on MySQL 5.0.51. It uses neither correlated subqueries nor group by.
edit: Updated the query to match to item_id as well as date. This seems to work too. I created an index on (date) and an index on (date, item_id), and the EXPLAIN plan was identical. An index on (item_id, date) may be better in this case. Here’s the EXPLAIN output for that:

But I have a very small data set, and the optimization may depend on larger data sets. You should experiment, analyzing the optimization using a larger data set.

edit: I pasted the wrong EXPLAIN output before. The one above is corrected, and shows better use of the (item_id, date) index.
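To experiment with the index choice yourself, something along these lines may help. It is only a sketch: the index name `price_item_date` is made up, the column is spelled `itemid` as in the question's tables, and SQLite's `EXPLAIN QUERY PLAN` is used as a stand-in for MySQL's `EXPLAIN`, so the plan output will not look like MySQL's. Only a few of the example rows are loaded.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (an assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE price  (date TEXT, itemid INTEGER, price INTEGER);
CREATE TABLE volume (date TEXT, itemid INTEGER, volume_amt INTEGER);
INSERT INTO price  VALUES ('2008-12-04', 1, 1), ('2008-12-11', 1, 3), ('2008-12-15', 1, 7);
INSERT INTO volume VALUES ('2008-12-04', 1, 12345), ('2008-12-05', 1, 23456),
                          ('2008-12-12', 1, NULL);
-- Composite index: the equality column (itemid) first, then the range column (date).
CREATE INDEX price_item_date ON price (itemid, date);
""")

JOIN_SQL = """
SELECT v.date, p1.price, v.volume_amt
  FROM volume v
  JOIN price p1
    ON p1.itemid = v.itemid AND p1.date <= v.date
  LEFT OUTER JOIN price p2
    ON p2.itemid = v.itemid AND p2.date <= v.date AND p2.date > p1.date
 WHERE p2.itemid IS NULL
 ORDER BY v.date
"""

# Ask the engine how it intends to execute the query (SQLite-specific syntax).
plan = conn.execute("EXPLAIN QUERY PLAN " + JOIN_SQL).fetchall()
for step in plan:
    print(step)

# The index must not change the result, only how it is computed.
rows_indexed = conn.execute(JOIN_SQL).fetchall()
print(rows_indexed)
```

On MySQL the equivalent steps would be `CREATE INDEX ... ON Price (item_id, date)` followed by `EXPLAIN` in front of the SELECT, run against a data set large enough for the optimizer's choices to matter.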