I have two tables in my SQL database.
For example, Table1 – ItemPrice:
DATETIME | ITEM | PRICE
2011-08-28 | ABC | 123
2011-09-01 | ABC | 125
2011-09-02 | ABC | 124
2011-09-03 | ABC | 127
2011-09-04 | ABC | 126
Table2 – DayScore:
DATETIME | ITEM | SCORE
2011-08-28 | ABC | 1
2011-08-29 | ABC | 8
2011-09-01 | ABC | 4
2011-09-02 | ABC | 2
2011-09-03 | ABC | 7
2011-09-04 | ABC | 3
I want to write a query which, given an item ID (e.g. ABC), will return the price at that date from ItemPrice (if there is no price for that date, the query should not return anything). If a valid price is found for the query date, the query should also return (in 9 columns):
- the price of the item from ItemPrice for the past three days (i.e. the most recent 3 prices before the date queried).
- In the next three columns it should return, from DayScore, the matching score for those 3 dates selected from ItemPrice.
- Finally, the dates (t-1 to t-3) selected.
In other words, the results for this query, looking at just date='2011-09-03' for item='ABC' as an example, would be:
DATE | ITEM | PRICE | SCR | PRC_t-1 | PRC_t-2 | PRC_t-3 | SCR_t-1 | SCR_t-2 | SCR_t-3 | DATE_t-1 | DATE_t-2 | DATE_t-3
2011-09-03| ABC | 127 | 7 | 124 | 125 | 123 | 2 | 4 | 1 | 2011-09-02| 2011-09-01| 2011-08-28
....
Etc. for each date that appears in the ItemPrice table.
What is the neatest and most efficient way to write this query (as it's something that will be run over many millions of rows)?
Cheers!
Pretty? No, but it does produce the results. You could probably get rid of some subselects and make it a bit less SQL, but I tried to build it up in steps so you can deduce what it is doing.
The core part is this select:
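A minimal sketch of what such a select could look like, not a definitive version. It assumes SQLite, driven through Python's `sqlite3` module only so the example is self-contained, and that dates are stored as ISO `YYYY-MM-DD` text so they sort correctly. The aliases `d0` to `d3` and the helper `previous_dates` are names made up for illustration. Each correlated subquery picks the n-th most recent price date before the current row's date:

```python
import sqlite3

# Sample rows copied from the ItemPrice table in the question.
PRICES = [
    ("2011-08-28", "ABC", 123),
    ("2011-09-01", "ABC", 125),
    ("2011-09-02", "ABC", 124),
    ("2011-09-03", "ABC", 127),
    ("2011-09-04", "ABC", 126),
]

# For every price date d0, look up the three most recent earlier price dates.
# LIMIT 1 OFFSET n selects the (n+1)-th row when ordered newest first.
DATES_SQL = """
SELECT p.DATETIME AS d0,
       (SELECT q.DATETIME FROM ItemPrice q
         WHERE q.ITEM = p.ITEM AND q.DATETIME < p.DATETIME
         ORDER BY q.DATETIME DESC LIMIT 1 OFFSET 0) AS d1,
       (SELECT q.DATETIME FROM ItemPrice q
         WHERE q.ITEM = p.ITEM AND q.DATETIME < p.DATETIME
         ORDER BY q.DATETIME DESC LIMIT 1 OFFSET 1) AS d2,
       (SELECT q.DATETIME FROM ItemPrice q
         WHERE q.ITEM = p.ITEM AND q.DATETIME < p.DATETIME
         ORDER BY q.DATETIME DESC LIMIT 1 OFFSET 2) AS d3
FROM ItemPrice p
WHERE p.ITEM = ?
ORDER BY p.DATETIME
"""


def previous_dates(item):
    """Return (d0, d1, d2, d3) tuples for one item from an in-memory test DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ItemPrice (DATETIME TEXT, ITEM TEXT, PRICE INTEGER)"
    )
    conn.executemany("INSERT INTO ItemPrice VALUES (?, ?, ?)", PRICES)
    rows = conn.execute(DATES_SQL, (item,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in previous_dates("ABC"):
        print(row)
```

Dates with fewer than three earlier prices come back with NULL (`None` in Python) in the missing slots; whether to keep or filter those rows is left open in the question.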
This returns a table with the dates (now, t-1, t-2, t-3). From there it is a simple matter of joining with price and score for each of those dates. The whole thing, including test data, then becomes this bulk of SQL:
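As a rough sketch of that join step (one possible shape, under stated assumptions): SQLite through Python's `sqlite3`, ISO-formatted text dates, and at most one row per (ITEM, DATETIME) in each table, since duplicates would multiply the joined rows. The derived table `d`, the aliases, and the helper `item_history` are illustrative names. LEFT JOINs are used so that early dates with fewer than three prior prices still appear, with NULLs.

```python
import sqlite3

# Test data copied from the two tables in the question.
PRICES = [
    ("2011-08-28", "ABC", 123), ("2011-09-01", "ABC", 125),
    ("2011-09-02", "ABC", 124), ("2011-09-03", "ABC", 127),
    ("2011-09-04", "ABC", 126),
]
SCORES = [
    ("2011-08-28", "ABC", 1), ("2011-08-29", "ABC", 8),
    ("2011-09-01", "ABC", 4), ("2011-09-02", "ABC", 2),
    ("2011-09-03", "ABC", 7), ("2011-09-04", "ABC", 3),
]

# Scalar subquery template: the (n+1)-th most recent price date before p.DATETIME.
PREV = """(SELECT q.DATETIME FROM ItemPrice q
            WHERE q.ITEM = p.ITEM AND q.DATETIME < p.DATETIME
            ORDER BY q.DATETIME DESC LIMIT 1 OFFSET {n})"""

# Step 1 (derived table d): find the dates. Step 2: join price and score per date.
HISTORY_SQL = f"""
SELECT d.d0, d.ITEM, d.PRICE, s0.SCORE,
       p1.PRICE, p2.PRICE, p3.PRICE,
       s1.SCORE, s2.SCORE, s3.SCORE,
       d.d1, d.d2, d.d3
FROM (SELECT p.DATETIME AS d0, p.ITEM AS ITEM, p.PRICE AS PRICE,
             {PREV.format(n=0)} AS d1,
             {PREV.format(n=1)} AS d2,
             {PREV.format(n=2)} AS d3
      FROM ItemPrice p
      WHERE p.ITEM = ?) d
LEFT JOIN DayScore  s0 ON s0.ITEM = d.ITEM AND s0.DATETIME = d.d0
LEFT JOIN ItemPrice p1 ON p1.ITEM = d.ITEM AND p1.DATETIME = d.d1
LEFT JOIN ItemPrice p2 ON p2.ITEM = d.ITEM AND p2.DATETIME = d.d2
LEFT JOIN ItemPrice p3 ON p3.ITEM = d.ITEM AND p3.DATETIME = d.d3
LEFT JOIN DayScore  s1 ON s1.ITEM = d.ITEM AND s1.DATETIME = d.d1
LEFT JOIN DayScore  s2 ON s2.ITEM = d.ITEM AND s2.DATETIME = d.d2
LEFT JOIN DayScore  s3 ON s3.ITEM = d.ITEM AND s3.DATETIME = d.d3
ORDER BY d.d0
"""


def item_history(item):
    """Build an in-memory test database and return the 13-column rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ItemPrice (DATETIME TEXT, ITEM TEXT, PRICE INTEGER)")
    conn.execute("CREATE TABLE DayScore (DATETIME TEXT, ITEM TEXT, SCORE INTEGER)")
    conn.executemany("INSERT INTO ItemPrice VALUES (?, ?, ?)", PRICES)
    conn.executemany("INSERT INTO DayScore VALUES (?, ?, ?)", SCORES)
    rows = conn.execute(HISTORY_SQL, (item,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in item_history("ABC"):
        print(row)
```

The column order follows the example result in the question (DATE, ITEM, PRICE, SCR, three prices, three scores, three dates). The 2011-08-29 score is never picked up because that date has no row in ItemPrice.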
I’m curious about your explain plan when you run this on 1M rows 🙂 It might not even be that horrible if you have the right indexes, which you probably do.
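One way to check the index point on a small scale, sketched here for SQLite only (plans and syntax differ between engines, so treat this as an illustration rather than a prediction for your database). The index name `idx_itemprice_item_date` and the helper `lookup_plan` are hypothetical. It compares the plan for a single "most recent earlier date" lookup with and without a composite index on (ITEM, DATETIME):

```python
import sqlite3

# The lookup that the history query repeats for every row: the latest
# price date for an item that is strictly before a given date.
LOOKUP_SQL = """
SELECT DATETIME FROM ItemPrice
WHERE ITEM = ? AND DATETIME < ?
ORDER BY DATETIME DESC LIMIT 1
"""


def lookup_plan(with_index):
    """Return SQLite's query-plan detail lines for the lookup."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ItemPrice (DATETIME TEXT, ITEM TEXT, PRICE INTEGER)")
    if with_index:
        # Hypothetical composite index: equality column first, range column second.
        conn.execute(
            "CREATE INDEX idx_itemprice_item_date ON ItemPrice (ITEM, DATETIME)"
        )
    plan = conn.execute(
        "EXPLAIN QUERY PLAN " + LOOKUP_SQL, ("ABC", "2011-09-03")
    ).fetchall()
    conn.close()
    # The human-readable description is the last column of each plan row.
    return [row[-1] for row in plan]


if __name__ == "__main__":
    print("without index:", lookup_plan(False))
    print("with index:   ", lookup_plan(True))
```

With the index in place the plan should mention the index by name instead of a full table scan; on other engines the equivalent check would be that engine's own EXPLAIN output.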