I’m looking for a way to output a selected related record for each record in a table in MySQL. I’ll explain further…
I have two tables, currencies and exchange_rates. The tables are joined by a currency_code field, and each currency record has multiple related exchange rate records, each representing a different day. So there is a 1:many relationship between currencies and exchange_rates.
I want to retrieve a full record from the exchange_rates table for each currency but with the ability to define specific criteria as to which related record to select. Not just the most recent exchange_rate for each currency but maybe the most recent exchange_rates record for each currency that has the field criteria_x=NULL.
It’s a shame that LIMIT can’t be applied per group within a derived table, otherwise something like this would be a neat and readable solution…
SELECT `currencies`.`currency_code`, `currencies`.`country`, `exchange_rates`.`id`,
FROM_UNIXTIME(`exchange_rates`.`datestamp`), `rate`
FROM `currencies`
INNER JOIN (
SELECT `id`, `currency_code`, `invoice_id`, `datestamp`, `rate`
FROM `exchange_rates`
WHERE `criteria_x`=NULL AND `criteria_y` LIKE 'A'
ORDER BY `datestamp` DESC
LIMIT 0, 1
) AS `exchange_rates` ON `currencies`.`currency_code`=`exchange_rates`.`currency_code`
ORDER BY `currencies`.`country`
The LIMIT clause applies to the derived table as a whole, so it yields a single row in total rather than one row per currency.
This is the only way I’ve found to do this…
SELECT `currencies`.`currency_code`, `currencies`.`country`,
FROM_UNIXTIME( SUBSTRING_INDEX( SUBSTRING_INDEX(`exchange_rates`.`concat`, '-', 1), '-', -1)) AS `datestamp`,
SUBSTRING_INDEX( SUBSTRING_INDEX(`exchange_rates`.`concat`, '-', 2), '-', -1) AS `id`,
SUBSTRING_INDEX( SUBSTRING_INDEX(`exchange_rates`.`concat`, '-', 3), '-', -1) AS `rate`
FROM `currencies`
INNER JOIN (
SELECT `currency_code`, MAX(CONCAT_WS('-', `datestamp`, `id`, `rate`)) AS `concat`
FROM `exchange_rates`
WHERE `criteria_x`=NULL AND `criteria_y` LIKE 'A'
GROUP BY `exchange_rates`.`currency_code`
) AS `exchange_rates` ON `currencies`.`currency_code`=`exchange_rates`.`currency_code`
ORDER BY `currencies`.`country`
So concatenating a bunch of fields together and running a MAX() on it to get my sort order within the group, then parsing those fields out in the parent query with SUBSTRING_INDEX(). The problem is that this method only works when I can use a MIN() or MAX() on the concatenated field. It wouldn’t be ideal if I wanted to sort a string or sort by multiple criteria but limit to a single record.
Also it causes me physical pain to have to resort to horrible string manipulation to get the data I want from a relational database — there has to be a better way!
Anyone got any suggestions of a better method?
There are a few general issues to discuss (briefly) before trying to provide an answer.
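Your first query has two issues in its WHERE clause:
- The criteria_x = NULL notation should be written as criteria_x IS NULL. MySQL accepts the syntax, but a comparison with NULL using = evaluates to NULL rather than true, so that condition never matches any row.
- LIKE 'A' is not sensible if the pattern contains no metacharacters (% or _ in standard SQL). You’d be better off with simple equality: = 'A'.
Your question asks for a full record from exchange_rates for each currency, with the ability to define which related record is selected.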
So, you want to select the most recent exchange rate record for each currency that meets the required other criteria. We can assume that there is a unique constraint on the combination of currency_code and datestamp in the exchange rate table; this means that there will always be at most one matching row per currency. You’ve not specified what should be shown if there is no matching row; an inner join will simply not list that currency, of course.
With SQL queries, I usually build and test the overall query in steps, adding extra material to the previously developed queries that are known to work and produce the right output. If it is simple and/or I’ve collected too much hubris, I’ll try a complex query first, but when (nemesis) it doesn’t work, then I go back to the build and test process. Think of it as Test Driven (Query) Development.
Stage 1: Exchange rate records that match specified criteria
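A minimal sketch of this stage, assuming the table and column names from the question and the corrected WHERE clause (IS NULL and plain equality):

```sql
-- All exchange rate rows that satisfy the extra criteria,
-- regardless of currency or date.
SELECT r.id, r.currency_code, r.invoice_id, r.datestamp, r.rate
  FROM exchange_rates AS r
 WHERE r.criteria_x IS NULL
   AND r.criteria_y = 'A';
```

Running this on its own is a quick check that the criteria select the rows you expect before any grouping is added.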
Stage 2: Most recent exchange rate time for each currency that matches specified criteria
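One way this stage might look, again assuming the column names from the question; it keeps the same filter and reduces the matching rows to one (currency, latest timestamp) pair per currency:

```sql
-- Latest qualifying datestamp for each currency.
SELECT r.currency_code, MAX(r.datestamp) AS datestamp
  FROM exchange_rates AS r
 WHERE r.criteria_x IS NULL
   AND r.criteria_y = 'A'
 GROUP BY r.currency_code;
```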
Stage 3: Exchange rate record for most recent exchange rate time for each currency that matches specified criteria
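A sketch of this stage, assuming the column names from the question and the unique constraint on (currency_code, datestamp) discussed earlier. The aliases r1, r2 and r3 are illustrative. The grouped query is joined back to exchange_rates to recover the full row:

```sql
-- Full exchange rate row for the latest qualifying datestamp per currency.
SELECT r1.id, r1.currency_code, r1.invoice_id, r1.datestamp, r1.rate
  FROM exchange_rates AS r1
  JOIN (SELECT r2.currency_code, MAX(r2.datestamp) AS datestamp
          FROM exchange_rates AS r2
         WHERE r2.criteria_x IS NULL
           AND r2.criteria_y = 'A'
         GROUP BY r2.currency_code
       ) AS r3
    ON r1.currency_code = r3.currency_code
   AND r1.datestamp     = r3.datestamp;
```

Because (currency_code, datestamp) is assumed unique, the row matched by the join is the same row that satisfied the criteria in the sub-query. Without that constraint, the criteria would need repeating in the outer query.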
Stage 4: Currency information and exchange rate record for most recent exchange rate time for each currency that matches specified criteria
This requires joining the currencies table with the output of the previous query:
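A sketch of the combined query, under the same assumptions as before (column names from the question, a unique constraint on currency_code plus datestamp). The aliases c, r, r1, r2, r3 and the output name rate_time are illustrative:

```sql
-- Currency details plus the latest qualifying exchange rate row.
SELECT c.currency_code, c.country, r.id,
       FROM_UNIXTIME(r.datestamp) AS rate_time, r.rate
  FROM currencies AS c
  JOIN (SELECT r1.id, r1.currency_code, r1.invoice_id,
               r1.datestamp, r1.rate
          FROM exchange_rates AS r1
          JOIN (SELECT r2.currency_code, MAX(r2.datestamp) AS datestamp
                  FROM exchange_rates AS r2
                 WHERE r2.criteria_x IS NULL
                   AND r2.criteria_y = 'A'
                 GROUP BY r2.currency_code
               ) AS r3
            ON r1.currency_code = r3.currency_code
           AND r1.datestamp     = r3.datestamp
       ) AS r
    ON c.currency_code = r.currency_code
 ORDER BY c.country;
```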
Except that Oracle only allows ‘) r’ instead of ‘) AS r’ for table aliases, and apart from the use of FROM_UNIXTIME(), I believe that should work correctly with the current version of almost any SQL DBMS you care to mention.
Since the invoice ID is not returned in the final query, we can remove that from the select-list of the middle query. A good optimizer might do that automatically.
If you want to see the currency information even if there is no exchange rate that matches the criteria, then you need to change the JOIN in the outermost query to a LEFT JOIN (aka LEFT OUTER JOIN). If you only want to see a subset of the currencies, you can apply that filter at either the last (outermost) query stage, or (if the filter is based on information available in the exchange rate table, such as the currency code) at either the innermost sub-query (most efficient) or the middle sub-query (not so efficient unless the optimizer realizes it can push the filter down to the innermost sub-query).
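As an illustration of both points, here is a sketch that uses a LEFT JOIN and restricts the currencies. The currency codes 'USD' and 'EUR' are hypothetical values chosen for the example, and the column names are assumed from the question:

```sql
-- Every selected currency is listed; the rate columns are NULL
-- when no exchange rate row matches the criteria.
SELECT c.currency_code, c.country, r.id,
       FROM_UNIXTIME(r.datestamp) AS rate_time, r.rate
  FROM currencies AS c
  LEFT JOIN (SELECT r1.id, r1.currency_code, r1.datestamp, r1.rate
               FROM exchange_rates AS r1
               JOIN (SELECT r2.currency_code,
                            MAX(r2.datestamp) AS datestamp
                       FROM exchange_rates AS r2
                      WHERE r2.criteria_x IS NULL
                        AND r2.criteria_y = 'A'
                        -- filter pushed down to the innermost sub-query
                        AND r2.currency_code IN ('USD', 'EUR')
                      GROUP BY r2.currency_code
                    ) AS r3
                 ON r1.currency_code = r3.currency_code
                AND r1.datestamp     = r3.datestamp
            ) AS r
    ON c.currency_code = r.currency_code
 -- outer filter decides which currencies appear at all
 WHERE c.currency_code IN ('USD', 'EUR')
 ORDER BY c.country;
```

With a LEFT JOIN the outer filter is still needed, since the inner filter alone would leave the other currencies in the output with NULL rate columns.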
Correctness is usually the primary criterion; performance is a secondary criterion. However, performance was mentioned in the question. The first rule is to measure the ‘simple’ query shown here. Only if that proves too slow do you need to worry further. When you do need to worry, you examine the query plan to see if there is, for example, a crucial index missing. Only if the query still isn’t fast enough do you start trying to resort to other tricks. Those tricks tend to be very specific to a particular DBMS. For example, there might be optimizer hints that you can use to make the DBMS process the query differently.
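In MySQL, a first look at the plan might be sketched like this. The index name is hypothetical, and the index is only worth adding if no existing key (such as the assumed unique constraint) already covers these columns:

```sql
-- Inspect how MySQL intends to execute the grouping step.
EXPLAIN
SELECT r2.currency_code, MAX(r2.datestamp) AS datestamp
  FROM exchange_rates AS r2
 WHERE r2.criteria_x IS NULL
   AND r2.criteria_y = 'A'
 GROUP BY r2.currency_code;

-- Candidate index if the plan shows a full table scan
-- (hypothetical name; measure before and after).
CREATE INDEX idx_rates_currency_datestamp
    ON exchange_rates (currency_code, datestamp);
```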