Using MySQL, I would like to refactor the following SELECT statement so that it returns the entire record containing the newest invoice_date:
> SELECT id, invoice_id, invoice_date
> FROM invoice_items
> WHERE invoice_id = 1047
id    invoice_id  invoice_date
-----------------------------------
3235  1047        2009-12-15 11:40:00
3295  1047        2009-12-15 16:00:00
3311  1047        2009-12-15 09:30:00
3340  1047        2009-12-15 13:50:00
Using the MAX() aggregate function and the GROUP BY clause gets me part of the way there:
> SELECT id, invoice_id, MAX(invoice_date)
> FROM invoice_items
> WHERE invoice_id = 1047
> GROUP BY invoice_id
id    invoice_id  invoice_date
-----------------------------------
3235  1047        2009-12-15 16:00:00
Notice that the query appears to get the MAX(invoice_date) correctly, but the id returned (3235) is not the id of the record containing the MAX(invoice_date) (3295); it is the id of the first record in the initial query.
How do I refactor this query to give me the entire record that contains the MAX(invoice_date)?
The solution must use the GROUP BY clause, because I need to get the newest invoice_date for each invoice.
This is the often-repeated “greatest-n-per-group” problem.
Here’s how I would solve it in MySQL:
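A minimal sketch of that query, reconstructed from the explanation that follows: the aliases i1 and i2, the outer join, and the invoice_items columns are taken from the text, while the exact join condition is an assumption.

```sql
-- Self-join sketch: keep each row i1 for which no row i2 exists
-- with the same invoice_id and a later invoice_date.
SELECT i1.*
FROM invoice_items AS i1
LEFT OUTER JOIN invoice_items AS i2
  ON i1.invoice_id = i2.invoice_id
 AND i1.invoice_date < i2.invoice_date
WHERE i2.invoice_id IS NULL;  -- no later row matched, so i1 is the newest
```

With the sample rows from the question, this should return the row with id 3295. If two rows for the same invoice share the latest invoice_date, both are returned.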
Explanation: for each row i1, try to find a row i2 with the same invoice_id and a greater date. If none is found (i.e. i2 is all NULLs because of the outer join), then i1 must be the row with the greatest date for its invoice_id.
This solution using a join tends to work better for MySQL, which is weak at optimizing both GROUP BY and subqueries.
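For comparison, since the question asks for GROUP BY, the same result can be sketched with a grouped derived table joined back to invoice_items. This is an alternative form rather than the approach recommended above, and the alias latest and the column max_date are hypothetical names.

```sql
-- Alternative sketch: compute the newest date per invoice, then join back
-- to fetch the full rows. `latest` and `max_date` are illustrative names.
SELECT i.*
FROM invoice_items AS i
INNER JOIN (
    SELECT invoice_id, MAX(invoice_date) AS max_date
    FROM invoice_items
    GROUP BY invoice_id
) AS latest
  ON i.invoice_id = latest.invoice_id
 AND i.invoice_date = latest.max_date;
```

As with the join version, ties on the latest invoice_date return more than one row per invoice. Which form runs faster depends on the MySQL version and the available indexes, so it is worth comparing both with EXPLAIN on real data.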