I have table t:
id, timestamp
There are multiple id values, and multiple rows might share a given id.
For each id, I want to select the most recent row before date x, but only if that id has no rows after date x and does not appear in table y.
I can select all rows before date x (in this example, :date = 5):
SELECT * FROM t WHERE timestamp < :date
I attempted to get only the most recent row per id. This returns one row per id, but not the most recent one:
SELECT * FROM t WHERE timestamp < :date GROUP BY id ORDER BY timestamp DESC
I’m concerned GROUP BY will slow things down with lots of data.
Here is some sample db data:
CREATE TABLE IF NOT EXISTS `t` (
`id` int(2) NOT NULL,
`timestamp` int(2) NOT NULL
);
INSERT INTO `t` (`id`, `timestamp`) VALUES
(1, 1),
(1, 4),
(2, 3),
(2, 1),
(2, 6),
(3, 4),
(3, 2);
CREATE TABLE IF NOT EXISTS `y` (
`id` int(2) NOT NULL,
`timestamp` int(2) NOT NULL
);
INSERT INTO `y` (`id`, `timestamp`) VALUES
(3, 1);
Looking to return row (1,4) only…
Thanks!
You need to select with MAX to get the latest time (rather than sorting), do a LEFT JOIN to compare against the other table, and add a HAVING clause after the GROUP BY so that only the appropriate groups are returned.
When you do a GROUP BY you can select with aggregate functions. Here MAX returns the maximum value for that column across all the rows in the group (since you are grouping by id, this returns the maximum timestamp for each id). But you only want to select ids that don’t have a timestamp after :date. That’s where HAVING comes in (HAVING is essentially a WHERE for GROUP BY aggregates). Finally, you don’t want to select ids that are in table y. So you LEFT JOIN table y in, and only keep rows where the corresponding row in table y doesn’t exist (i.e. that id doesn’t exist in table y); you do this using a regular WHERE.
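Putting those pieces together, a query along these lines should do it. This is a sketch against the sample tables from the question, using the same :date placeholder; the latest_timestamp alias is just a name picked for illustration.

```sql
-- One row per id: the id and its most recent timestamp.
SELECT t.id, MAX(t.timestamp) AS latest_timestamp
FROM t
-- Bring in y so ids that exist there can be filtered out.
LEFT JOIN y ON y.id = t.id
-- y.id is NULL only when the LEFT JOIN found no match, i.e. the id is not in y.
WHERE y.id IS NULL
GROUP BY t.id
-- Keep only ids whose latest row is before :date,
-- which means the id has no rows at or after :date.
HAVING MAX(t.timestamp) < :date;
-- With the sample data and :date = 5 this returns the single row (1, 4).
```

Note that the filter on :date sits in HAVING rather than WHERE. A WHERE timestamp < :date would discard the later rows before grouping, so id 2 would still come back with its latest earlier row (2, 3) instead of being excluded.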
UPDATE: To make this efficient, all you have to do is add indexes to the appropriate columns. In this case, you would want to add indexes for t.id, t.timestamp, and y.id. See dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html.
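As a rough sketch, the indexes could be created like this. The index names are made up for illustration, and combining id and timestamp into one composite index is an assumption on my part rather than a requirement; separate single-column indexes are another option.

```sql
-- Composite index: lets MySQL group by id and find the max timestamp per id.
CREATE INDEX idx_t_id_timestamp ON t (id, timestamp);

-- Index used by the LEFT JOIN lookup into y.
CREATE INDEX idx_y_id ON y (id);
```

Whether the optimizer actually uses these depends on your data and MySQL version, so it is worth running the query through EXPLAIN to check.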