I have a collection, content, with four columns: id, timestamp, locationID, and authorID. Here is an example of my data; in production, it holds tens of millions of rows.
id  timestamp            locationID  authorID
1   2012-03-01 11:52:00  1           1
2   2012-03-16 19:56:00  1           2
3   2012-04-02 11:26:00  2           1
4   2012-04-22 11:52:00  2           3
5   2012-05-19 09:48:00  2           2
6   2012-05-30 07:12:00  2           1
7   2012-06-04 19:17:00  1           2
I’d like to collect the list of authorIDs whose most recent content (ordered by timestamp) matched a specific locationID.
The correct result for a query of locationID = 2 would be [ 1, 3 ], since authorIDs 1 and 3 were most recently ‘seen’ at locationID = 2, while authorID 2’s most recent content was at locationID = 1.
I can certainly execute one query per authorID, but in production the authorID array has more than 100,000 entries. This seems terribly inefficient (especially when each ‘subquery’ would be hitting this multi-million-row content collection), and I’m looking for a better way to extract this data from my dataset, ideally fast enough to be executed on a page render.
Something like this? This is from SQL Server, but I think it should work in MySQL as well.
For locationID = 2, it returns 1 and 3; and for locationID = 1, it returns 2.
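The query itself is not reproduced here, so the following is only a sketch of one way to get those results, not necessarily the query this answer used. It keeps a row only when the same author has no later row (a NOT EXISTS anti-join), which is plain SQL that should be accepted by both SQL Server and MySQL. It is wrapped in Python with an in-memory SQLite database purely so it can be run against the sample rows from the question; the function name authors_last_seen_at is made up for illustration.

```python
import sqlite3

# Sample rows from the question: (id, timestamp, locationID, authorID).
ROWS = [
    (1, "2012-03-01 11:52:00", 1, 1),
    (2, "2012-03-16 19:56:00", 1, 2),
    (3, "2012-04-02 11:26:00", 2, 1),
    (4, "2012-04-22 11:52:00", 2, 3),
    (5, "2012-05-19 09:48:00", 2, 2),
    (6, "2012-05-30 07:12:00", 2, 1),
    (7, "2012-06-04 19:17:00", 1, 2),
]

# Keep a row at the requested location only if the same author
# has no row with a later timestamp.
QUERY = """
SELECT c.authorID
FROM content AS c
WHERE c.locationID = ?
  AND NOT EXISTS (
        SELECT 1
        FROM content AS newer
        WHERE newer.authorID = c.authorID
          AND newer.timestamp > c.timestamp
      )
ORDER BY c.authorID
"""


def authors_last_seen_at(location_id):
    """Return authorIDs whose most recent row is at location_id (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE content ("
        "id INTEGER PRIMARY KEY, timestamp TEXT, "
        "locationID INTEGER, authorID INTEGER)"
    )
    conn.executemany("INSERT INTO content VALUES (?, ?, ?, ?)", ROWS)
    result = [row[0] for row in conn.execute(QUERY, (location_id,))]
    conn.close()
    return result


print(authors_last_seen_at(2))
print(authors_last_seen_at(1))
```

Timestamps are stored as ISO-style text here, so string comparison matches chronological order; with a real DATETIME column the comparison works the same way.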
Per JW (thanks!), the correct MySQL approach:
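JW's exact query is not reproduced here. A common MySQL-compatible form of this kind of lookup, offered only as a sketch under that assumption, joins the table to a derived table holding each author's latest timestamp and then filters by location. As above, Python with an in-memory SQLite database is used just to make the SQL runnable on the question's sample rows; the alias last_seen and the function name latest_authors_at are illustrative.

```python
import sqlite3

# Sample rows from the question: (id, timestamp, locationID, authorID).
ROWS = [
    (1, "2012-03-01 11:52:00", 1, 1),
    (2, "2012-03-16 19:56:00", 1, 2),
    (3, "2012-04-02 11:26:00", 2, 1),
    (4, "2012-04-22 11:52:00", 2, 3),
    (5, "2012-05-19 09:48:00", 2, 2),
    (6, "2012-05-30 07:12:00", 2, 1),
    (7, "2012-06-04 19:17:00", 1, 2),
]

# The derived table finds each author's latest timestamp; the join picks
# the matching row, and the WHERE clause keeps only the requested location.
# DISTINCT guards against two rows sharing an author's latest timestamp.
QUERY = """
SELECT DISTINCT c.authorID
FROM content AS c
JOIN (
    SELECT authorID, MAX(timestamp) AS last_seen
    FROM content
    GROUP BY authorID
) AS latest
  ON latest.authorID = c.authorID
 AND latest.last_seen = c.timestamp
WHERE c.locationID = ?
ORDER BY c.authorID
"""


def latest_authors_at(location_id):
    """Return authorIDs whose most recent row is at location_id (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE content ("
        "id INTEGER PRIMARY KEY, timestamp TEXT, "
        "locationID INTEGER, authorID INTEGER)"
    )
    conn.executemany("INSERT INTO content VALUES (?, ?, ?, ?)", ROWS)
    result = [row[0] for row in conn.execute(QUERY, (location_id,))]
    conn.close()
    return result


print(latest_authors_at(2))
print(latest_authors_at(1))
```

This replaces the 100,000 per-author queries with a single statement. How fast it runs on tens of millions of rows depends on indexing, which the thread does not specify.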