So, I have a table with roughly 1.5 million rows in it, looking somewhat like this:
name | time | data1 | data2
--------------------------------------
93-15 | 1337348782 | 11 | 60.791
92-02 | 1337348783 | 11 | 62.584
92-02 | 1337348056 | 11 | 63.281
93-15 | 1337348068 | 8 | 65.849
92-02 | 1337348117 | 11 | 63.271
93-15 | 1337348129 | 8 | 65.849
92-02 | 1337348176 | 10 | 63.258
93-15 | 1337348188 | 8 | 65.849
92-02 | 1337348238 | 10 | 63.245
93-15 | 1337348248 | 8 | 65.849
…these correspond to historical status updates from something that needs to be monitored. Now, what I would like to do is find the current status of each unit.
It wasn’t hard finding similar questions here on Stack Overflow, and extrapolating from those findings, I came up with this query:
SELECT * FROM vehicles v
JOIN ( SELECT MAX(time) AS max, name
       FROM vehicles
       GROUP BY name ) m_v
ON (v.time = m_v.max AND v.name = m_v.name);
But seeing as I have roughly 1.5 million rows (and counting), is there a different approach that allows for a faster query?
A covering index on (name, time) would be helpful too.
EDIT: Notes on how it works, etc.
PostgreSQL has what are known as windowing or analytical functions. These generally take the form some_function() OVER (PARTITION BY some_fields ORDER BY some_fields).
In this case I used ROW_NUMBER() OVER (PARTITION BY name ORDER BY time DESC).
ROW_NUMBER() creates a unique row number for a set of data: 1 to n for n records.
PARTITION BY name means that this function is applied independently to different names. Each name is its own group/window/partition, and the results of ROW_NUMBER() start over from 1 again for each group/window/partition.
ORDER BY time DESC takes all the records in one group/window/partition and orders them by the time field, with the highest value first, before the ROW_NUMBER() function is applied.
For your example data, therefore, you get this…
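The original result listing did not survive here, so the following is only a rough sketch of what the numbering looks like. It assumes the sample rows from the question are loaded into a scratch table named vehicles; the column types are my guess, not something the question states.

```sql
-- Scratch copy of the question's sample rows; column types are assumed.
-- A TEMP table shadows any real "vehicles" table for this session only.
CREATE TEMP TABLE vehicles (
    name  text,
    time  bigint,   -- Unix epoch seconds, as in the sample
    data1 integer,
    data2 numeric
);

INSERT INTO vehicles (name, time, data1, data2) VALUES
    ('93-15', 1337348782, 11, 60.791),
    ('92-02', 1337348783, 11, 62.584),
    ('92-02', 1337348056, 11, 63.281),
    ('93-15', 1337348068,  8, 65.849),
    ('92-02', 1337348117, 11, 63.271),
    ('93-15', 1337348129,  8, 65.849),
    ('92-02', 1337348176, 10, 63.258),
    ('93-15', 1337348188,  8, 65.849),
    ('92-02', 1337348238, 10, 63.245),
    ('93-15', 1337348248,  8, 65.849);

-- Number the rows within each name, newest first.
SELECT name, time, data1, data2,
       ROW_NUMBER() OVER (PARTITION BY name ORDER BY time DESC) AS row_number
FROM vehicles
ORDER BY name, row_number;

-- name  | time       | data1 | data2  | row_number
-- 92-02 | 1337348783 | 11    | 62.584 | 1
-- 92-02 | 1337348238 | 10    | 63.245 | 2
-- 92-02 | 1337348176 | 10    | 63.258 | 3
-- 92-02 | 1337348117 | 11    | 63.271 | 4
-- 92-02 | 1337348056 | 11    | 63.281 | 5
-- 93-15 | 1337348782 | 11    | 60.791 | 1
-- 93-15 | 1337348248 |  8    | 65.849 | 2
-- 93-15 | 1337348188 |  8    | 65.849 | 3
-- 93-15 | 1337348129 |  8    | 65.849 | 4
-- 93-15 | 1337348068 |  8    | 65.849 | 5
```

The outer ORDER BY is only there to make the listing easy to read; the numbering itself comes entirely from the OVER clause.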
Because the ordering is time DESC, the row with the highest time value in each name group/window/partition will always have a row_number of 1.
Having an index on (name, time) makes it much easier for the optimiser by ensuring the data is already in a friendly order, so no separate sort is needed. Ideally this means that ROW_NUMBER() isn’t actually applied to all the records: as soon as it finds the record with the highest time value and assigns ROW_NUMBER() = 1, it can stop and move on to the next name. Whether the planner actually takes that shortcut depends on the database and its version.
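As a sketch of how the pieces fit together, assuming the vehicles table from the question already exists: the index name below is made up, and the outer query is my reading of the approach described above rather than a quote of the original answer.

```sql
-- Index matching the PARTITION BY / ORDER BY columns (index name is hypothetical).
CREATE INDEX vehicles_name_time_idx ON vehicles (name, time);

-- Keep only the newest row per name: the one numbered 1.
SELECT name, time, data1, data2
FROM (
    SELECT v.*,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY time DESC) AS row_number
    FROM vehicles v
) numbered
WHERE row_number = 1;
```

Running the SELECT under EXPLAIN ANALYZE is the easiest way to see whether the index is actually used on your data.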