I have a PostgreSQL table that has multiple entries for the same objectid, on multiple devicenames, but there is a unique timestamp for each entry. The table looks something like this:
address | devicename | objectid      | timestamp
--------+------------+---------------+------------------------------
1.1.1.1 | device1    | vs_hub.ch1_25 | 2012-10-02 17:36:41.011629+00
1.1.1.2 | device2    | vs_hub.ch1_25 | 2012-10-02 17:48:01.755559+00
1.1.1.1 | device1    | vs_hub.ch1_25 | 2012-10-03 15:37:09.06065+00
1.1.1.2 | device2    | vs_hub.ch1_25 | 2012-10-03 15:48:33.93128+00
1.1.1.1 | device1    | vs_hub.ch1_25 | 2012-10-05 16:01:59.266779+00
1.1.1.2 | device2    | vs_hub.ch1_25 | 2012-10-05 16:13:46.843113+00
1.1.1.1 | device1    | vs_hub.ch1_25 | 2012-10-06 01:11:45.853361+00
1.1.1.2 | device2    | vs_hub.ch1_25 | 2012-10-06 01:23:21.204324+00
I want to delete all but the oldest entry for each objectid and devicename. In this case I want to delete all but:
1.1.1.1 | device1    | vs_hub.ch1_25 | 2012-10-02 17:36:41.011629+00
1.1.1.2 | device2    | vs_hub.ch1_25 | 2012-10-02 17:48:01.755559+00
Is there a way to do this? Or is it possible to select the oldest entries for both “objectid and devicename” into a temp table?
To distill the described result, this would probably be simplest and fastest:
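The query itself did not survive in this copy. A minimal sketch of one way to get the described result, assuming PostgreSQL's DISTINCT ON, a placeholder table name tbl (the question does not name the table), and the timestamp column renamed to ts:

```sql
-- Keep one row per (devicename, objectid): the one with the oldest ts.
-- "tbl" is a placeholder table name; "ts" stands in for the timestamp column.
SELECT DISTINCT ON (devicename, objectid) *
FROM   tbl
ORDER  BY devicename, objectid, ts;   -- ascending ts, so the oldest row comes first
```

DISTINCT ON keeps the first row of each (devicename, objectid) group in the sort order, so sorting ascending by ts is what selects the oldest entry.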
Details and explanation in this related answer.
From your sample data, I conclude that you are going to delete large portions of the original table. It is probably faster to just TRUNCATE the table (or DROP & recreate, since you should add a surrogate pk column anyway) and write the remaining rows to it. This also provides you with a pristine table, implicitly clustered (ordered) the way it’s best for your queries, and saves the work that VACUUM would have to do otherwise. And it’s probably still faster overall.
I would also strongly advise adding a surrogate primary key to your table, preferably a serial column.
Do it all within a transaction, so that a failure halfway through cannot leave you with an emptied table.
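The truncate-and-refill route described above could be sketched roughly like this. The names tbl, tmp_tbl and tbl_id are placeholders, ts stands in for the timestamp column, and the temp_buffers value in the commented-out line is purely illustrative:

```sql
-- Optional, with a hypothetical value: give temp tables more RAM for this session.
-- Must run before the session's first use of any temporary object.
-- SET temp_buffers = '500MB';

BEGIN;

-- Save the rows to keep: the oldest entry per (devicename, objectid).
CREATE TEMP TABLE tmp_tbl ON COMMIT DROP AS
SELECT DISTINCT ON (devicename, objectid) *
FROM   tbl
ORDER  BY devicename, objectid, ts;

-- Empty the original table.
TRUNCATE tbl;

-- Add the surrogate primary key while the table is empty.
ALTER TABLE tbl ADD COLUMN tbl_id serial PRIMARY KEY;

-- Write the surviving rows back; tbl_id is filled from its sequence.
INSERT INTO tbl (address, devicename, objectid, ts)
SELECT address, devicename, objectid, ts
FROM   tmp_tbl
ORDER  BY devicename, objectid;

COMMIT;
```

If any statement fails, the whole transaction rolls back and the original rows are still there.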
This is fast as long as your setting for temp_buffers is big enough to hold the temporary table. Otherwise the system will start swapping data to disk and performance takes a dive. You can set temp_buffers just for the current session, so you don’t waste RAM that you don’t normally need for temp_buffers. It has to be set before the first use of any temporary objects in the session. More information in this related answer.
Also, as the INSERT follows a TRUNCATE inside a transaction, it will be easy on the Write Ahead Log, improving performance.
Consider CREATE TABLE AS for the alternative route.
The only downside: you need an exclusive lock on the table. This may be a problem in databases with heavy concurrent load.
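For the CREATE TABLE AS route, a sketch under the same assumptions (tbl, tbl_new and tbl_id are placeholder names, ts stands in for the timestamp column) might look like this. Taking the lock explicitly up front is a choice made for this sketch, not something the answer prescribes:

```sql
BEGIN;

-- Block concurrent access so no writes are lost during the swap.
LOCK TABLE tbl IN ACCESS EXCLUSIVE MODE;

-- Build the replacement table directly from the rows to keep.
CREATE TABLE tbl_new AS
SELECT DISTINCT ON (devicename, objectid) *
FROM   tbl
ORDER  BY devicename, objectid, ts;

-- Add the surrogate primary key to the new table.
ALTER TABLE tbl_new ADD COLUMN tbl_id serial PRIMARY KEY;

-- Swap the new table in under the old name.
DROP TABLE tbl;
ALTER TABLE tbl_new RENAME TO tbl;

COMMIT;
```

Note that CREATE TABLE AS copies only the data: indexes, constraints, defaults and privileges on the old table would have to be recreated, and DROP TABLE fails if other objects (such as views) depend on the table.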
Finally, never use timestamp as a column name. It’s a reserved word in every SQL standard and a type name in PostgreSQL. I used ts instead.