I have a table briefly structured like this:
tn( id integer NOT NULL primary key DEFAULT nextval('tn_sequence'),
create_dt TIMESTAMP NOT NULL DEFAULT NOW(),
...............
deleted boolean );
create_dt is the timestamp when the row is inserted into the database.
deleted indicates whether the row is no longer useful.
And I have the following queries:
select * from tn where create_dt > ( NOW() - interval '150 seconds' ) and deleted = FALSE;
select * from tn where create_dt < ( NOW() - interval '150 seconds' ) and deleted = FALSE;
My question is: how much will these queries slow down as the number of rows increases? For instance, when the number of rows exceeds 10K, 20K, or 100K, will it have a big impact on speed? Is there any way I can optimize these queries? Note that every 5 seconds I will set the ‘deleted’ column to ‘TRUE’ for rows which are older than 150 seconds.
The effect of table growth on performance will depend on the query plan chosen, available indexes, the selectivity of the query, and lots of other factors.
EXPLAIN ANALYZE on the query might help. In short, if your query only selects a few rows and can use a simple b-tree index then it won’t usually slow down tons, only a little as the index grows. On the other hand, queries using complex non-indexed conditions or returning lots of rows could perform very badly indeed.
Your issue appears to mirror that in the question How should we handle rows which won’t be queried once they are old in PostgreSQL?
The advice given there should apply:
One option is a partial index with a WHERE (not deleted) predicate.
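As a rough sketch of such a partial index (the index name is made up for illustration; the table and column names are taken from the question):

```sql
-- Hypothetical index name. Only rows where deleted is false are indexed,
-- so the index stays small even as old rows pile up in the table.
CREATE INDEX tn_create_dt_live_idx
    ON tn (create_dt)
    WHERE (NOT deleted);

-- A query whose WHERE clause repeats the index predicate is a candidate
-- for using it. Whether the planner actually does depends on statistics.
SELECT *
FROM tn
WHERE create_dt > (NOW() - interval '150 seconds')
  AND NOT deleted;
```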
This includes only rows where deleted = 'f' (assuming deleted is not null) in the index. This isn’t the same as having them gone from the table completely. In a sequential scan, deleted = 't' rows must still be scanned; and an index scan is likely to read more heap pages than it would if the deleted = 't' rows weren’t there, because any given heap page is likely to contain a mix of deleted = 't' and deleted = 'f' rows.
You can reduce the impact of the latter by CLUSTERing on an index that includes deleted. Again, this will have no effect on sequential scans. To help with sequential scans you would have to partition the table on deleted.
Pg 9.2’s index only scans should (I think, haven’t tested) use the partial index. When an index only scan is possible the partial index should be as fast as an index on a table containing only the deleted = 'f' rows.
Note that you’ll need to keep table and index bloat under control. Ensure autovacuum runs very frequently and use a current version of PostgreSQL that doesn’t need things like manually-managed free space map and has the latest, best-behaved autovacuum. I’d recommend 9.0 or above, preferably 9.1 or 9.2. Tune autovacuum to run aggressively.
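To make the CLUSTER and autovacuum suggestions above a bit more concrete, here is a minimal sketch. The index name and all numeric values are illustrative assumptions, not tested recommendations; pick real values by measuring on your own workload.

```sql
-- Hypothetical index name. CLUSTER cannot use a partial index, so this is
-- a full index with deleted as its leading column.
CREATE INDEX tn_deleted_create_dt_idx ON tn (deleted, create_dt);

-- One-off physical reorder of the table. It takes an exclusive lock and the
-- ordering is not maintained for later inserts and updates, so it would
-- need to be repeated periodically.
CLUSTER tn USING tn_deleted_create_dt_idx;

-- Per-table autovacuum settings (illustrative values only): vacuum and
-- analyze after a much smaller fraction of the table has changed.
ALTER TABLE tn SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 100,
    autovacuum_analyze_scale_factor = 0.01
);
```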
When tuning and testing performance, test your queries with EXPLAIN ANALYZE; don’t just guess.
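For instance, the first query from the question could be checked like this. Note that EXPLAIN ANALYZE actually executes the statement, and the plan you get will depend on your data, indexes, and PostgreSQL version, so no output is shown here.

```sql
-- Runs the query and reports the chosen plan with actual row counts and timings.
EXPLAIN ANALYZE
SELECT *
FROM tn
WHERE create_dt > (NOW() - interval '150 seconds')
  AND deleted = FALSE;
```

Comparing the plan before and after adding an index, and at different table sizes, shows whether the change really helps.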