I have a table containing several million rows. I have an expression index on this table (I created it in both directions to see if that had an effect):
CREATE INDEX ON statuses (date_trunc('hour', created_at) ASC);
CREATE INDEX ON statuses (date_trunc('hour', created_at) DESC);
I’m trying to write a query that counts the statuses for each hour using GROUP BY, but only for statuses that were created today (or in the last 7 days, for example). However, filtering out all entries before a certain date doesn’t use the index; the query scans the table and filters every row instead. If I replace the greater-than with an equals, the index is used. I’ve put the output of EXPLAIN below. Hopefully someone can help me make this query use the index, or at least improve the performance so that it runs in the order of milliseconds, not seconds.
With equals, the index is used as expected:
=> EXPLAIN ANALYSE SELECT date_trunc('hour', created_at) as hour, COUNT(*) FROM statuses GROUP BY hour HAVING date_trunc('hour', created_at) = '2013-02-06 00:00:00';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=132.48..29443.34 rows=1653 width=8) (actual time=4.362..4.363 rows=1 loops=1)
  -> Bitmap Heap Scan on statuses (cost=132.48..29419.22 rows=18337 width=8) (actual time=0.209..2.159 rows=1319 loops=1)
       Recheck Cond: (date_trunc('hour'::text, created_at) = '2013-02-06 00:00:00'::timestamp without time zone)
       -> Bitmap Index Scan on statuses_date_trunc_idx1 (cost=0.00..131.57 rows=18337 width=0) (actual time=0.178..0.178 rows=1319 loops=1)
            Index Cond: (date_trunc('hour'::text, created_at) = '2013-02-06 00:00:00'::timestamp without time zone)
Total runtime: 4.416 ms
(6 rows)
However, as soon as I use greater than (or less than), the query falls back to a sequential scan that filters the whole table without using the index:
=> EXPLAIN ANALYSE SELECT date_trunc('hour', created_at) as hour, COUNT(*) FROM statuses GROUP BY hour HAVING date_trunc('hour', created_at) > '2013-02-06 00:00:00';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=185386.54..185772.10 rows=110160 width=8) (actual time=2915.495..2915.774 rows=21 loops=1)
  -> Seq Scan on statuses (cost=0.00..184164.06 rows=1222485 width=8) (actual time=1676.827..2869.748 rows=47070 loops=1)
       Filter: (date_trunc('hour'::text, created_at) > '2013-02-06 00:00:00'::timestamp without time zone)
       Rows Removed by Filter: 3620426
Total runtime: 2916.049 ms
(5 rows)
I can work around this by using IN and listing every hour within the range I want to select, but I’d really like to understand why the index isn’t being used for the greater-than query.
=> EXPLAIN ANALYSE SELECT date_trunc('hour', created_at) as hour, COUNT(*) FROM statuses GROUP BY hour HAVING date_trunc('hour', created_at) IN ('2013-02-06 00:00:00', '2013-02-06 01:00:00');
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=51988.38..51999.94 rows=3305 width=8) (actual time=7.218..7.223 rows=2 loops=1)
  -> Bitmap Heap Scan on statuses (cost=262.96..51951.70 rows=36675 width=8) (actual time=0.376..4.576 rows=2507 loops=1)
       Recheck Cond: (date_trunc('hour'::text, created_at) = ANY ('{"2013-02-06 00:00:00","2013-02-06 01:00:00"}'::timestamp without time zone[]))
       -> Bitmap Index Scan on statuses_date_trunc_idx1 (cost=0.00..261.13 rows=36675 width=0) (actual time=0.341..0.341 rows=2507 loops=1)
            Index Cond: (date_trunc('hour'::text, created_at) = ANY ('{"2013-02-06 00:00:00","2013-02-06 01:00:00"}'::timestamp without time zone[]))
Total runtime: 7.305 ms
(6 rows)
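The estimate for the statuses table is 26 times higher than the actual number of rows returned for the “bad” query (1222485 estimated versus 47070 actual). Try running VACUUM ANALYZE statuses; first. If that doesn’t help, increase the statistics target for the statuses.created_at column with ALTER TABLE statuses ALTER created_at SET STATISTICS 500; and analyze again. This should help.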
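The steps above can be sketched as one sequence. This is a minimal sketch that reuses the table, column and query from the question; the only additions are the comments and the final re-check, and whether the plan actually changes depends on your data and PostgreSQL version.

```sql
-- Step 1: refresh the planner statistics for the table.
VACUUM ANALYZE statuses;

-- Step 2 (only if the row estimate is still far off): raise the
-- per-column statistics target. 500 is the value suggested above;
-- the built-in default_statistics_target is 100.
ALTER TABLE statuses ALTER COLUMN created_at SET STATISTICS 500;

-- The new target only takes effect once statistics are gathered again.
ANALYZE statuses;

-- Step 3: re-run the slow query and compare the estimated "rows="
-- against the actual "rows=" on the scan node.
EXPLAIN ANALYZE
SELECT date_trunc('hour', created_at) AS hour, COUNT(*)
FROM statuses
GROUP BY hour
HAVING date_trunc('hour', created_at) > '2013-02-06 00:00:00';
```

If the estimated and actual row counts on the scan node move closer together, the planner has a better basis for choosing between the index and a sequential scan.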
EDIT: You need to check your autovacuum settings. Read the relevant part of the manual and check your configuration.
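One way to inspect these settings is to query the system views. This is a sketch and not necessarily the query the answer originally showed. It assumes the standard pg_settings and pg_stat_user_tables views and the table name statuses from the question.

```sql
-- Server-wide autovacuum configuration. track_counts must be on
-- for autovacuum to work at all.
SELECT name, setting, unit
FROM pg_settings
WHERE name LIKE 'autovacuum%'
   OR name = 'track_counts';

-- When the table was last analyzed, manually or by autovacuum.
SELECT relname, n_live_tup, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'statuses';
```

If last_autoanalyze is NULL or very old on a table that receives many inserts, the automatic ANALYZE is probably not triggering often enough for that table.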
If your table is too big, you might adjust autovacuum_analyze_threshold and/or autovacuum_analyze_scale_factor using the ALTER TABLE tab SET (storage_parameter = ...) syntax.
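As an illustration of that syntax, the per-table override might look like the sketch below. The values 0.01 and 1000 are assumptions chosen for the example, not recommendations from the answer; the server defaults are a scale factor of 0.1 and a threshold of 50.

```sql
-- Trigger an automatic ANALYZE after roughly 1% of the rows change
-- (plus a fixed threshold), instead of the default 10%.
-- Both values are illustrative and should be tuned to your workload.
ALTER TABLE statuses SET (
    autovacuum_analyze_scale_factor = 0.01,
    autovacuum_analyze_threshold = 1000
);

-- Confirm that the per-table options were stored.
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'statuses';
```

A lower scale factor matters mostly on large tables, where 10% of several million rows is a lot of change before the statistics are refreshed.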