I have a table containing several million rows. I have an expression index on

Asked: June 18, 2026

I have a table containing several million rows, with an expression index on it. (I created the index in both directions to see whether it had an effect.)

CREATE INDEX ON statuses (date_trunc('hour', created_at) ASC);
CREATE INDEX ON statuses (date_trunc('hour', created_at) DESC);

I’m trying to write a query that counts the statuses for each hour using GROUP BY, but only for statuses created today (or, for example, in the last 7 days). However, filtering out all entries before a certain date doesn’t use the index; the query scans the table and filters every row instead. If I replace the greater-than with an equals, the index is used. I’ve put the EXPLAIN output below. Hopefully someone can help me make this query use the index, or at least improve its performance so that it runs in milliseconds rather than seconds.

With equals, the index is used correctly:

=> EXPLAIN ANALYSE SELECT date_trunc('hour', created_at) as hour, COUNT(*) FROM statuses GROUP BY hour HAVING date_trunc('hour', created_at) = '2013-02-06 00:00:00';
                                                                       QUERY PLAN                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=132.48..29443.34 rows=1653 width=8) (actual time=4.362..4.363 rows=1 loops=1)
   ->  Bitmap Heap Scan on statuses  (cost=132.48..29419.22 rows=18337 width=8) (actual time=0.209..2.159 rows=1319 loops=1)
         Recheck Cond: (date_trunc('hour'::text, created_at) = '2013-02-06 00:00:00'::timestamp without time zone)
         ->  Bitmap Index Scan on statuses_date_trunc_idx1  (cost=0.00..131.57 rows=18337 width=0) (actual time=0.178..0.178 rows=1319 loops=1)
               Index Cond: (date_trunc('hour'::text, created_at) = '2013-02-06 00:00:00'::timestamp without time zone)
 Total runtime: 4.416 ms
(6 rows)

However, as soon as I use greater than (or less than), the query does a sequential scan and filters the whole table without using the index:

=> EXPLAIN ANALYSE SELECT date_trunc('hour', created_at) as hour, COUNT(*) FROM statuses GROUP BY hour HAVING date_trunc('hour', created_at) > '2013-02-06 00:00:00';
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=185386.54..185772.10 rows=110160 width=8) (actual time=2915.495..2915.774 rows=21 loops=1)
   ->  Seq Scan on statuses  (cost=0.00..184164.06 rows=1222485 width=8) (actual time=1676.827..2869.748 rows=47070 loops=1)
         Filter: (date_trunc('hour'::text, created_at) > '2013-02-06 00:00:00'::timestamp without time zone)
         Rows Removed by Filter: 3620426
 Total runtime: 2916.049 ms
(5 rows)

In this case I can work around the problem by using IN and listing every hour in the range I want to select, but I’d really like to understand why the index isn’t used for the greater-than query.

=> EXPLAIN ANALYSE SELECT date_trunc('hour', created_at) as hour, COUNT(*) FROM statuses GROUP BY hour HAVING date_trunc('hour', created_at) IN ('2013-02-06 00:00:00', '2013-02-06 01:00:00');
                                                                       QUERY PLAN                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=51988.38..51999.94 rows=3305 width=8) (actual time=7.218..7.223 rows=2 loops=1)
   ->  Bitmap Heap Scan on statuses  (cost=262.96..51951.70 rows=36675 width=8) (actual time=0.376..4.576 rows=2507 loops=1)
         Recheck Cond: (date_trunc('hour'::text, created_at) = ANY ('{"2013-02-06 00:00:00","2013-02-06 01:00:00"}'::timestamp without time zone[]))
         ->  Bitmap Index Scan on statuses_date_trunc_idx1  (cost=0.00..261.13 rows=36675 width=0) (actual time=0.341..0.341 rows=2507 loops=1)
               Index Cond: (date_trunc('hour'::text, created_at) = ANY ('{"2013-02-06 00:00:00","2013-02-06 01:00:00"}'::timestamp without time zone[]))
 Total runtime: 7.305 ms
(6 rows)


1 Answer

Editorial Team · Answer 1 · 2026-06-18T11:55:48+00:00

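The planner’s row estimate for the statuses table is about 26 times higher than the number of rows actually returned for the “bad” query (1222485 estimated versus 47070 actual). With an estimate that high, a sequential scan looks cheaper to the planner than using the index.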

Try running VACUUM ANALYZE statuses;
If that doesn’t help, increase the statistics target for the statuses.created_at column with ALTER TABLE statuses ALTER created_at SET STATISTICS 500; and analyze again.

This should help.
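
The steps above can be run in sequence, followed by a re-check of the plan. This is only a sketch: the statistics target of 500 is the value suggested above, and the final EXPLAIN simply repeats the slow query from the question to see whether the estimated row count has moved closer to the actual one.

```sql
-- Refresh planner statistics (and clean up dead rows) for the table.
-- Note: VACUUM cannot be run inside a transaction block.
VACUUM ANALYZE statuses;

-- If the estimate is still far off, sample the column in more detail...
ALTER TABLE statuses ALTER COLUMN created_at SET STATISTICS 500;
-- ...and gather statistics again so the new target takes effect.
ANALYZE statuses;

-- Re-run the slow query and compare the estimated rows on the scan node
-- with the actual rows reported next to it.
EXPLAIN ANALYZE
SELECT date_trunc('hour', created_at) AS hour, COUNT(*)
FROM statuses
GROUP BY hour
HAVING date_trunc('hour', created_at) > '2013-02-06 00:00:00';
```

If the estimate on the scan node drops to roughly the real number of matching rows, the planner has a much better chance of choosing the index.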

EDIT: You should also check your autovacuum settings.

Read the relevant part of the manual and check your configuration like this:

SELECT name,setting,source FROM pg_settings WHERE name ~ 'autovacuum';
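
To see whether autovacuum has actually been analyzing the table, the statistics views can help as well. A minimal sketch, assuming the standard pg_stat_user_tables view and the table name from the question:

```sql
-- When was the table last analyzed, manually or by autovacuum,
-- and roughly how many live and dead rows does it hold?
SELECT relname, last_analyze, last_autoanalyze, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'statuses';
```

If last_autoanalyze is NULL, or old relative to how quickly rows are inserted, the planner is probably working from stale statistics.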

If your table is very large, you can adjust autovacuum_analyze_threshold and/or autovacuum_analyze_scale_factor for that table using the ALTER TABLE tab SET (storage_parameter = ...) syntax.
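
For example, a per-table override might look like the sketch below. The numbers are illustrative assumptions, not recommendations; suitable values depend on how quickly the table changes. Autovacuum analyzes a table once the number of changed rows exceeds the threshold plus the scale factor times the table's row count, so lowering the scale factor makes a large table get analyzed more often.

```sql
-- Hypothetical values: analyze after about 1% of the rows (plus 5000) have changed.
ALTER TABLE statuses SET (
    autovacuum_analyze_scale_factor = 0.01,
    autovacuum_analyze_threshold = 5000
);

-- To go back to the server-wide defaults later:
ALTER TABLE statuses RESET (
    autovacuum_analyze_scale_factor,
    autovacuum_analyze_threshold
);
```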
