I am having trouble with a slow transaction in Postgres trying to retrieve the

Question

Asked: May 26, 2026

I am having trouble with a slow transaction in Postgres that retrieves the latest prices of the catalog products whose buy is greater than their sell. It is a rather large table at this point, over 2 million rows, which I keep for historical purposes. What I am currently using is:

select * from ta_price a
  join (
   select catalogproduct_id, max(timestamp) ts
     from ta_price
    group by catalogproduct_id
       ) b on a.catalogproduct_id = b.catalogproduct_id
          and a.timestamp = b.ts
          AND buy > sell;

catalogproduct_id is a foreign key to the catalogproduct table.

Out of 2,201,760 total rows, it selects 2,296. The total runtime is 181,792.705 ms (about three minutes).

Any insight on how to improve this?

Edit:

I am blown away by all the answers! I would also like to frame this question within the Django ORM. I am struggling to add a composite key (or something similar) on catalogproduct_id and timestamp to this table. The only primary key I have is an auto-incrementing index, which I guess is as good as having none at all.

Edit 2:
After adding the partial index @Erwin suggested,

CREATE INDEX my_partial_idx ON ta_price (catalogproduct_id, timestamp) WHERE buy > sell;

and using the query from @wildplasser, I get a query time of around 10 to 12 seconds. For further clarification: my table holds snapshots of product prices (buy and sell) over time. At any given time, I want to know which products currently (as of their latest snapshot) have buy > sell.

1 Answer

Editorial Team · Answer 1 · 2026-05-26T01:36:46+00:00

Revised answer after some consideration

SELECT *
  FROM ta_price a
  JOIN (
   SELECT catalogproduct_id, max(timestamp) ts
     FROM ta_price
    GROUP BY catalogproduct_id
        ) b ON a.catalogproduct_id = b.catalogproduct_id
           AND a.timestamp = b.ts
           AND a.buy > a.sell;

buy and sell are not table-qualified in your question, so I qualified them with a. above. Depending on the selectivity of buy > sell, you can speed up the query by adding the same WHERE clause to the subselect.
However, this yields different results. I add it on the off chance that you overlooked it:

SELECT *
  FROM ta_price a
  JOIN (
   SELECT catalogproduct_id, max(timestamp) ts
     FROM ta_price
    WHERE buy > sell
    GROUP BY catalogproduct_id
        ) b ON a.catalogproduct_id = b.catalogproduct_id
           AND a.timestamp = b.ts
 WHERE a.buy > a.sell;
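To make "different results" concrete, here is a minimal sketch that runs both variants against a tiny made-up ta_price table. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in for PostgreSQL; the rows are invented, and only catalogproduct_id and timestamp are selected to keep the output short.

```python
import sqlite3

# Invented rows: (catalogproduct_id, timestamp, buy, sell).
# Product 1: latest snapshot has buy > sell.
# Product 2: an older snapshot had buy > sell, the latest one does not.
# Product 3: never had buy > sell.
ROWS = [
    (1, 1, 10.0, 9.0), (1, 2, 11.0, 10.0),
    (2, 1, 12.0, 11.0), (2, 2, 10.0, 13.0),
    (3, 1, 5.0, 6.0), (3, 2, 5.0, 7.0),
]

# Variant 1: latest snapshot per product, then keep it only if buy > sell.
VARIANT_1 = """
SELECT a.catalogproduct_id, a."timestamp"
  FROM ta_price a
  JOIN (SELECT catalogproduct_id, max("timestamp") AS ts
          FROM ta_price
         GROUP BY catalogproduct_id) b
    ON a.catalogproduct_id = b.catalogproduct_id
   AND a."timestamp" = b.ts
   AND a.buy > a.sell
 ORDER BY a.catalogproduct_id
"""

# Variant 2: latest snapshot *among those with buy > sell* per product.
VARIANT_2 = """
SELECT a.catalogproduct_id, a."timestamp"
  FROM ta_price a
  JOIN (SELECT catalogproduct_id, max("timestamp") AS ts
          FROM ta_price
         WHERE buy > sell
         GROUP BY catalogproduct_id) b
    ON a.catalogproduct_id = b.catalogproduct_id
   AND a."timestamp" = b.ts
 WHERE a.buy > a.sell
 ORDER BY a.catalogproduct_id
"""


def run(query):
    """Load the toy rows into an in-memory database and run one query."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE ta_price (catalogproduct_id INTEGER, "timestamp" INTEGER, buy REAL, sell REAL)')
    con.executemany("INSERT INTO ta_price VALUES (?, ?, ?, ?)", ROWS)
    result = con.execute(query).fetchall()
    con.close()
    return result


print(run(VARIANT_1))  # [(1, 2)]
print(run(VARIANT_2))  # [(1, 2), (2, 1)]
```

Variant 1 reports only products whose latest snapshot has buy > sell; variant 2 also reports product 2, whose most recent profitable snapshot is older than its latest one.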

Either way, a simple index like @Will implies will help:
~~CREATE INDEX my_idx ON ta_price (catalogproduct_id, timestamp);~~

There is a superior approach, though.
An unconditional max() in the subselect will result in a sequential table scan regardless of indexes. Such an operation will never be fast with 2.2 million rows.
The JOIN condition, combined with the WHERE clause of the outer SELECT, will profit from an index like the one above. Depending on the selectivity of buy > sell, a partial index will be slightly or substantially faster and correspondingly smaller on disk and in RAM:

CREATE INDEX my_partial_idx ON ta_price (catalogproduct_id, timestamp)
 WHERE buy > sell;

The order of the columns in the index does not matter in this case. It will also speed up my second variant of the query.

You mentioned the table is for "historical" purposes. If that means no new data arrives, you could speed things up greatly with a materialized view.
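In PostgreSQL 9.3 and later this would be CREATE MATERIALIZED VIEW, followed by REFRESH MATERIALIZED VIEW whenever new data arrives (neither exists in 8.4). The sketch below only illustrates the idea: SQLite has no materialized views, so a plain table built with CREATE TABLE ... AS stands in for one. The table name latest_price, the helper functions and the rows are all hypothetical.

```python
import sqlite3

# Hypothetical stand-in for a materialized view: SQLite has none, so the
# "latest snapshot per product" result is copied into a plain table.
REBUILD = """
CREATE TABLE latest_price AS
SELECT a.*
  FROM ta_price a
  JOIN (SELECT catalogproduct_id, max("timestamp") AS ts
          FROM ta_price
         GROUP BY catalogproduct_id) b
    ON a.catalogproduct_id = b.catalogproduct_id
   AND a."timestamp" = b.ts
"""


def rebuild_latest(con):
    """Recompute the snapshot table (the analogue of REFRESH MATERIALIZED VIEW)."""
    con.execute("DROP TABLE IF EXISTS latest_price")
    con.execute(REBUILD)


def profitable(con):
    """Cheap lookup: one row per product, no aggregation at query time."""
    rows = con.execute(
        "SELECT catalogproduct_id FROM latest_price "
        "WHERE buy > sell ORDER BY catalogproduct_id"
    )
    return [product_id for (product_id,) in rows]


con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE ta_price (catalogproduct_id INTEGER, "timestamp" INTEGER, buy REAL, sell REAL)')
con.executemany(
    "INSERT INTO ta_price VALUES (?, ?, ?, ?)",
    [(1, 1, 10.0, 9.0), (1, 2, 11.0, 10.0),
     (2, 1, 12.0, 11.0), (2, 2, 10.0, 13.0),
     (3, 1, 5.0, 6.0), (3, 2, 5.0, 7.0)],
)
rebuild_latest(con)
before = profitable(con)  # [1]

con.execute("INSERT INTO ta_price VALUES (3, 3, 9.0, 8.0)")
stale = profitable(con)   # still [1]: the copy has not been refreshed
rebuild_latest(con)
after = profitable(con)   # [1, 3]
print(before, stale, after)
```

The expensive aggregation runs once per rebuild and lookups become cheap; the price is that results are stale until the next rebuild, which is why this fits best when no new data arrives.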

On a side note: I would not use timestamp as a column name. It is allowed in PostgreSQL, but it’s a reserved word in all SQL standards.

OK, first things last: for a table of 2.2m rows you need way more resources than postgres has out of the box.

Look at your postgresql.conf file and inspect the settings for shared_buffers and work_mem for a start.
Consult the postgres wiki for performance tuning
Consult the fine manual on resource consumption
Consult the fine manual on planner costs
Increase these statistics settings:
ALTER TABLE tmp.ta_price ALTER COLUMN buy SET STATISTICS 1000;
ALTER TABLE tmp.ta_price ALTER COLUMN sell SET STATISTICS 1000;
ALTER TABLE tmp.ta_price ALTER COLUMN ts SET STATISTICS 1000;

Then run ANALYZE tmp.ta_price;
Be sure that autovacuum is running. If in doubt, run VACUUM ANALYZE ta_price and see if it had an effect.

I have played with the test setup of wildplasser (which was very helpful!) on a pg 8.4 installation with limited resources.
Here are the total runtimes from EXPLAIN ANALYZE:

Erwin 1)        901.487 ms  
wildplasser 1) 1148.045 ms  
A.H.           2922.113 ms

Variant 2 with the additional (buy > sell) clause:

Erwin 2)        536.678 ms  
wildplasser 2)  809.215 ms

With partial index:

Erwin 1)       1166.793 ms  -- slower (!), unexpectedly

(Probably the planner costs are off: this test db cluster is tuned for the main db, which has way more resources.)

wildplasser 1) 1122.609 ms  -- the rest is faster, as expected

Erwin 2)        481.487 ms  
wildplasser 2)  769.887 ms
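Turning the runtimes above into ratios (plain arithmetic on the reported numbers, nothing re-measured):

```python
# Total runtimes in ms from the EXPLAIN ANALYZE runs above (pg 8.4 test setup).
timings_ms = {
    "no index": {
        "Erwin 1": 901.487, "wildplasser 1": 1148.045, "A.H.": 2922.113,
        "Erwin 2": 536.678, "wildplasser 2": 809.215,
    },
    "partial index": {
        "Erwin 1": 1166.793, "wildplasser 1": 1122.609,
        "Erwin 2": 481.487, "wildplasser 2": 769.887,
    },
}


def speedup(setup: str, slower: str, faster: str) -> float:
    """Ratio of two runtimes measured under the same setup."""
    runs = timings_ms[setup]
    return runs[slower] / runs[faster]


for setup in timings_ms:
    print(f"{setup}: Erwin 2 is {speedup(setup, 'Erwin 1', 'Erwin 2'):.2f}x as fast as Erwin 1")
print(f"no index: Erwin 1 is {speedup('no index', 'A.H.', 'Erwin 1'):.2f}x as fast as A.H.")
```

On this setup variant 2 comes out roughly 1.7 to 2.4 times as fast as variant 1, and variant 1 roughly 3.2 times as fast as A.H.'s version.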

Resumé

A.H.'s version takes much longer (the same as you reported). Window functions tend to be slow, especially on older versions of postgres. My alternative query is about twice as fast, as expected. The question is whether its different results are what you want; maybe not.
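A.H.'s query is not reproduced here. For comparison only, a generic window-function formulation of "latest snapshot per product" might look like the sketch below; it is not necessarily what A.H. wrote. It again uses SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later) with invented rows.

```python
import sqlite3

# Invented rows: (catalogproduct_id, timestamp, buy, sell).
ROWS = [
    (1, 1, 10.0, 9.0), (1, 2, 11.0, 10.0),
    (2, 1, 12.0, 11.0), (2, 2, 10.0, 13.0),
    (3, 1, 5.0, 6.0), (3, 2, 5.0, 7.0),
]

# row_number() numbers each product's snapshots from newest to oldest,
# so rn = 1 is the latest one; buy > sell is checked only after ranking,
# which matches the semantics of variant 1.
LATEST_BY_WINDOW = """
SELECT catalogproduct_id, "timestamp"
  FROM (SELECT catalogproduct_id, "timestamp", buy, sell,
               row_number() OVER (PARTITION BY catalogproduct_id
                                  ORDER BY "timestamp" DESC) AS rn
          FROM ta_price) AS ranked
 WHERE rn = 1 AND buy > sell
 ORDER BY catalogproduct_id
"""

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE ta_price (catalogproduct_id INTEGER, "timestamp" INTEGER, buy REAL, sell REAL)')
con.executemany("INSERT INTO ta_price VALUES (?, ?, ?, ?)", ROWS)
latest_profitable = con.execute(LATEST_BY_WINDOW).fetchall()
print(latest_profitable)  # [(1, 2)]
con.close()
```

One behavioral difference: if a product has two snapshots sharing the maximum timestamp, row_number() keeps only one of them, while the max() join returns both.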

Anyway, that was with 300k rows. The queries take 0.5 to 1 s on version 8.4 with limited resources (but mostly proper settings) on a 5-year-old server. With a decent machine and decent settings (enough RAM!), you should be able to bring it down to under 10 s at least.
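The "under 10 s" figure is a rough extrapolation. Assuming runtime scales roughly linearly with row count (an assumption; the real cost also depends on caching, indexes and settings), the arithmetic looks like this:

```python
# Rough extrapolation from the test table to the real one.
# Assumption: runtime grows linearly with the number of rows.
TEST_ROWS = 300_000      # "300k rows" in the test setup (approximate)
REAL_ROWS = 2_201_760    # total rows reported in the question
measured_s = (0.5, 1.0)  # observed runtime range on the test server

factor = REAL_ROWS / TEST_ROWS
estimate_s = tuple(round(t * factor, 1) for t in measured_s)
print(round(factor, 2), estimate_s)  # 7.34 (3.7, 7.3)
```

Both ends of the estimate stay below 10 s, consistent with the target above, though a bigger table that no longer fits in RAM could scale worse than linearly.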
