The Archive Base

Editorial Team
Asked: June 13, 2026 (2026-06-13T01:50:15+00:00)

I have built a small inventory system using postgresql and psycopg2. Everything works great,

I have built a small inventory system using postgresql and psycopg2. Everything works great, except, when I want to create aggregated summaries/reports of the content, I get really bad performance due to count()’ing and sorting.

The DB schema is as follows:

CREATE TABLE hosts
(
        id SERIAL PRIMARY KEY,
        name VARCHAR(255)
);
CREATE TABLE items
(
        id SERIAL PRIMARY KEY,
        description TEXT
);
CREATE TABLE host_item
(
        id SERIAL PRIMARY KEY,
        host INTEGER REFERENCES hosts(id) ON DELETE CASCADE ON UPDATE CASCADE,
        item INTEGER REFERENCES items(id) ON DELETE CASCADE ON UPDATE CASCADE
);

There are some other fields as well, but those are not relevant.

I want to extract 2 different reports:
– List of all hosts with the number of items per host, ordered from highest to lowest count
– List of all items with the number of hosts per item, ordered from highest to lowest count

I have used 2 queries for the purpose:

Items with host count:

SELECT i.id, i.description, COUNT(hi.id) AS count
FROM items AS i
LEFT JOIN host_item AS hi
ON (i.id=hi.item)
GROUP BY i.id
ORDER BY count DESC
LIMIT 10;

Hosts with item count:

SELECT h.id, h.name, COUNT(hi.id) AS count
FROM hosts AS h
LEFT JOIN host_item AS hi
ON (h.id=hi.host)
GROUP BY h.id
ORDER BY count DESC
LIMIT 10;

The problem is that the queries run for 5-6 seconds before returning any data. As this is a web-based application, 6 seconds is just not acceptable. The database is heavily populated with approximately 50k hosts, 1000 items and 400 000 host/item relations, and will likely grow significantly when (or perhaps if) the application is used.

After playing around, I found that by removing the “ORDER BY count DESC” part, both queries would execute instantly without any delay whatsoever (less than 20ms to finish the queries).

Is there any way I can optimize these queries so that I can get the result sorted without the delay? I tried different indexes, but since the count is computed, it does not seem possible to use an index for this. I have read that count()’ing in postgresql is slow, but it's the sorting that is causing me problems…

My current workaround is to run the queries above as an hourly job, putting the result into a new table with an index on the count column for quick lookup.
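
A minimal sketch of such a refresh job, for illustration only. The summary table host_item_counts, its index and the function name are hypothetical, and SQLite in memory stands in for PostgreSQL so the snippet runs on its own. With psycopg2 the same statements would go through a cursor.

```python
import sqlite3


def refresh_host_summary(conn):
    """Rebuild the (hypothetical) host_item_counts table in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM host_item_counts")
        conn.execute("""
            INSERT INTO host_item_counts (host_id, name, item_count)
            SELECT h.id, h.name, count(hi.id)
            FROM   hosts h
            LEFT   JOIN host_item hi ON h.id = hi.host
            GROUP  BY h.id, h.name
        """)


# SQLite stand-in for the schema in the question, plus the summary table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hosts (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE host_item (id INTEGER PRIMARY KEY, host INTEGER, item INTEGER);
CREATE TABLE host_item_counts (
    host_id    INTEGER PRIMARY KEY,
    name       TEXT,
    item_count INTEGER NOT NULL
);
CREATE INDEX host_item_counts_ct_idx ON host_item_counts (item_count);
""")
conn.executemany("INSERT INTO hosts (id, name) VALUES (?, ?)",
                 [(1, "alpha"), (2, "beta"), (3, "gamma")])
conn.executemany("INSERT INTO host_item (host, item) VALUES (?, ?)",
                 [(1, 10), (1, 11), (2, 10)])

refresh_host_summary(conn)
refresh_host_summary(conn)  # running it again must not duplicate rows

# The report itself now only reads the small, indexed summary table.
top_hosts = conn.execute("""
    SELECT host_id, name, item_count
    FROM   host_item_counts
    ORDER  BY item_count DESC, host_id
    LIMIT  10
""").fetchall()
print(top_hosts)
```

The report is only as fresh as the last refresh, which is the trade-off of this approach.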

I use Postgresql 9.2.

Update: Query plan as ordered 🙂

EXPLAIN ANALYZE
SELECT h.id, h.name, COUNT(hi.id) AS count
FROM hosts AS h
LEFT JOIN host_item AS hi
ON (h.id=hi.host)
GROUP BY h.id
ORDER BY count DESC
LIMIT 10;


 Limit  (cost=699028.97..699028.99 rows=10 width=21) (actual time=5427.422..5427.424 rows=10 loops=1)
   ->  Sort  (cost=699028.97..699166.44 rows=54990 width=21) (actual time=5427.415..5427.416 rows=10 loops=1)
         Sort Key: (count(hi.id))
         Sort Method: top-N heapsort  Memory: 25kB
         ->  GroupAggregate  (cost=613177.95..697840.66 rows=54990 width=21) (actual time=3317.320..5416.440 rows=54990 loops=1)
               ->  Merge Left Join  (cost=613177.95..679024.94 rows=3653163 width=21) (actual time=3317.267..5025.999 rows=3653163 loops=1)
                     Merge Cond: (h.id = hi.host)
                     ->  Index Scan using hosts_pkey on hosts h  (cost=0.00..1779.16 rows=54990 width=17) (actual time=0.012..15.693 rows=54990 loops=1)
                     ->  Materialize  (cost=613177.95..631443.77 rows=3653163 width=8) (actual time=3317.245..4370.865 rows=3653163 loops=1)
                           ->  Sort  (cost=613177.95..622310.86 rows=3653163 width=8) (actual time=3317.199..3975.417 rows=3653163 loops=1)
                                 Sort Key: hi.host
                                 Sort Method: external merge  Disk: 64288kB
                                 ->  Seq Scan on host_item hi  (cost=0.00..65124.63 rows=3653163 width=8) (actual time=0.006..643.257 rows=3653163 loops=1)
 Total runtime: 5438.248 ms





EXPLAIN ANALYZE
SELECT h.id, h.name, COUNT(hi.id) AS count
FROM hosts AS h
LEFT JOIN host_item AS hi
ON (h.id=hi.host)
GROUP BY h.id
LIMIT 10;


 Limit  (cost=0.00..417.03 rows=10 width=21) (actual time=0.136..0.849 rows=10 loops=1)
   ->  GroupAggregate  (cost=0.00..2293261.13 rows=54990 width=21) (actual time=0.134..0.845 rows=10 loops=1)
         ->  Merge Left Join  (cost=0.00..2274445.41 rows=3653163 width=21) (actual time=0.040..0.704 rows=581 loops=1)
               Merge Cond: (h.id = hi.host)
               ->  Index Scan using hosts_pkey on hosts h  (cost=0.00..1779.16 rows=54990 width=17) (actual time=0.015..0.021 rows=11 loops=1)
               ->  Index Scan Backward using idx_host_item_host on host_item hi  (cost=0.00..2226864.24 rows=3653163 width=8) (actual time=0.005..0.438 rows=581 loops=1)
 Total runtime: 1.143 ms

Update: All the answers to this question are really good for learning and understanding how Postgres works. There does not seem to be any definitive solution to this problem, but I really appreciate all the excellent answers you have provided, and I will use them in my future work with Postgresql. Thanks a lot, guys!

1 Answer

  1. Editorial Team
    Added an answer on June 13, 2026 at 1:50 am

    @Gordon and @willglynn have provided a lot of useful background as to why your query is slow.

    A workaround would be to add a counter to the tables items and hosts and triggers that keep them up to date – for a non-trivial cost to write operations.
    Or use materialized views like you do. I might opt for that.
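
The counter-plus-trigger workaround could look roughly like the sketch below. It is an illustration under assumptions, not a drop-in solution: the item_count column and the trigger names are hypothetical, and SQLite in memory stands in for PostgreSQL so the snippet runs without a server. In PostgreSQL the trigger bodies would be written as trigger functions instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- item_count is a hypothetical counter column added to hosts.
CREATE TABLE hosts (
    id         INTEGER PRIMARY KEY,
    name       TEXT,
    item_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE host_item (
    id   INTEGER PRIMARY KEY,
    host INTEGER REFERENCES hosts(id),
    item INTEGER
);
-- With a stored counter, a plain index can serve the ORDER BY.
CREATE INDEX hosts_item_count_idx ON hosts (item_count);

-- Keep the counter in step with inserts and deletes on host_item.
CREATE TRIGGER host_item_ins AFTER INSERT ON host_item
BEGIN
    UPDATE hosts SET item_count = item_count + 1 WHERE id = NEW.host;
END;

CREATE TRIGGER host_item_del AFTER DELETE ON host_item
BEGIN
    UPDATE hosts SET item_count = item_count - 1 WHERE id = OLD.host;
END;
""")

conn.executemany("INSERT INTO hosts (id, name) VALUES (?, ?)",
                 [(1, "alpha"), (2, "beta"), (3, "gamma")])
conn.executemany("INSERT INTO host_item (host, item) VALUES (?, ?)",
                 [(1, 10), (1, 11), (2, 10), (1, 12), (2, 13), (3, 10)])
conn.execute("DELETE FROM host_item WHERE host = 2 AND item = 13")

# The report no longer aggregates anything at read time.
top_hosts = conn.execute("""
    SELECT id, name, item_count
    FROM   hosts
    ORDER  BY item_count DESC, id
    LIMIT  10
""").fetchall()
print(top_hosts)
```

Every write to host_item now also updates a row in hosts, which is the write cost mentioned above. A complete version would also handle an UPDATE that moves a row from one host to another, and the same pattern would apply to items.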

    For that, you still need to execute these queries on a regular basis and they can be improved. Rewrite your first one to:

    SELECT id, i.description, hi.ct
    FROM   items i
    JOIN  (
        SELECT item AS id, count(*) AS ct
        FROM   host_item
        GROUP  BY item
        ORDER  BY ct DESC
        LIMIT  10
        ) hi USING (id)
    ORDER  BY hi.ct DESC;  -- the join does not guarantee the subquery's order
    
    • If there is a row in table items for most rows in table host_item, it is faster to aggregate first and then JOIN. Contrary to what @willglynn speculates, this is not optimized automatically in Postgres 9.1.

    • count(*) is faster than count(col) on principle, and equivalent as long as col cannot be NULL. (A LEFT JOIN might introduce NULL values.)

    • Simplified LEFT JOIN to JOIN. It should be safe to assume that at least ten distinct items are referenced in host_item. Doesn’t matter much for your original query, but it’s a requirement for this one.

    • Indexes on table host_item won’t help, and the PK on items covers the rest.

    Probably still not good enough for your case, but in my tests with Postgres 9.1 this form is more than twice as fast. Should translate to 9.2, but test with EXPLAIN ANALYZE to be sure.
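
To check that the aggregate-first form returns the same rows as the original, a small self-contained comparison can help. This is only a sketch: SQLite in memory stands in for PostgreSQL, the sample rows are made up, and a tie-breaker on id is added so both results are deterministic. It says nothing about speed, which still has to be measured with EXPLAIN ANALYZE on the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE host_item (id INTEGER PRIMARY KEY, host INTEGER, item INTEGER);
""")
# Made-up sample data: every item is referenced at least once.
conn.executemany("INSERT INTO items (id, description) VALUES (?, ?)",
                 [(1, "disk"), (2, "nic"), (3, "cpu")])
conn.executemany("INSERT INTO host_item (host, item) VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (1, 3)])

# Original shape: join everything, then group and sort.
JOIN_THEN_GROUP = """
SELECT i.id, i.description, count(hi.id) AS ct
FROM   items i
LEFT   JOIN host_item hi ON i.id = hi.item
GROUP  BY i.id
ORDER  BY ct DESC, i.id
LIMIT  2
"""

# Rewritten shape: aggregate and limit first, then join the few survivors.
GROUP_THEN_JOIN = """
SELECT i.id, i.description, hi.ct
FROM   items i
JOIN  (
    SELECT item AS id, count(*) AS ct
    FROM   host_item
    GROUP  BY item
    ORDER  BY ct DESC, item
    LIMIT  2
    ) hi USING (id)
ORDER  BY hi.ct DESC, i.id
"""

original = conn.execute(JOIN_THEN_GROUP).fetchall()
rewritten = conn.execute(GROUP_THEN_JOIN).fetchall()
print(original == rewritten, rewritten)
```

The two forms only agree when enough items appear in host_item to fill the LIMIT, since the inner join drops items without any hosts.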
