I am having some trouble optimizing a query, and was hoping that someone here might be able to provide a few pointers.
I have two tables:
CREATE TABLE "blog_cached_posts" (
"id" int4 NOT NULL DEFAULT nextval('blog_cached_posts_id_seq'::regclass),
"title" varchar(255),
"content" text,
"content_encoded" text,
"published_at" timestamp(6) NULL,
"written_by" varchar(255),
"link" varchar(255),
"blog_id" int4,
"guid" varchar(255),
"created_at" timestamp(6) NULL,
"updated_at" timestamp(6) NULL,
"is_highlighted_post" bool DEFAULT false
)
With an index on blog_cached_posts.blog_id
CREATE TABLE "blogs" (
"id" int4 NOT NULL DEFAULT nextval('blogs_id_seq'::regclass),
"site_id" int4,
"image_id" int4,
"name" varchar(255),
"description" text,
"url" varchar(255),
"rss_feed_url" varchar(255),
"active" bool DEFAULT true,
"created_at" timestamp(6) NULL,
"updated_at" timestamp(6) NULL,
"date_highlighted" date,
"highlighted_category_feed_url" varchar(255),
"position" int4
)
With an index on blogs.site_id
This is the query:
SELECT "blog_cached_posts".*
FROM "blog_cached_posts"
join blogs on blogs.id = blog_cached_posts.blog_id
WHERE ((published_at IS NOT NULL) AND (blogs.site_id = 80))
ORDER BY published_at desc
LIMIT 5
Here is an EXPLAIN ANALYZE:
Limit (cost=9499.16..9499.17 rows=5 width=1853) (actual time=118.538..118.539 rows=5 loops=1)
-> Sort (cost=9499.16..9626.31 rows=50861 width=1853) (actual time=118.536..118.537 rows=5 loops=1)
Sort Key: blog_cached_posts.published_at
Sort Method: top-N heapsort Memory: 33kB
-> Hash Join (cost=16.25..8654.38 rows=50861 width=1853) (actual time=0.186..82.910 rows=48462 loops=1)
Hash Cond: (blog_cached_posts.blog_id = blogs.id)
-> Seq Scan on blog_cached_posts (cost=0.00..7930.94 rows=52954 width=1853) (actual time=0.042..56.635 rows=52950 loops=1)
Filter: (published_at IS NOT NULL)
-> Hash (cost=13.21..13.21 rows=243 width=4) (actual time=0.135..0.135 rows=243 loops=1)
-> Seq Scan on blogs (cost=0.00..13.21 rows=243 width=4) (actual time=0.007..0.089 rows=243 loops=1)
Filter: (site_id = 80)
Total runtime: 118.591 ms
Is there any way to optimize this beyond the ~120ms it is currently taking?
EDIT
Here is what I ended up doing. (After reading the comment by @ypercube)
I added an index to blog_cached_posts:
CREATE INDEX \"blog_cached_posts_published_at\" ON \"public\".\"blog_cached_posts\" USING btree(published_at DESC NULLS LAST);
COMMENT ON INDEX \"public\".\"blog_cached_posts_published_at\" IS NULL;
And I changed the select to the following:
SELECT "blog_cached_posts".*
FROM "blog_cached_posts"
join blogs on blogs.id = blog_cached_posts.blog_id
WHERE published_at is not null and blogs.site_id = 80
ORDER BY published_at desc nulls last
LIMIT 5
This brought the execution time down to ~3ms.
Here is the new execution plan:
Limit (cost=0.00..3.85 rows=5 width=1849) (actual time=0.027..0.047 rows=5 loops=1)
-> Nested Loop (cost=0.00..39190.01 rows=50872 width=1849) (actual time=0.026..0.046 rows=5 loops=1)
-> Index Scan using blog_cached_posts_published_at on blog_cached_posts (cost=0.00..24175.16 rows=52965 width=1849) (actual time=0.017..0.023 rows=5 loops=1)
Filter: (published_at IS NOT NULL)
-> Index Scan using blogs_pkey on blogs (cost=0.00..0.27 rows=1 width=4) (actual time=0.003..0.004 rows=1 loops=5)
Index Cond: (blogs.id = blog_cached_posts.blog_id)
Filter: (blogs.site_id = 80)
Total runtime: 0.086 ms
As I mentioned in a comment, I would first try adding a simple index on
published_at. It seems that if there wasn’t theORDER BYandLIMIT 5clauses, the query would be quite efficient and all other needed indexes existed.Therefore, adding an index on the field that is used for the final sorting is usually quite efficient.
As Dems explained in his answer: