Well i have the following table(info from pgAdmin):
CREATE TABLE comments_lemms
(
comment_id integer,
freq integer,
lemm_id integer,
bm25 real
)
WITH (
OIDS=FALSE
);
ALTER TABLE comments_lemms OWNER TO postgres;
-- Index: comments_lemms_comment_id_idx
-- DROP INDEX comments_lemms_comment_id_idx;
CREATE INDEX comments_lemms_comment_id_idx
ON comments_lemms
USING btree
(comment_id);
-- Index: comments_lemms_lemm_id_idx
-- DROP INDEX comments_lemms_lemm_id_idx;
CREATE INDEX comments_lemms_lemm_id_idx
ON comments_lemms
USING btree
(lemm_id);
And one more table:
CREATE TABLE comments
(
id serial NOT NULL,
nid integer,
userid integer,
timest timestamp without time zone,
lemm_length integer,
CONSTRAINT comments_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE comments OWNER TO postgres;
-- Index: comments_id_idx
-- DROP INDEX comments_id_idx;
CREATE INDEX comments_id_idx
ON comments
USING btree
(id);
-- Index: comments_nid_idx
-- DROP INDEX comments_nid_idx;
CREATE INDEX comments_nid_idx
ON comments
USING btree
(nid);
in comments_lemms there are 8 million entries, in comments – 270 thousands.
Im performing the following sql query:
update comments_lemms set bm25=(select lemm_length from comments where id=comment_id limit 1)
And it takes more than 20 minutes of running and i stop it because pgAdmin looks like its about to crash.
Is there any way to modify this query or indexes or whatever in my database to speed up things a bit? I have to run some similar queries in future and it’s quite painful to wait more than 30 minutes for each one.
In other words, you’re making it go through 8M entries, and for each row you’re doing a nested loop with an index loopup. PG won’t rewrite/optimize it because of the
limit 1instruction.Try this instead:
It should do two seq scans and hash or merge join them together, then proceed with the update in one go.