I am trying to create a new table that holds the row-wise sum of 6 other tables, matched on their primary keys. The query stalls as soon as I use more than 3 input tables:
CREATE TABLE table_name AS
SELECT table1.timestamp,
       table1.value + table2.value + table3.value + table4.value AS value
FROM table1, table2, table3, table4
WHERE table1.timestamp = table2.timestamp
  AND table2.timestamp = table3.timestamp
  AND table3.timestamp = table4.timestamp;
Problem: The query runs fairly fast (under 5 seconds) for 2-3 tables but stalls with more. I have not let it run longer than 5 minutes, but that would be too slow for my purposes anyway.
Description of tables: Each table has an identical format of 6 columns (2 of which are relevant). The primary key is an integer “timestamp” and the “value” is a real number. Table sizes vary, but hover around 100k rows/entries for each table. The tables mostly have the same primary keys but some data points are missing in each table so it is crucial that those data points be omitted from the new table.
Is there something I am doing wrong and what should I do to make this run fast?
EDIT:
PS: Here is the actual output of a complete “EXPLAIN ANALYZE” query:
eldb=# EXPLAIN ANALYZE CREATE TABLE test_table AS
SELECT count1.timestamp, count1.year, count1.month, count1.day, count1.period,
       count1.the_value + count2.the_value + count3.the_value
         + count4.the_value + count5.the_value + count6.the_value AS the_value
FROM "table_name-1" AS count1, "table_name-2" AS count2, "table_name-3" AS count3,
     "table_name-4" AS count4, "table_name-5" AS count5, "table_name-6" AS count6
WHERE count1.timestamp=count2.timestamp
  AND count2.timestamp=count3.timestamp
  AND count3.timestamp=count4.timestamp
  AND count4.timestamp=count5.timestamp
  AND count5.timestamp=count6.timestamp
  AND count1.timestamp>2012020000
  AND count2.timestamp>2012020000
  AND count3.timestamp>2012020000
  AND count4.timestamp>2012020000
  AND count5.timestamp>2012020000
  AND count6.timestamp>2012020000;
QUERY PLAN
--------------------------------------------------------------------------------
 Merge Join  (cost=20323.61..153806457715456.50 rows=5592655588099248 width=44) (actual time=84.524..3310.692 rows=3410 loops=1)
   Merge Cond: (count1."timestamp" = count4."timestamp")
   ->  Nested Loop  (cost=10161.80..4417379579.26 rows=1057606343 width=40) (actual time=44.597..1616.585 rows=3410 loops=1)
         Join Filter: (count2."timestamp" = count1."timestamp")
         ->  Merge Join  (cost=10161.80..101480.96 rows=6070522 width=16) (actual time=43.648..48.950 rows=3410 loops=1)
               Merge Cond: (count2."timestamp" = count3."timestamp")
               ->  Sort  (cost=5080.90..5168.01 rows=34844 width=8) (actual time=25.608..25.804 rows=3410 loops=1)
                     Sort Key: count2."timestamp"
                     Sort Method: quicksort  Memory: 256kB
                     ->  Seq Scan on "table_name-2" count2  (cost=0.00..1972.66 rows=34844 width=8) (actual time=0.064..23.297 rows=3410 loops=1)
                           Filter: ("timestamp" > 2012020000)
               ->  Materialize  (cost=5080.90..5255.12 rows=34844 width=8) (actual time=18.030..19.847 rows=3410 loops=1)
                     ->  Sort  (cost=5080.90..5168.01 rows=34844 width=8) (actual time=18.023..18.416 rows=3410 loops=1)
                           Sort Key: count3."timestamp"
                           Sort Method: quicksort  Memory: 256kB
                           ->  Seq Scan on "table_name-3" count3  (cost=0.00..1972.66 rows=34844 width=8) (actual time=0.023..16.294 rows=3410 loops=1)
                                 Filter: ("timestamp" > 2012020000)
         ->  Materialize  (cost=0.00..2351.88 rows=34844 width=24) (actual time=0.000..0.147 rows=3410 loops=3410)
               ->  Seq Scan on "table_name-1" count1  (cost=0.00..1972.66 rows=34844 width=24) (actual time=0.020..16.853 rows=3410 loops=1)
                     Filter: ("timestamp" > 2012020000)
   ->  Materialize  (cost=10161.80..4007228099.11 rows=1057606343 width=24) (actual time=39.917..1687.402 rows=3410 loops=1)
         ->  Nested Loop  (cost=10161.80..4004584083.26 rows=1057606343 width=24) (actual time=39.915..1685.956 rows=3410 loops=1)
               Join Filter: (count4."timestamp" = count6."timestamp")
               ->  Merge Join  (cost=10161.80..101480.96 rows=6070522 width=16) (actual time=38.689..44.309 rows=3410 loops=1)
                     Merge Cond: (count4."timestamp" = count5."timestamp")
                     ->  Sort  (cost=5080.90..5168.01 rows=34844 width=8) (actual time=18.960..19.156 rows=3410 loops=1)
                           Sort Key: count4."timestamp"
                           Sort Method: quicksort  Memory: 256kB
                           ->  Seq Scan on "table_name-4" count4  (cost=0.00..1972.66 rows=34844 width=8) (actual time=0.059..17.271 rows=3410 loops=1)
                                 Filter: ("timestamp" > 2012020000)
                     ->  Materialize  (cost=5080.90..5255.12 rows=34844 width=8) (actual time=19.717..21.826 rows=3410 loops=1)
                           ->  Sort  (cost=5080.90..5168.01 rows=34844 width=8) (actual time=19.708..20.266 rows=3410 loops=1)
                                 Sort Key: count5."timestamp"
                                 Sort Method: quicksort  Memory: 256kB
                                 ->  Seq Scan on "table_name-5" count5  (cost=0.00..1972.66 rows=34844 width=8) (actual time=0.034..18.001 rows=3410 loops=1)
                                       Filter: ("timestamp" > 2012020000)
               ->  Materialize  (cost=0.00..2283.88 rows=34844 width=8) (actual time=0.000..0.148 rows=3410 loops=3410)
                     ->  Seq Scan on "table_name-6" count6  (cost=0.00..1972.66 rows=34844 width=8) (actual time=0.036..17.785 rows=3410 loops=1)
                           Filter: ("timestamp" > 2012020000)
 Total runtime: 3330.933 ms
(40 rows)
And here is the table structure (same for all tables):
CREATE TABLE "table_name-6"
(
  "timestamp" integer NOT NULL,
  year integer NOT NULL,
  month integer NOT NULL,
  day integer NOT NULL,
  period integer NOT NULL,
  the_value real,
  CONSTRAINT "table_name-6_pkey" PRIMARY KEY ("timestamp")
);
Note: the actual table names and values were renamed. Also, this output was for a small fraction of the actual table size.
Result && plan: INSERT 0 2000
UPDATE: Most people prefer explicit JOIN syntax to the WHERE … syntax:
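As a rough sketch of that rewrite, assuming the six tables, aliases, and column names from the EXPLAIN ANALYZE query above, the same inner join could be written with explicit JOIN ... ON clauses. Inner joins keep only the timestamps present in all six tables, which matches the requirement that incomplete data points be omitted. Filtering on count1 alone is an assumption that relies on every join being an equality on "timestamp". This is an illustration of the syntax, not a tested fix for the slowdown.

```sql
-- Sketch only: same tables, aliases, and columns as the EXPLAIN ANALYZE
-- query above, rewritten with explicit JOIN syntax.
CREATE TABLE test_table AS
SELECT count1."timestamp",
       count1.year,
       count1.month,
       count1.day,
       count1.period,
       count1.the_value + count2.the_value + count3.the_value
         + count4.the_value + count5.the_value + count6.the_value AS the_value
FROM "table_name-1" AS count1
JOIN "table_name-2" AS count2 ON count2."timestamp" = count1."timestamp"
JOIN "table_name-3" AS count3 ON count3."timestamp" = count1."timestamp"
JOIN "table_name-4" AS count4 ON count4."timestamp" = count1."timestamp"
JOIN "table_name-5" AS count5 ON count5."timestamp" = count1."timestamp"
JOIN "table_name-6" AS count6 ON count6."timestamp" = count1."timestamp"
-- Assumption: one filter suffices, because every joined row must carry
-- the same "timestamp" as count1.
WHERE count1."timestamp" > 2012020000;
```

Note that the_value is nullable in the table definition, so a NULL in any one table makes the summed the_value NULL for that timestamp.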