I am trying to get the diff between two nearly identical tables in postgresql. The current query I am running is:
SELECT * FROM tableA EXCEPT SELECT * FROM tableB;
and
SELECT * FROM tableB EXCEPT SELECT * FROM tableA;
Each of the above queries takes about 2 minutes to run (Its a large table)
I wanted to combine the two queries in hopes to save time, so I tried:
SELECT * FROM tableA EXCEPT SELECT * FROM tableB
UNION
SELECT * FROM tableB EXCEPT SELECT * FROM tableA;
And while it works, it takes 20 minutes to run!!! I would guess that it would at most take 4 minutes, the amount of time to run each query individually.
Is there some extra work UNION is doing that is making it take so long? Or is there any way I can speed this up (with or without the UNION)?
UPDATE: Running the query with UNION ALL takes 15 minutes, almost 4 times as long as running each one on its own, Am I correct in saying that UNION (all) is not going to speed this up at all?
With regards to your “extra work” question. Yes. Union not only combines the two queries but also goes through and removes duplicates. It’s the same as using a distinct statement.
For this reason, especially combined with your except statements “union all” would likely be faster.
Read more here:
http://www.postgresql.org/files/documentation/books/aw_pgsql/node80.html