I want to fill a table with data that might be overlapping. I do this a few times in the code. so, the first time it is
INSERT INTO A (SELECT * FROM B)
and then the second time it is
INSERT INTO A
SELECT * FROM C
WHERE NOT EXISTS (SELECT * FROM A Where A.field = C.field)
the re-insertion is done a total of 3 times. I could change the code to just one insertion using a UNION on the 4 sub-tables, but it will require alot of re-design, so I wanted to know how much performance penalty I am receving right now
Thanks!!
A union is probably less costly. But as usual with databases, it depends.
Why?
Because what you are doing right now is this:
With a union you would be doing this:
I.e. your current queries scans table B, C, D and table A twice plus overhead for three separate queries. A union-query would scan table B, C, D, and sort the rows (to get unique values) and then insert them into table A. At a first glance it seems that union would be faster because you are doing two less scans and only one insert (and thus less locking).
What I mean by it depends is:
Indexes: Correctly indexed, seeks might be faster than sorting the data from B, C and D.
Amount of data: If you have little data compared to the hardware you are running the database on then this discussion might be pointless. I.e. there’s no point in optimizing parts that are not bottlenecks.