I have roughly 2 million rows of data, each with an artificial PK and two ID fields (so: PK, ID1, ID2). I have a unique constraint (and index) on ID1+ID2.
I get two sorts of updates, and each update covers a single, distinct ID1:
- 100-1000 rows of all-new data (ID1 is new)
- 100-1000 rows of largely, but not necessarily completely, overlapping data (ID1 already exists; some ID1+ID2 pairs may be new)
What’s the most efficient way to maintain this ‘set’? Here are the options as I see them:
1. Delete all the rows with that ID1, then insert all the new rows (yikes)
2. Query the existing rows matching the new data's ID1+ID2 pairs and insert only the rows that aren't already there
3. Insert all the new rows and ignore the inserts that trigger unique constraint violations
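The three options can be sketched against an in-memory SQLite table, purely as an illustration. The question doesn't name a database, so SQLite, the table name `pairs`, and the helper names below are all assumptions. Each function handles one batch for a single ID1, matching the shape of the updates described above. `INSERT OR IGNORE` is SQLite's spelling of option 3; other engines spell it differently (PostgreSQL, for example, uses `INSERT ... ON CONFLICT DO NOTHING`).

```python
import sqlite3

# Hypothetical schema mirroring the question: artificial PK plus a
# unique constraint (which SQLite backs with an index) on (id1, id2).
SCHEMA = """
CREATE TABLE pairs (
    pk  INTEGER PRIMARY KEY,
    id1 INTEGER NOT NULL,
    id2 INTEGER NOT NULL,
    UNIQUE (id1, id2)
)
"""

def replace_batch(conn, id1, id2s):
    """Option 1: delete every row for id1, then insert the whole batch."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM pairs WHERE id1 = ?", (id1,))
        conn.executemany("INSERT INTO pairs (id1, id2) VALUES (?, ?)",
                         [(id1, id2) for id2 in id2s])

def insert_missing(conn, id1, id2s):
    """Option 2: read the existing id2 values for id1, insert only the rest."""
    with conn:
        # Filtering on id1 alone can use the leading column of the (id1, id2) index.
        existing = {row[0] for row in conn.execute(
            "SELECT id2 FROM pairs WHERE id1 = ?", (id1,))}
        new_rows = [(id1, id2) for id2 in set(id2s) - existing]
        conn.executemany("INSERT INTO pairs (id1, id2) VALUES (?, ?)", new_rows)

def insert_ignoring_dupes(conn, id1, id2s):
    """Option 3: insert everything and let the unique constraint skip duplicates."""
    with conn:
        conn.executemany("INSERT OR IGNORE INTO pairs (id1, id2) VALUES (?, ?)",
                         [(id1, id2) for id2 in id2s])

def run_demo(strategy):
    """Start from pairs (1, 10), (1, 11); apply a batch for ID1 = 1 with ID2 = 11, 12."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    with conn:
        conn.executemany("INSERT INTO pairs (id1, id2) VALUES (?, ?)",
                         [(1, 10), (1, 11)])
    strategy(conn, 1, [11, 12])
    return sorted(conn.execute("SELECT id1, id2 FROM pairs"))
```

`run_demo(replace_batch)` ends with two rows, `[(1, 11), (1, 12)]`, while the other two strategies both end with three, `[(1, 10), (1, 11), (1, 12)]`.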
Any thoughts?
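Not all of your listed solutions are functionally equivalent: deleting and re-inserting removes any existing ID1+ID2 pair for that ID1 that is missing from the new batch, whereas the other two options keep such pairs. Without knowing which behavior you want or need, it’s hard to say which approach is most appropriate.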
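A toy model with no database involved makes the difference concrete by treating the table as a Python set of (ID1, ID2) pairs. The function names are hypothetical; the point is only how the end state differs when a batch for an existing ID1 omits a pair that is already stored.

```python
def after_delete_and_insert(table, batch_id1, batch):
    """Option 1: every pair for batch_id1 is replaced by the batch."""
    kept = {(i1, i2) for (i1, i2) in table if i1 != batch_id1}
    return kept | set(batch)

def after_insert_new_only(table, batch):
    """Options 2 and 3: existing pairs are never removed, only added to."""
    return set(table) | set(batch)

table = {(1, 10), (1, 11)}
batch = [(1, 11), (1, 12)]  # (1, 10) is absent from the new batch
print(sorted(after_delete_and_insert(table, 1, batch)))  # [(1, 11), (1, 12)]
print(sorted(after_insert_new_only(table, batch)))       # [(1, 10), (1, 11), (1, 12)]
```

So the choice hinges on whether a batch is meant to be the complete, current set of ID2 values for its ID1 (delete and re-insert) or just additions to it (the other two options).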
Based on the available info, I’d suggest option 2 (query the existing rows and insert only the new ones).
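One possible variant of option 2, offered as an assumption rather than something the answer proposes, pushes the comparison into the database instead of fetching existing rows to the client: load the batch into a temporary staging table, then insert only the pairs that have no match, using a `NOT EXISTS` anti-join whose equality lookup on (ID1, ID2) can be served by the unique index. SQLite and the names `pairs`, `staging`, `make_db`, and `insert_missing_set_based` are illustrative stand-ins.

```python
import sqlite3

def make_db():
    """In-memory stand-in for the real table (hypothetical schema and name)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE pairs (
            pk  INTEGER PRIMARY KEY,
            id1 INTEGER NOT NULL,
            id2 INTEGER NOT NULL,
            UNIQUE (id1, id2)
        )""")
    return conn

def insert_missing_set_based(conn, rows):
    """Option 2 done inside the database: stage the batch, insert only unmatched pairs."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "CREATE TEMP TABLE IF NOT EXISTS staging (id1 INTEGER, id2 INTEGER)")
        conn.execute("DELETE FROM staging")  # clear any previous batch
        conn.executemany("INSERT INTO staging (id1, id2) VALUES (?, ?)", rows)
        # Anti-join: each NOT EXISTS probe is an equality lookup on (id1, id2),
        # which SQLite can answer from the index behind the UNIQUE constraint.
        conn.execute("""
            INSERT INTO pairs (id1, id2)
            SELECT DISTINCT s.id1, s.id2
            FROM staging AS s
            WHERE NOT EXISTS (
                SELECT 1 FROM pairs AS p
                WHERE p.id1 = s.id1 AND p.id2 = s.id2
            )""")

conn = make_db()
insert_missing_set_based(conn, [(1, 10), (1, 11)])
insert_missing_set_based(conn, [(1, 11), (1, 12), (1, 12)])
print(sorted(conn.execute("SELECT id1, id2 FROM pairs")))  # [(1, 10), (1, 11), (1, 12)]
```

The `DISTINCT` guards against a batch that repeats a pair, which would otherwise trip the unique constraint on insert.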