I’m writing tsql for SQL Server 2008. I’ve got two tables with roughly 2 million rows each. The Source table gets updated daily and changes are pushed to the Destination table based on a last_edit date. If this date is newer in source than destination then update the destination row. If a new row exists in source compared to destination insert it into destination. This is really only a one way process that I’m concerned with, from source to destination. The source and destination table use a unique identifier across 4 columns, serialid, itemid, systemcode, and role.
My table are modeled similar to the script below. There are many data columns but I’ve limited it to 3 in this example. I’m looking for 2 outputs. 1 set of data with rows to update and 1 set of data with rows to add.
CREATE TABLE [dbo].[TABLE_DEST](
[SERIALID] [nvarchar](20) NOT NULL,
[ITEMID] [nvarchar](20) NOT NULL,
[SYSTEMCODE] [nvarchar](20) NOT NULL,
[ROLE] [nvarchar](10) NOT NULL,
[LAST_EDIT] [datetime] NOT NULL],
[DATA_COLUMN_1] [nvarchar](10) NOT NULL,
[DATA_COLUMN_2] [nvarchar](10) NOT NULL,
[DATA_COLUMN_3] [nvarchar](10) NOT NULL
)
CREATE TABLE [dbo].[TABLE_SOURCE](
[SERIALID] [nvarchar](20) NOT NULL,
[ITEMID] [nvarchar](20) NOT NULL,
[SYSTEMCODE] [nvarchar](20) NOT NULL,
[ROLE] [nvarchar](10) NOT NULL,
[LAST_EDIT] [datetime] NOT NULL],
[DATA_COLUMN_1] [nvarchar](10) NOT NULL,
[DATA_COLUMN_2] [nvarchar](10) NOT NULL,
[DATA_COLUMN_3] [nvarchar](10) NOT NULL
)
Here’s what I’ve got for the update dataset.
select s.*
from table_dest (nolock) inner join table_source s (nolock)
on s.SYSTEMCODE = fd.SYSTEMCODE1Y
and s.ROLE = d.ROLE
and s.SERIALID = d.SERIALID
and s.ITEMID = d.ITEMID
and s.LAST_EDIT > d.LAST_EDIT
I don’t know how best to accomplish finding the rows to add. But the solution has to be pretty efficient for the database.
Unmatched rows can be found with left/right join and checking target table keys for null:
If you need these rows just to perform respective operations then there is special feature for you: