I have a table that loads new data every day and another table that contains a history of changes to that table. What’s the best way to check if any of the data have changed since the last time data was loaded?
For example, I have table @a with some strategies for different countries, and table @b tracks the changes made to table @a. I can use checksum() to hash the fields that can change, and add a row to @b whenever its existing hash differs from the new hash. However, MSDN doesn't think this is a good idea since "collisions" can occur, i.e. two different values can map to the same checksum.
MSDN link for checksum
http://msdn.microsoft.com/en-us/library/aa258245(v=SQL.80).aspx
Sample code:
declare @a table
(
ownerid bigint
,Strategy varchar(50)
,country char(3)
)
insert into @a
select 1,'Long','USA'
insert into @a
select 2,'Short','CAN'
insert into @a
select 3,'Neutral','AUS'
declare @b table
(
Lastupdated datetime
,ownerid bigint
,Strategy varchar(50)
,country char(3)
)
insert into @b
(
Lastupdated
,ownerid
,strategy
,country
)
select
getdate()
,a.ownerid
,a.strategy
,a.country
from @a a left join @b b
on a.ownerid=b.ownerid
where
b.ownerid is null
select * from @b
--get a different timestamp
waitfor delay '00:00:00.1'
--change source data
update @a
set strategy='Short'
where ownerid=1
--add newly changed data into @b
insert into @b
select
getdate()
,a.ownerid
,a.strategy
,a.country
from
(select *,checksum(strategy,country) as hashval from @a) a
left join
(select *,checksum(strategy,country) as hashval from @b) b
on a.ownerid=b.ownerid
where
a.hashval<>b.hashval
select * from @b
How about writing a query using EXCEPT? Just write queries for both tables and then add EXCEPT between them. The result will be the entries in table_new that aren't in table_old (i.e. that have been updated or inserted).
Note: To get rows recently deleted from table_old, you can reverse the order of the queries.
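A minimal sketch of the EXCEPT approach applied to the question's tables, with @a playing the role of table_new and @b the role of table_old. The sample rows and dates are made up for illustration, and restricting @b to the latest row per ownerid is an assumption about how the history should be compared, not something the answer specifies. EXCEPT requires SQL Server 2005 or later.

```sql
-- Hypothetical sample data: @a is the daily load, @b is the change history.
declare @a table (ownerid bigint, Strategy varchar(50), country char(3))
declare @b table (Lastupdated datetime, ownerid bigint, Strategy varchar(50), country char(3))

insert into @a select 1,'Short','USA'
insert into @a select 2,'Short','CAN'
insert into @a select 3,'Neutral','AUS'

insert into @b select '20240101',1,'Long','USA'
insert into @b select '20240101',2,'Short','CAN'

-- Rows in @a that have no identical row among the most recent
-- history rows in @b, i.e. rows that are new or have changed.
insert into @b (Lastupdated, ownerid, Strategy, country)
select getdate(), x.ownerid, x.Strategy, x.country
from
(
    select ownerid, Strategy, country from @a
    except
    select b.ownerid, b.Strategy, b.country
    from @b b
    -- assumption: compare only against the latest history row per owner
    where b.Lastupdated =
        (select max(b2.Lastupdated) from @b b2 where b2.ownerid = b.ownerid)
) x

select * from @b order by ownerid, Lastupdated
```

With this sample data, ownerid 1 (changed from 'Long' to 'Short') and ownerid 3 (not yet in the history) each get a new row in @b, while ownerid 2 is left alone. Because EXCEPT compares the actual column values rather than a hash, collisions are not a concern, and it treats two NULLs as equal, which a `<>` comparison does not.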