I am integrating between 4 data sources: InternalDeviceRepository ExternalDeviceRepository NightlyDeviceDeltas MidDayDeviceDeltas Changes flow into

Question

Asked: May 29, 2026 (2026-05-29T07:54:29+00:00)

I am integrating between 4 data sources:

InternalDeviceRepository
ExternalDeviceRepository
NightlyDeviceDeltas
MidDayDeviceDeltas

Changes flow into the InternalDeviceRepository from the other three sources.
All sources are eventually transformed to the following definition:

FIELDS
=============
IdentityField
Contract
ContractLevel
StartDate
EndDate
ContractStatus
Location

IdentityField is the primary key. Contract is a secondary key, used only if a match exists; otherwise a new record needs to be created.

Currently I compare all the fields in WHERE clauses in SQL statements, and again in a number of places in SSIS packages. This makes for some messy-looking SQL and SSIS packages.

I’ve been mulling computing a hash of ContractLevel, StartDate, EndDate, ContractStatus, and Location and adding that to each of the input tables. This would allow me to use a single value for comparison, instead of 5 separate ones each time.

I’ve never done this before, nor have I seen it done. Is there a reason that it should not be used, or is it a cleaner way to do it?


1 Answer

Editorial Team · Answer 1 · 2026-05-29T07:54:30+00:00

It is a valid approach. Consider introducing a computed column that holds the hash, and put an index on it.

You may use either the CHECKSUM function or write your own hash function like this:
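As a rough sketch of the computed-column idea, the table below hashes the five compared fields into one persisted, indexed column. The table name `dbo.DeviceStaging`, the column types, the `RowHash` name and the `'|'` separator are all assumptions for illustration, and `SHA2_256` is swapped in for the MD5 used in the function further down. Note that `CONCAT` treats NULL as an empty string, so NULL and `''` hash the same here.

```sql
-- Hypothetical staging table: column names follow the question, types are assumed.
CREATE TABLE dbo.DeviceStaging
(
    IdentityField  INT          NOT NULL PRIMARY KEY,
    Contract       VARCHAR(50)  NULL,
    ContractLevel  VARCHAR(20)  NULL,
    StartDate      DATE         NULL,
    EndDate        DATE         NULL,
    ContractStatus VARCHAR(20)  NULL,
    Location       VARCHAR(100) NULL,
    -- One value that stands in for the five compared fields.
    -- Dates are rendered as yyyymmdd (style 112) so the text is unambiguous,
    -- and '|' keeps 'ab' + 'c' from hashing the same as 'a' + 'bc'.
    RowHash AS CAST(HASHBYTES('SHA2_256',
                    CONCAT(ContractLevel, '|',
                           CONVERT(CHAR(8), StartDate, 112), '|',
                           CONVERT(CHAR(8), EndDate, 112), '|',
                           ContractStatus, '|',
                           Location)) AS BINARY(32)) PERSISTED
);

-- Index the hash so comparisons on it can seek instead of scan.
CREATE INDEX IX_DeviceStaging_RowHash ON dbo.DeviceStaging (RowHash);
```

With a column like this on each input table, the five-field comparison collapses to a single `RowHash` comparison.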

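CREATE FUNCTION dbo.GetMyLongHash(@data VARBINARY(MAX))
RETURNS VARBINARY(MAX)
WITH RETURNS NULL ON NULL INPUT
AS
BEGIN
    DECLARE @res VARBINARY(MAX) = 0x;
    -- BIGINT so that @position + 8000 cannot overflow on very large inputs
    DECLARE @position BIGINT = 1, @len BIGINT = DATALENGTH(@data);

    -- Hash the input in 8000-byte chunks (the HASHBYTES input limit before
    -- SQL Server 2016) and concatenate the 16-byte MD5 digests.
    WHILE 1 = 1
    BEGIN
        SET @res = @res + HASHBYTES('MD5', SUBSTRING(@data, @position, 8000));
        SET @position = @position + 8000;
        IF @position > @len
            BREAK;
    END;

    -- More than one chunk: hash the concatenated digests again until a
    -- single 16-byte value remains.
    WHILE DATALENGTH(@res) > 16
        SET @res = dbo.GetMyLongHash(@res);

    RETURN @res;
END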

This gives you a 16-byte value. You can take all 16 bytes as a GUID, or only the first 8 bytes as a bigint, and compare that.

Adapt the function to your needs, for example to accept a string as the parameter, or even all of your fields, instead of a varbinary.
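A minimal sketch of those two conversions, using a made-up input string and calling HASHBYTES directly rather than the function above:

```sql
-- Hypothetical input; any 16-byte hash value would do.
DECLARE @hash BINARY(16) = HASHBYTES('MD5', 'Gold|20260101|20261231|Active|Berlin');

SELECT
    @hash                                  AS RawHash,       -- all 16 bytes
    CAST(@hash AS UNIQUEIDENTIFIER)        AS HashAsGuid,    -- all 16 bytes as a GUID
    CAST(SUBSTRING(@hash, 1, 8) AS BIGINT) AS HashAsBigint;  -- first 8 bytes only
```

The bigint form keeps only half of the hash, so it is more compact to store and index but more likely to collide than the full 16 bytes.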

BUT

Be careful with string casing and datetime formats.
If you use CHECKSUM, also compare the other fields; CHECKSUM produces duplicates.
Avoid using a 4-byte hash result on a relatively big table.
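To illustrate the casing and date-format caveats, here is a hedged sketch that normalizes each field before hashing. The table variables, sample rows and column types are invented for the example, `SHA2_256` is an assumption rather than the algorithm used above, and whether case and surrounding spaces should be ignored at all depends on your data.

```sql
-- Hypothetical sample data: column names follow the question, types are assumed.
DECLARE @Incoming TABLE (IdentityField INT PRIMARY KEY, ContractLevel VARCHAR(20),
                         StartDate DATE, EndDate DATE,
                         ContractStatus VARCHAR(20), Location VARCHAR(100));
DECLARE @Existing TABLE (IdentityField INT PRIMARY KEY, ContractLevel VARCHAR(20),
                         StartDate DATE, EndDate DATE,
                         ContractStatus VARCHAR(20), Location VARCHAR(100));

INSERT INTO @Incoming VALUES
    (1, 'Gold',   '20260101', '20261231', 'ACTIVE',  'Berlin'),
    (2, 'Silver', '20260101', '20260630', 'Expired', 'Oslo');
INSERT INTO @Existing VALUES
    (1, 'gold ',  '20260101', '20261231', 'Active',  'berlin'),  -- differs only in case and spacing
    (2, 'Silver', '20260101', '20261231', 'Expired', 'Oslo');    -- EndDate really differs

-- Normalize before hashing: trim and upper-case strings, render dates as yyyymmdd.
SELECT i.IdentityField
FROM @Incoming AS i
JOIN @Existing AS e
    ON e.IdentityField = i.IdentityField
CROSS APPLY (SELECT HASHBYTES('SHA2_256', CONCAT(
                 UPPER(LTRIM(RTRIM(i.ContractLevel))),  '|',
                 CONVERT(CHAR(8), i.StartDate, 112),    '|',
                 CONVERT(CHAR(8), i.EndDate, 112),      '|',
                 UPPER(LTRIM(RTRIM(i.ContractStatus))), '|',
                 UPPER(LTRIM(RTRIM(i.Location))))) AS RowHash) AS ih
CROSS APPLY (SELECT HASHBYTES('SHA2_256', CONCAT(
                 UPPER(LTRIM(RTRIM(e.ContractLevel))),  '|',
                 CONVERT(CHAR(8), e.StartDate, 112),    '|',
                 CONVERT(CHAR(8), e.EndDate, 112),      '|',
                 UPPER(LTRIM(RTRIM(e.ContractStatus))), '|',
                 UPPER(LTRIM(RTRIM(e.Location))))) AS RowHash) AS eh
WHERE ih.RowHash <> eh.RowHash;
```

With this sample data the query should flag only the row whose EndDate changed; without the normalization, the first row would be reported as changed as well.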
