I have a table in my SQL Server database where I "stage" my data warehouse extract from our ERP system.
From this staging table (table name: DBO.DWUSD_LIVE), I build my dimensions and load my fact data.
An example DIMENSION table is called "SHIPTO". This dimension has the following columns:
"shipto_id"
"shipto"
"salpha"
"ssalpha"
"shipto address"
"shipto name"
"shipto city"
Right now I have an SSIS package that does a SELECT DISTINCT across the above columns to retrieve the "unique" data; the package then assigns the "shipto_id" surrogate key to each row.
An example of my current T-SQL query is:
SELECT DISTINCT
"shipto", "salpha", "ssalpha", "shipto address", "shipto name", "shipto city"
FROM DBO.DWUSD_LIVE
This works great but is not "speedy". Some dimensions have 10 columns, and doing a SELECT DISTINCT on all of them is not ideal.
In this dimension, my “Business Key” columns are “SHIPTO”, “SALPHA”, and “SSALPHA”.
So if I do:
SELECT DISTINCT
"shipto", "salpha", "ssalpha"
FROM DBO.DWUSD_LIVE
It yields the same number of rows as:
SELECT DISTINCT
"shipto", "salpha", "ssalpha", "shipto address", "shipto name", "shipto city"
FROM DBO.DWUSD_LIVE
Is there a better way to write this T-SQL query? I need all the columns, but with DISTINCT applied only to the business key columns.
Your help is appreciated.
Below is an image of how my project is set up in SSIS; the dimension is an SCD Type 1.

I would start by splitting this into two operations: generating the surrogate key and populating the dimension table. The first step then becomes a DISTINCT on only three columns, and the second step becomes a JOIN. Indexing the columns used in both operations might then give you some improvement.
You can combine the DISTINCT with NOT EXISTS to avoid processing rows that have already been mapped. Once you have the mapping, you can join it back to the staging table to populate the dimension.
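The two steps described above could be sketched roughly as follows. This is only an illustration under several assumptions: the mapping table dbo.SHIPTO_KEYMAP, the index name, and all data types are hypothetical; the business key columns are assumed never to be NULL; and shipto_id in dbo.SHIPTO is assumed to be a plain INT column rather than an IDENTITY column.

```sql
-- Hypothetical mapping table: one row per business key.
-- IDENTITY supplies the surrogate key; data types are assumptions.
CREATE TABLE dbo.SHIPTO_KEYMAP (
    shipto_id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    shipto    VARCHAR(50) NOT NULL,
    salpha    VARCHAR(50) NOT NULL,
    ssalpha   VARCHAR(50) NOT NULL,
    CONSTRAINT UQ_SHIPTO_KEYMAP UNIQUE (shipto, salpha, ssalpha)
);

-- Optional: index the business key on the staging table (hypothetical name).
CREATE INDEX IX_DWUSD_LIVE_shipto_key
    ON dbo.DWUSD_LIVE (shipto, salpha, ssalpha);

-- Step 1: DISTINCT on the three business key columns only,
-- skipping keys that already have a surrogate key.
INSERT INTO dbo.SHIPTO_KEYMAP (shipto, salpha, ssalpha)
SELECT DISTINCT s.shipto, s.salpha, s.ssalpha
FROM dbo.DWUSD_LIVE AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.SHIPTO_KEYMAP AS m
    WHERE m.shipto  = s.shipto
      AND m.salpha  = s.salpha
      AND m.ssalpha = s.ssalpha
);

-- Step 2: JOIN the mapping back to staging to pick up the attributes.
-- MAX() collapses duplicate staging rows; this relies on the attributes
-- being identical for every row that shares a business key.
INSERT INTO dbo.SHIPTO
    (shipto_id, shipto, salpha, ssalpha,
     [shipto address], [shipto name], [shipto city])
SELECT m.shipto_id, m.shipto, m.salpha, m.ssalpha,
       MAX(s.[shipto address]),
       MAX(s.[shipto name]),
       MAX(s.[shipto city])
FROM dbo.SHIPTO_KEYMAP AS m
JOIN dbo.DWUSD_LIVE AS s
  ON  s.shipto  = m.shipto
  AND s.salpha  = m.salpha
  AND s.ssalpha = m.ssalpha
WHERE NOT EXISTS (
    SELECT 1 FROM dbo.SHIPTO AS d WHERE d.shipto_id = m.shipto_id
)
GROUP BY m.shipto_id, m.shipto, m.salpha, m.ssalpha;
```

The bracketed names refer to the same columns that the question quotes with double quotes; brackets work regardless of the QUOTED_IDENTIFIER setting.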
You should also look at MERGE, which is convenient if you're using a Type 1 dimension and just want to update addresses or other attributes when they change (and it's a useful command in general). But it's only available from SQL Server 2008 onward; you didn't mention what version or edition of SQL Server you're using.
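As a rough sketch of what a Type 1 MERGE might look like here: it assumes a hypothetical mapping table dbo.SHIPTO_KEYMAP (shipto_id, shipto, salpha, ssalpha) that already holds one surrogate key per business key, and it assumes shipto_id in dbo.SHIPTO is a plain INT column. Neither detail comes from the question, so adjust to your actual schema.

```sql
-- Source: one row per surrogate key with the current attribute values.
-- MAX() collapses duplicate staging rows that share a business key.
WITH src AS (
    SELECT m.shipto_id, m.shipto, m.salpha, m.ssalpha,
           MAX(s.[shipto address]) AS [shipto address],
           MAX(s.[shipto name])    AS [shipto name],
           MAX(s.[shipto city])    AS [shipto city]
    FROM dbo.SHIPTO_KEYMAP AS m          -- hypothetical mapping table
    JOIN dbo.DWUSD_LIVE AS s
      ON  s.shipto  = m.shipto
      AND s.salpha  = m.salpha
      AND s.ssalpha = m.ssalpha
    GROUP BY m.shipto_id, m.shipto, m.salpha, m.ssalpha
)
MERGE dbo.SHIPTO AS d
USING src
   ON d.shipto_id = src.shipto_id
-- Type 1: overwrite attributes in place, but only when something changed.
-- The EXCEPT form treats NULLs as equal, unlike a chain of <> comparisons.
WHEN MATCHED AND EXISTS (
        SELECT src.[shipto address], src.[shipto name], src.[shipto city]
        EXCEPT
        SELECT d.[shipto address], d.[shipto name], d.[shipto city]
    ) THEN
    UPDATE SET d.[shipto address] = src.[shipto address],
               d.[shipto name]    = src.[shipto name],
               d.[shipto city]    = src.[shipto city]
-- New business keys become new dimension rows.
WHEN NOT MATCHED BY TARGET THEN
    INSERT (shipto_id, shipto, salpha, ssalpha,
            [shipto address], [shipto name], [shipto city])
    VALUES (src.shipto_id, src.shipto, src.salpha, src.ssalpha,
            src.[shipto address], src.[shipto name], src.[shipto city]);
```

A MERGE statement must end with a semicolon, and because it starts with a CTE here, the statement before it must be terminated with one as well.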