I have a fairly easy SQL task at hand and I’d like validation (or guidance) for the solution I came up with. Thank you for helping! (this is my first post)
Here’s the problem I’m facing (simplified):
I’m importing user information from a flat file into a staging table (using SSIS). Each user will have 2 or 3 records. Each line will contain important data. The end result needs to be 1 record per customer that contains info from all 3.
Here’s an example of the data:
PK | Name | UniqueCustID | Info1 | Info2 | Info3 |
----------------------
1 | John Doe | 12345 | Opt1 | NULL | NULL
2 | John Doe | 12345 | NULL | Opt2 | NULL
3 | John Doe | 12345 | NULL | NULL | Opt3
The final result needs to be something like this:
PK | Name | UniqueCustID | Info1 | Info2 | Info3 |
----------------------
1 | John Doe | 12345 | Opt1 | Opt2 | Opt3
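For reference, the collapse shown above can also be written as a single aggregate query. This is only a minimal sketch: the table variable @CustStaging and its column types are assumptions made up so the example runs on its own, and it relies on MAX skipping NULLs.

```sql
-- Hypothetical stand-in for the staging table; the column types are assumed.
DECLARE @CustStaging TABLE (
    PK           INT PRIMARY KEY,
    Name         VARCHAR(100),
    UniqueCustID INT,
    Info1        VARCHAR(50) NULL,
    Info2        VARCHAR(50) NULL,
    Info3        VARCHAR(50) NULL
);

-- The three sample rows from the example above.
INSERT INTO @CustStaging (PK, Name, UniqueCustID, Info1, Info2, Info3)
VALUES
    (1, 'John Doe', 12345, 'Opt1', NULL,   NULL),
    (2, 'John Doe', 12345, NULL,   'Opt2', NULL),
    (3, 'John Doe', 12345, NULL,   NULL,   'Opt3');

-- One row per customer: aggregates ignore NULLs, so each MAX returns
-- the single populated value for that column.
SELECT
    MIN(PK)    AS PK,
    MAX(Name)  AS Name,   -- assumes Name is the same on every row for a customer
    UniqueCustID,
    MAX(Info1) AS Info1,
    MAX(Info2) AS Info2,
    MAX(Info3) AS Info3
FROM @CustStaging
GROUP BY UniqueCustID;
```

With the three sample rows this returns the single row 1 | John Doe | 12345 | Opt1 | Opt2 | Opt3. If a customer ever had two different non-NULL values in the same column, MAX would silently keep the larger one.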
I’m trying to keep this as simple as possible. I want to handle this with a basic Execute SQL task in SSIS (or a couple). What are people’s first reactions? Can I provide any additional information? Thank you again.
UPDATE – To show the two step process I am trying:
1) This should make all of the individual customer sets match:
WITH CustInfoTMP (UniqueCustID, Info1, Info2, Info3)
AS
(
    -- MAX ignores NULLs, so each column picks up the one populated value per customer
    SELECT UniqueCustID, MAX(Info1), MAX(Info2), MAX(Info3)
    FROM CustStaging
    GROUP BY UniqueCustID
)
UPDATE CustStaging
SET
    CustStaging.Info1 = CustInfoTMP.Info1,
    CustStaging.Info2 = CustInfoTMP.Info2,
    CustStaging.Info3 = CustInfoTMP.Info3
FROM CustStaging
INNER JOIN CustInfoTMP ON CustStaging.UniqueCustID = CustInfoTMP.UniqueCustID
2) I then use this to delete the duplicate records:
DELETE
FROM CustStaging
-- Compare PK to PK: keep only the highest-PK row for each customer
WHERE PK NOT IN
(
    SELECT MAX(PK)
    FROM CustStaging
    GROUP BY UniqueCustID
)
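One way to confirm that both steps did what was intended is to look for customers that still have more than one row. A minimal sketch, assuming it runs against the same CustStaging table right after the delete:

```sql
-- Lists any customer that still has duplicate rows after the cleanup.
SELECT UniqueCustID, COUNT(*) AS RowsRemaining
FROM CustStaging
GROUP BY UniqueCustID
HAVING COUNT(*) > 1;
```

An empty result means every UniqueCustID is down to a single row.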
I hope everyone is following this. I really appreciate the feedback.
How about this?