Sorry if this is the wrong place, but I’m not sure what Stack Exchange site is suitable for these types of questions.
I have a task to build a pool of adresses for direct marketing. I have to:
- Merge different adress lists (all of which already in Database)
- Correct names based on some rules (e.g if name is like “John & Mary Doe”, I need to split the name as “John Doe” and “Mary Doe”)
- Deduplicate records
- Find customer numbers by joining with our customer database on names and adresses
- Join with several tables that supply customer information (age, gender, customer value, what products has he/she already bought from us, when was he/she contacted recently)
- Join with several tables that supply regional information (how many products have we sold in that region, what is the sociodemographical structure, what is the average income of the region)
- Construct a flat table with all those information and make it available in the same database (no input/output from/to other systems or flat files)
- Schedule it to run every night
So now I am considering which is the best way of achieving this. I can think of following alternatives.
-
Just write a big SQL batch with UNIONs, JOINS and SELECT INTOs
+ straightforward
+ flexible
– hard to read
– hard to maintain
– slow when I always create intermediate tables between steps -
SQL Server Integration Services
+ easy to read and maintain
+ good error handling
+ self-documentation
+ visual
– I need to learn it
– not sure if it’s flexible enough when it comes to window functions (ranking, row number, etc.)
– not sure if it would be faster or slower than SQL
– AFAIK no way to create SQL from data flow (just in case) -
SQL Server Analysis Services
– I don’t even know how it differs from SSIS -
Some other way??
Thanks
SSIS as name says is useful for data integration/transformation from many different sources. If you are doing all stuff on SQL Server – use stored procedures. And that should be enough for your needs. Temporary tables will be very useful here.
On the other hand…
SSIS is very powerful tool because all data flow work is done in-memory, so it is quick but you need to worry about your memory consumption. It depends of amount of data but you also need to avoid here any asynchronous operations(grouping, sorting), because they can easily fill your memory. If there will be a lot of transrofmations, updates, calculations etc in data flow, SSIS can be even faster.
But…
You have third option here. Combine this two using “Data Flow” and “Execute SQL Task” components.