I just started digging into SSIS today, so don’t hate too much if there is something obvious I am missing.
I have an XML file (from a third party):
<root>
  <foo>
    <fooId>12345</fooId>
    <name>FOO</name>
    <bars>
      <bar>BAR 1</bar>
      <bar>BAR 2</bar>
      [...]
    </bars>
  </foo>
  [...]
</root>
and corresponding tables in my DB:
Foo with fields (FooID, Name)
Bar with fields (BarID (identity PK), FooID, Name)
So basically Bar is like a set of attributes for Foo.
So I add an XML source that points to that file, and it produces 3 different datasets (foo, bars, bar). The problem is that the bar set contains bar’s value plus some autogenerated ID, which is not very useful. The only way I see from here to get a bar set with the bar value and fooId is by sorting and merge-joining those sets, which seems rather odd and is probably going to brutally murder performance (we’re talking hundreds of thousands of foo here).
The question is: how do I do this properly?
I wouldn’t worry about optimising performance yet. Just add another SSIS step to transform the datasets.
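To make it concrete, here is a rough sketch in plain Python (not SSIS) of what that extra transform step has to do: follow the autogenerated keys from bar to bars to foo and pick up the real fooId. The column names foo_Id and bars_Id, the key values, and the function name are assumptions for illustration only; check what your XML source actually emits.

```python
# Rows roughly as an XML source might emit them. The *_Id column names and
# their values are assumptions for illustration, not taken from the real file.
foo_rows = [{"foo_Id": 0, "fooId": 12345, "name": "FOO"}]
bars_rows = [{"bars_Id": 0, "foo_Id": 0}]
bar_rows = [
    {"bars_Id": 0, "bar": "BAR 1"},
    {"bars_Id": 0, "bar": "BAR 2"},
]


def join_bar_to_foo(foo_rows, bars_rows, bar_rows):
    """Return (fooId, bar name) pairs by following the generated keys."""
    # Generated foo key -> real fooId from the file.
    foo_id_by_key = {row["foo_Id"]: row["fooId"] for row in foo_rows}
    # Generated bars key -> generated foo key.
    foo_key_by_bars = {row["bars_Id"]: row["foo_Id"] for row in bars_rows}
    # Each bar row resolves to its parent's fooId in two dictionary lookups.
    return [
        (foo_id_by_key[foo_key_by_bars[row["bars_Id"]]], row["bar"])
        for row in bar_rows
    ]


rows_for_bar_table = join_bar_to_foo(foo_rows, bars_rows, bar_rows)
```

The result is one (fooId, name) pair per bar element, which is the shape the Bar table needs (BarID is an identity column, so it is left out). In the package itself the same two hops would be done with joins between the three outputs.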
When you have the whole thing working, review performance. SSIS transformations are easier to maintain than XSLT. Hundreds of thousands of foo shouldn’t be an issue, depending on how often you run the module. I haven’t used SSIS for ETL for a while, so I’m not quite up to speed on that, but I am using XSLT, and an extra SSIS step is easier to maintain if you keep it simple.
Just my opinion.