I have an XML file that I need to use to populate multiple SQL tables, and I was wondering what the best way to do that is. I was thinking of a DataSet or XSLT, but I'm honestly not sure. Here is part of my generated XML:
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<!-- Created: 8/3/2010 12:09:15 PM -->
<Trip>
  <TripDetails>
    <DepartureDate />
    <ReturnDate />
    <TripTypeA>3</TripTypeA>
    <TripTypeB>1</TripTypeB>
    <PurposeOfTrip>vacation</PurposeOfTrip>
    <Region>5</Region>
    <Countries>
      <Country>105</Country>
      <Country>135</Country>
    </Countries>
    <Cities>
      <City>Cancun</City>
      <City>Tokyo</City>
      <City>Mayo</City>
    </Cities>
    <OverallRating>4</OverallRating>
    <Suppliers>53</Suppliers>
    <SuppliersComments>Good flight</SuppliersComments>
    <Transport>
      <TransportType>1</TransportType>
      <TransportType>3</TransportType>
    </Transport>
    <TransportComment>Transportation was fast</TransportComment>
I have a couple of different tables I need to populate (keeping it short for the example):
TripDetails (TripID, TripTypeA, TripTypeB, SupplierID, overallRating)
TripCountries (TripCountryID, TripID, CountryCode)
I have a bunch more tables (cities, transport), but if I can figure out how to update TripDetails (the main table) and TripCountries (a table that links TripDetails and Countries), I think I will be good. Thanks!
Assuming you’re using SQL Server, you should parse the XML into DataTables and use the SqlBulkCopy object to shoot them into the database super-fast. There are lots of resources to help you learn about SqlBulkCopy. Here’s a recent discussion from another StackOverflow question to get you started: Sql Server 2008 Tuning with large transactions (700k+ rows/transaction)
If the XML file is really large, you should be careful what sort of parser you use. XDocument and XmlDocument load the whole thing into memory. If the files are small enough, say under 10MB, you should be fine using those parsers. For anything much bigger, a forward-only reader such as XmlReader avoids holding the whole document in memory.
EDIT:
Here’s a quick mock-up of how you could get the XML into DataTables. It’s in VB since VB makes XML a tad easier.
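As a rough illustration of that step (this is not the VB mock-up mentioned above, and it is in Python rather than .NET only to keep it short), here is a minimal sketch that splits one Trip document into rows shaped like the two tables from the question. Several things are assumptions: the `shred_trip` helper is hypothetical, the `<Suppliers>` element is assumed to feed `SupplierID`, `TripID` is assumed to be supplied by the caller, and `TripCountryID` is assumed to be generated by the database. The sample XML is the question's snippet trimmed to the fields used, with closing tags added so it parses.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML, closed off so it is well-formed.
SAMPLE_XML = """<Trip>
  <TripDetails>
    <TripTypeA>3</TripTypeA>
    <TripTypeB>1</TripTypeB>
    <Countries>
      <Country>105</Country>
      <Country>135</Country>
    </Countries>
    <OverallRating>4</OverallRating>
    <Suppliers>53</Suppliers>
  </TripDetails>
</Trip>"""


def shred_trip(xml_text, trip_id):
    """Split one <Trip> document into rows for TripDetails and TripCountries.

    Hypothetical helper: trip_id is assumed to come from the caller, and
    <Suppliers> is assumed to map to the SupplierID column.
    """
    details = ET.fromstring(xml_text).find("TripDetails")

    # One row for the parent table.
    trip_row = {
        "TripID": trip_id,
        "TripTypeA": int(details.findtext("TripTypeA")),
        "TripTypeB": int(details.findtext("TripTypeB")),
        "SupplierID": int(details.findtext("Suppliers")),
        "overallRating": int(details.findtext("OverallRating")),
    }

    # One row per <Country> for the link table, each carrying the parent key.
    country_rows = [
        {"TripID": trip_id, "CountryCode": int(country.text)}
        for country in details.findall("Countries/Country")
    ]
    return trip_row, country_rows


trip_row, country_rows = shred_trip(SAMPLE_XML, trip_id=1)
print(trip_row)
print(country_rows)
```

In .NET, each of these row collections would correspond to one DataTable handed to SqlBulkCopy. The repeating elements (cities, transport types) would follow the same pattern as the countries.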
BTW – when doing this kind of ETL, it’s best to pump your data into staging tables first rather than directly into your production tables. That way, you can validate data types and ensure referential integrity and handle key management and get everything perfectly situated without locking up or polluting your production tables.
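To make the staging idea concrete, here is a minimal sketch of the pattern using Python's built-in sqlite3 module as a stand-in for SQL Server. The staging table names, the `BatchRowID` column used to tie child rows to their parent before real keys exist, and the digits-only validation rule are all assumptions made for illustration, and the column types on the production tables are guesses based on the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.executescript("""
    -- Production tables (column types are assumptions)
    CREATE TABLE TripDetails (
        TripID INTEGER PRIMARY KEY,
        TripTypeA INTEGER NOT NULL, TripTypeB INTEGER NOT NULL,
        SupplierID INTEGER NOT NULL, overallRating INTEGER NOT NULL
    );
    CREATE TABLE TripCountries (
        TripCountryID INTEGER PRIMARY KEY,
        TripID INTEGER NOT NULL REFERENCES TripDetails(TripID),
        CountryCode INTEGER NOT NULL
    );
    -- Hypothetical staging tables: untyped text, no constraints
    CREATE TABLE Staging_TripDetails (
        BatchRowID INTEGER, TripTypeA TEXT, TripTypeB TEXT,
        SupplierID TEXT, overallRating TEXT
    );
    CREATE TABLE Staging_TripCountries (BatchRowID INTEGER, CountryCode TEXT);
""")

# 1. Load raw values into staging. Row 2 has a bad TripTypeB on purpose.
conn.executemany("INSERT INTO Staging_TripDetails VALUES (?, ?, ?, ?, ?)",
                 [(1, "3", "1", "53", "4"), (2, "3", "x", "53", "4")])
conn.executemany("INSERT INTO Staging_TripCountries VALUES (?, ?)",
                 [(1, "105"), (1, "135"), (2, "105")])

# 2. Validate in staging: keep only rows whose columns are non-empty digits.
valid = conn.execute("""
    SELECT BatchRowID, TripTypeA, TripTypeB, SupplierID, overallRating
    FROM Staging_TripDetails
    WHERE TripTypeA <> '' AND TripTypeA NOT GLOB '*[^0-9]*'
      AND TripTypeB <> '' AND TripTypeB NOT GLOB '*[^0-9]*'
      AND SupplierID <> '' AND SupplierID NOT GLOB '*[^0-9]*'
      AND overallRating <> '' AND overallRating NOT GLOB '*[^0-9]*'
""").fetchall()

# 3. Move valid rows to production in one transaction, assigning real keys.
with conn:
    for batch_id, a, b, supplier, rating in valid:
        cur = conn.execute(
            "INSERT INTO TripDetails (TripTypeA, TripTypeB, SupplierID, overallRating) "
            "VALUES (?, ?, ?, ?)",
            (int(a), int(b), int(supplier), int(rating)))
        trip_id = cur.lastrowid  # key generated by the database
        conn.execute(
            "INSERT INTO TripCountries (TripID, CountryCode) "
            "SELECT ?, CAST(CountryCode AS INTEGER) "
            "FROM Staging_TripCountries WHERE BatchRowID = ?",
            (trip_id, batch_id))

trip_count = conn.execute("SELECT COUNT(*) FROM TripDetails").fetchone()[0]
countries = conn.execute(
    "SELECT TripID, CountryCode FROM TripCountries ORDER BY CountryCode").fetchall()
print(trip_count, countries)
```

The invalid trip and its country row never reach the production tables, and the production tables are only touched during the short final transaction.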