MySQL has a nice statement: LOAD XML LOCAL INFILE.
For example, if you have this table:
CREATE TABLE person (
person_id INT NOT NULL PRIMARY KEY,
fname VARCHAR(40) NULL,
lname VARCHAR(40) NULL
);
and the following XML file called person.xml:
<list>
<person>
<person_id>1</person_id>
<fname>Mikael</fname>
<lname>Ronström</lname>
</person>
<person>
<person_id>2</person_id>
<fname>Lars</fname>
<lname>Thalmann</lname>
</person>
</list>
You can do this:
LOAD XML LOCAL INFILE 'person.xml'
INTO TABLE person
ROWS IDENTIFIED BY '<person>';
My question is: what if the element names in the XML file were different from the column names in the table? For example:
<list>
<person>
<PersonId>1</PersonId>
<FirstName>Mikael</FirstName>
<LastName>Ronström</LastName>
</person>
<person>
<PersonId>2</PersonId>
<FirstName>Lars</FirstName>
<LastName>Thalmann</LastName>
</person>
</list>
How can you accomplish the same thing with a MySQL statement without manipulating the XML file? I searched everywhere but couldn’t find an answer.
The following were the options available to me:
Option 1: Create a temporary table with different field names (as suggested by the other answers). This would have been a satisfactory approach. However, when I tried it, a new problem emerged: the LOAD XML statement does not, for some reason, accept empty elements in minimized form (for example, <person />). The statement therefore failed, because the XML files I need to load occasionally contain empty elements in that form.
Option 2: Transform the XML file with XSLT before running the LOAD XML statement, to change the element names and rewrite the empty elements. This was not feasible because the XML files are very large and XSLT processing engines typically load the entire XML document into memory before processing.
Option 3: Bypass the LOAD XML statement entirely and use a SAX parser to parse the XML file and insert the records directly into the database using JDBC and prepared statements. Even though raw JDBC and prepared statements are generally efficient, this proved to be too slow: it was much slower than the LOAD XML statement.
Option 4: Use the LOAD DATA statement instead of the LOAD XML statement and play around with the optional clauses associated with that statement to fit my needs (e.g. LINES TERMINATED BY, etc.). This could have worked but would have been error-prone and unstable.
Option 5: Parse the file with a fast forward-only parser, reading and writing XML elements simultaneously, to generate a new XML file with the modified names in the format the LOAD XML statement expects.
I ended up using option 5. I used the Java Streaming API for XML (StAX) both to read the XML file and to generate the modified XML file, and then ran the LOAD XML LOCAL INFILE statement through JDBC from inside the web application. It works perfectly and it is super fast.
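For anyone who wants to try the Option 5 idea, here is a minimal sketch of what such a StAX pass could look like. It is not the code from my application: the class name XmlElementRenamer, the RENAMES map, and the renameString helper are all hypothetical names made up for this example. It assumes the StAX implementation bundled with the JDK, simple data with no namespaces, and that comments and processing instructions can be dropped.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: streams XML from a Reader to a Writer, renaming elements on the way.
final class XmlElementRenamer {

    // Assumed mapping from the XML element names to the column names of the person table.
    static final Map<String, String> RENAMES = Map.of(
            "PersonId", "person_id",
            "FirstName", "fname",
            "LastName", "lname");

    static void rename(Reader in, Writer out, Map<String, String> renames)
            throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        try {
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT: {
                        String name = reader.getLocalName();
                        // Elements without a mapping keep their original name.
                        writer.writeStartElement(renames.getOrDefault(name, name));
                        for (int i = 0; i < reader.getAttributeCount(); i++) {
                            writer.writeAttribute(reader.getAttributeLocalName(i),
                                    reader.getAttributeValue(i));
                        }
                        // An empty text node makes the writer close the start tag with '>',
                        // so <LastName/> comes out as <lname></lname> rather than <lname/>.
                        writer.writeCharacters("");
                        break;
                    }
                    case XMLStreamConstants.END_ELEMENT:
                        writer.writeEndElement();
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                    case XMLStreamConstants.SPACE:
                        writer.writeCharacters(reader.getText());
                        break;
                    default:
                        // Comments, processing instructions, DTDs etc. are skipped in this sketch.
                        break;
                }
            }
            writer.flush();
        } finally {
            writer.close();
            reader.close();
        }
    }

    // Convenience wrapper for small in-memory experiments; real files should use rename().
    static String renameString(String xml, Map<String, String> renames) {
        StringWriter out = new StringWriter();
        try {
            rename(new StringReader(xml), out, renames);
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not rewrite XML", e);
        }
        return out.toString();
    }
}
```

For a large file you would pass a buffered file Reader and Writer (opened with the correct character encoding) to rename(), so that only the current event is held in memory, and then point LOAD XML LOCAL INFILE at the generated file.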