I need to process XML documents of varying formats into records in a MySQL


Asked: May 11, 2026


I need to process XML documents of varying formats into records in a MySQL database on a daily basis. The data I need from each XML document is interspersed with a good deal of data I don’t need, and each document’s node names are different. For example:

source #1:

```xml
<object id='1'>
    <title>URL 1</title>
    <url>http://www.one.com</url>
    <frequency interval='60' />
    <uselessdata>blah</uselessdata>
</object>
<object id='2'>
    <title>URL 2</title>
    <url>http://www.two.com</url>
    <frequency interval='60' />
    <uselessdata>blah</uselessdata>
</object>
```

source #2:

```xml
<object>
    <objectid>1</objectid>
    <thetitle>URL 1</thetitle>
    <link>http://www.one.com</link>
    <frequency interval='60' />
    <moreuselessdata>blah</moreuselessdata>
</object>
<object>
    <objectid>2</objectid>
    <thetitle>URL 2</thetitle>
    <link>http://www.two.com</link>
    <frequency interval='60' />
    <moreuselessdata>blah</moreuselessdata>
</object>
```

…where I need the object’s ID, interval, and URL.

My ideas for approaches are:

1.) Having a separate function to parse each XML document and iteratively create the SQL query from within that function

2.) Having a separate function parse each document and iteratively add each object to my own object class, and have the SQL work done by a class method

3.) Using XSLT to convert all the documents into a common XML format and then writing a parser for that document.
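Approach (1) could look roughly like the sketch below, using the standard library's `xml.etree.ElementTree`. It is a slight variation: instead of building SQL text by string concatenation, each per-source function returns parameter tuples for a parameterized `INSERT`, which avoids quoting and injection problems. The table and column names (`objects`, `object_id`, `frequency_interval`, `url`) are hypothetical, and the dummy `<root>` wrapper assumes the documents look like the fragments above (several top-level `<object>` elements, no XML declaration); a well-formed document with its own root can be passed to `ET.fromstring` directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical table/column names. %s is the placeholder style used by
# MySQL DB-API drivers such as MySQLdb and mysql-connector-python.
INSERT_SQL = (
    "INSERT INTO objects (object_id, frequency_interval, url) "
    "VALUES (%s, %s, %s)"
)


def _parse_fragment(xml_text):
    # Assumption: the input is a fragment with several top-level <object>
    # elements, so wrap it in a dummy root to make it parseable.
    return ET.fromstring(f"<root>{xml_text}</root>")


def parse_source_1(xml_text):
    """Source #1: the id is an attribute and the URL is in <url>."""
    return [
        (
            int(obj.get("id")),
            int(obj.find("frequency").get("interval")),
            obj.findtext("url"),
        )
        for obj in _parse_fragment(xml_text).iter("object")
    ]


def parse_source_2(xml_text):
    """Source #2: the id is in <objectid> and the URL is in <link>."""
    return [
        (
            int(obj.findtext("objectid")),
            int(obj.find("frequency").get("interval")),
            obj.findtext("link"),
        )
        for obj in _parse_fragment(xml_text).iter("object")
    ]
```

With a MySQL DB-API cursor, the rows from either function can then be inserted in one call, e.g. `cursor.executemany(INSERT_SQL, parse_source_1(xml_text))`.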

The XML documents themselves aren’t all that large, as most will be under 1MB. I don’t anticipate their structure changing often (if ever), but there is a strong possibility I will need to add and remove further sources as time goes on. I’m open to all ideas.

Also, sorry if the XML samples above are mangled… they’re not terribly important, just a rough idea to show that the node names in each document are different.


1 Answer

score 0 · Answer 1 · 2026-05-11T12:53:32+00:00

Using XSLT is overkill. I like approach (2); it makes a lot of sense.

Using Python, I’d make a class for every document type. The class would inherit from dict and, in its __init__, parse the given document and populate itself with the ‘id’, ‘interval’ and ‘url’.
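A minimal sketch of that idea follows. Since each document contains several objects, it assumes one dict per `<object>` element, with a `from_document` classmethod that parses a whole document and returns a list of them; the class names, the `get_id`/`get_url` hooks and the `RECORD_CLASSES` registry are all hypothetical. The `<root>` wrapper makes the same assumption about fragment-shaped input as in the question.

```python
import xml.etree.ElementTree as ET


class SourceRecord(dict):
    """A dict holding 'id', 'interval' and 'url' for one <object> element.

    Subclasses override get_id() and get_url() for their source's node names.
    """

    def __init__(self, element):
        super().__init__(
            id=self.get_id(element),
            # Both sample sources use <frequency interval='...' />.
            interval=int(element.find("frequency").get("interval")),
            url=self.get_url(element),
        )

    def get_id(self, element):
        raise NotImplementedError

    def get_url(self, element):
        raise NotImplementedError

    @classmethod
    def from_document(cls, xml_text):
        """Parse one document and return one record per <object>."""
        # Dummy root: the samples have several top-level elements.
        root = ET.fromstring(f"<root>{xml_text}</root>")
        return [cls(obj) for obj in root.iter("object")]


class Source1Record(SourceRecord):
    # Source #1: <object id='1'> ... <url>...</url>
    def get_id(self, element):
        return int(element.get("id"))

    def get_url(self, element):
        return element.findtext("url")


class Source2Record(SourceRecord):
    # Source #2: <objectid>1</objectid> ... <link>...</link>
    def get_id(self, element):
        return int(element.findtext("objectid"))

    def get_url(self, element):
        return element.findtext("link")


# Hypothetical registry: adding or removing a source is a one-line change.
RECORD_CLASSES = {"source1": Source1Record, "source2": Source2Record}
```

Each new source then costs one small subclass that only describes where its fields live; the shared parsing and the dict interface stay in the base class.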

Then the code in main would be trivial: just instantiate those classes (which are also dicts) with the appropriate documents and pass them on as normal dicts.
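Because the records are dicts, main can hand them straight to a DB-API `executemany()` with named placeholders. The sketch below uses the standard library's `sqlite3` with an in-memory database as a stand-in for MySQL so it runs without a server; with MySQLdb or mysql-connector-python the placeholders would be written `%(id)s`, `%(interval)s`, `%(url)s` instead of `:id`, `:interval`, `:url`. The table and column names are hypothetical.

```python
import sqlite3

# Hypothetical table/column names; named placeholders are filled from each dict.
INSERT_NAMED_SQL = (
    "INSERT INTO objects (object_id, frequency_interval, url) "
    "VALUES (:id, :interval, :url)"
)


def store_records(conn, records):
    """Insert an iterable of {'id', 'interval', 'url'} mappings in one batch."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(INSERT_NAMED_SQL, records)


def main():
    conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
    conn.execute(
        "CREATE TABLE objects "
        "(object_id INTEGER, frequency_interval INTEGER, url TEXT)"
    )
    # In the real program these would be instances of the per-source dict
    # classes; plain dicts are used so this sketch runs on its own.
    records = [
        {"id": 1, "interval": 60, "url": "http://www.one.com"},
        {"id": 2, "interval": 60, "url": "http://www.two.com"},
    ]
    store_records(conn, records)
    return conn.execute(
        "SELECT object_id, frequency_interval, url FROM objects ORDER BY object_id"
    ).fetchall()


if __name__ == "__main__":
    print(main())
```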
