I am using Python to convert CSV files to XML format. The CSV files have a varying amount of rows ranging anywhere from 2 (including headers) to infinity. (realistically 10-15 but unless there’s some major performance issue, I’d like to cover my bases) In order to convert the files I have the following code:
for row in csvData:
if rowNum == 0:
xmlData.write(' <'+csvFile[:-4]+'-1>' + "\n")
tags = row
# replace spaces w/ underscores in tag names
for i in range(len(tags)):
tags[i] = tags[i].replace(' ', '_')
if rowNum == 1:
for i in range(len(tags)):
xmlData.write(' ' + '<' + tags[i] + '>' \
+ row[i] + '</' + tags[i] + '>' + "\n")
xmlData.write(' </'+csvFile[:-4]+'-1>' + "\n" + ' <' +csvFile[:-4]+'-2>' + "\n")
if rowNum == 2:
for i in range(len(tags)):
xmlData.write(' ' + '<' + tags[i] + '>' \
+ row[i] + '</' + tags[i] + '>' + "\n")
xmlData.write(' </'+csvFile[:-4]+'-2>' + "\n")
if rowNum == 3:
for i in range(len(tags)):
xmlData.write('<'+csvFile[:-4]+'-3>' + "\n" + ' ' + '<' + tags[i] + '>' \
+ row[i] + '</' + tags[i] + '>' + "\n")
xmlData.write(' </'+csvFile[:-4]+'-3>' + "\n")
rowNum +=1
xmlData.write('</csv_data>' + "\n")
xmlData.close()
As you can see, I have the upper-level tags set to be created manually if the row exists. Is there a more efficient way to achieve my goal of creating the <csvFile-*></csvFile-*> tags rather than repeating my code 15+ times? Thanks!
I would use xml.etree.ElementTree or lxml.etree to write the XML. xml.etree.ElementTree is in the standard library, but does not have built-in pretty-printing. (You could use the indent function from here, however).
lxml.etree is a third-party module, but it has built-in pretty-printing in its
tostringmethod.Using lxml.etree, you could do something like this:
yields
Some suggestions:
In Python, almost never do you need to say
Instead say
to loop over all the items in
tags.Also instead of manually counting the times through a loop with
instead use the enumerate function:
Strings like
are incredibly difficult to read. It mixes together logic of
indentation, with the XML syntax of tags, with the minutia of end of
line characters. That’s where
xml.etree.ElementTreeorlxml.etreewill help you. It will take care of the serialization of the XML for
you; all you need to provide is the relationship between the XML elements.
The code will be much more readable and easier to maintain.