I’m learning lxml (after using ElementTree) and I’m baffled why .fromstring and .tostring do not appear to be reversible. Here’s my example:
import lxml.etree as ET
f = open('somefile.xml','r')
data = f.read()
tree_in = ET.fromstring(data)
tree_out = ET.tostring(tree_in)
f2 = open('samefile.xml','w')
f2.write(tree_out)
f2.close
‘somefile.xml’ was 132 KB.
‘samefile.xml’ – the output – was 113 KB, and it is missing the end of the file, cut off at some arbitrary point. The closing tags of the overall tree and a few pieces of the final element are just gone.
Is there something wrong with my code, or must there be something wrong with the nesting in the original XML file? If so, am I forced to use BeautifulSoup or ElementTree again (without xpath)?
One note: The text inside many elements had a bunch of crap that was converted to text, but is that what’s causing this problem?
Example:
<QuestionIndex Id="Perm"><Answer><![CDATA[confirm]]></Answer><Answer><![CDATA[NotConfirm]]></Answer></QuestionIndex>
<QuestionIndex Id="Actor"><Answer><![CDATA[GirlLt16]]></Answer><Answer><![CDATA[Fem17to25]]></Answer><Answer><![CDATA[BoyLt16]]></Answer><Answer><![CDATA[Mal17to25]]></Answer><Answer><![CDATA[Moth]]></Answer><Answer><![CDATA[Fath]]></Answer><Answer><![CDATA[Elder]]></Answer><Answer><![CDATA[RelLead]]></Answer><Answer><![CDATA[Auth]]></Answer><Answer><![CDATA[Teach]]></Answer><Answer><![CDATA[Oth]]></Answer></QuestionIndex>
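This problem turned out to be way simpler than it appears, and the answer is hidden in the code I provided.
f2.close
should have been
f2.close()
Without the parentheses, f2.close only references the method and never calls it, so the file was never explicitly closed and its write buffer was not flushed. The difference is the remaining buffer of a few dozen characters that never made it into the file I was checking in Notepad++. Closing the file for real made all the difference, and the code works.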
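One way to avoid this kind of mistake is to let a with block close the file, so there is no close call to forget. The sketch below is only an illustration under a few assumptions: it targets Python 3, where tostring returns bytes and the file is therefore opened in binary mode. It uses the standard library's xml.etree.ElementTree as a stand-in so it runs without extra packages, and lxml.etree provides fromstring and tostring calls of the same shape. The round_trip helper and the sample data are made up for the example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET  # stand-in; lxml.etree has the same fromstring/tostring calls


def round_trip(xml_bytes, out_path):
    """Parse xml_bytes and write the re-serialized tree to out_path (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    serialized = ET.tostring(root)  # bytes on Python 3
    # Leaving the with block calls close(), which flushes any buffered output.
    with open(out_path, "wb") as out_file:
        out_file.write(serialized)
    return serialized


# Small made-up sample in the shape of the data from the question.
sample = b'<QuestionIndex Id="Perm"><Answer>confirm</Answer><Answer>NotConfirm</Answer></QuestionIndex>'
out_path = os.path.join(tempfile.mkdtemp(), "samefile.xml")
written = round_trip(sample, out_path)

# Read the file back to check that nothing was left behind in a buffer.
with open(out_path, "rb") as check:
    on_disk = check.read()
```

If the bytes read back match what tostring returned, the whole document reached the disk, closing tags included.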