XML keeps throwing me curve balls. I am having a hard time finding a manual I can understand. So I apologize for all the questions in the past couple of days.
In any case, I have the following XML:
<clade>
<clade>
<branch_length>0.5</branch_length>
<clade>
<name>MnPV1</name>
<annotation>
<desc>Iotapapillomavirus 1</desc></annotation><chart><group>Iota</group></chart><branch_length>1.0</branch_length>
</clade>
<clade>
I would like to change this to:
<clade>
<clade>
<branch_length>0.5</branch_length>
<clade>
<name bgstyle="green">MnPV1</name>
<annotation><desc>Iotapapillomavirus 1</desc><uri>http://pave.niaid.nih.gov/#fetch?id=MnPV1REF&format=Locus%20view&hasStructure=none</uri></annotation><chart><group>Iota</group></chart><branch_length>1.0</branch_length>
</clade>
<clade>
So I want to change:
<name>MnPV1</name>
to:
<name bgstyle="green">MnPV1</name>
The catch is that I only want to make this change when the following XPath finds a match:
tree.xpath('//phylo:group[text()="Iota"]', namespaces=nsmap)
If it does, I would like to get the “uncle” of the “group” node so I can edit the “name” node.
This is what I came up with so far:
import lxml.etree
tree = lxml.etree.XML(data)
nsmap = {'phylo': 'http://www.phyloxml.org'}
matches = tree.xpath('//phylo:group[text()="Iota"]', namespaces=nsmap)
for e in matches:
    uncle = e.getparent().getsibling()  # however, getsibling() does not exist...
I would appreciate any help (and/or recommendations for lxml for dummies).
How about this?
The trick is to use XML tools (e.g., XPath and XSLT) to manipulate XML documents. The w3schools sites are pretty good starting points. XPath is fairly powerful in its own right and is quite readable once you get the hang of it. This type of problem is best solved using XSLT though. If you are going to be manipulating a bunch of XML, do yourself a huge favor and purchase a copy of the Oxygen XML editor or something similar.
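As a rough sketch of the XPath route, the expression below selects the name element directly by asking for any clade whose chart contains a group with the text "Iota". The phyloxml/phylogeny wrapper and the default namespace declaration in the sample data are assumptions added so the snippet runs on its own; the question only shows a fragment.

```python
import lxml.etree

# Trimmed sample; the <phyloxml>/<phylogeny> wrapper and the xmlns
# declaration are assumed, since the question shows only a fragment.
data = b"""<phyloxml xmlns="http://www.phyloxml.org">
<phylogeny>
<clade>
<clade>
<branch_length>0.5</branch_length>
<clade>
<name>MnPV1</name>
<annotation><desc>Iotapapillomavirus 1</desc></annotation>
<chart><group>Iota</group></chart>
<branch_length>1.0</branch_length>
</clade>
</clade>
</clade>
</phylogeny>
</phyloxml>"""

tree = lxml.etree.XML(data)
nsmap = {'phylo': 'http://www.phyloxml.org'}

# Pick every <name> whose parent <clade> has a <chart>/<group> child
# with the text "Iota"; no manual sibling navigation is needed.
names = tree.xpath(
    '//phylo:clade[phylo:chart/phylo:group[text()="Iota"]]/phylo:name',
    namespaces=nsmap,
)
for name in names:
    name.set('bgstyle', 'green')

print(lxml.etree.tostring(tree).decode())
```

Putting the condition in a predicate on clade means the "uncle" lookup happens inside XPath, so the Python loop only has to set the attribute.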
If you are looking for something using less XPath and more Python, then use getparent() followed by calls to getprevious(). I'm not sure how well supported getparent() and getprevious() are, but they are documented and do work.
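A minimal sketch of that approach, keeping the original XPath for group and doing the rest in Python. As before, the wrapper elements and namespace declaration in the sample are assumptions, and name_tag is just a helper variable introduced here for the comparison.

```python
import lxml.etree

# Trimmed sample; the wrapper elements and xmlns declaration are assumed.
data = b"""<phyloxml xmlns="http://www.phyloxml.org">
<phylogeny>
<clade>
<name>MnPV1</name>
<annotation><desc>Iotapapillomavirus 1</desc></annotation>
<chart><group>Iota</group></chart>
<branch_length>1.0</branch_length>
</clade>
</phylogeny>
</phyloxml>"""

tree = lxml.etree.XML(data)
nsmap = {'phylo': 'http://www.phyloxml.org'}
# lxml reports namespaced tags in {namespace}localname form
name_tag = '{http://www.phyloxml.org}name'

matches = tree.xpath('//phylo:group[text()="Iota"]', namespaces=nsmap)
for group in matches:
    chart = group.getparent()
    # Step backwards through the siblings of <chart> until <name> is found
    sibling = chart.getprevious()
    while sibling is not None and sibling.tag != name_tag:
        sibling = sibling.getprevious()
    if sibling is not None:
        sibling.set('bgstyle', 'green')

print(lxml.etree.tostring(tree).decode())
```

This relies on name appearing before chart inside the clade, as it does in the question's data. If the order can vary, calling find('phylo:name', namespaces=nsmap) on chart.getparent() is probably the safer lookup.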