I am using lxml to parse XML that I got from the Amazon Product Advertising API.
The tree is parsed as:
root = etree.XML(self.sendRequest(parameters))
When I use root.tag or root.child.tag, I always get something like:
{http://webservices.amazon.com/AWSECommerceService/2005-10-05}RequestProcessingTime
The link that appears in the tag name is actually an attribute of the root element:
<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2005-10-05">
However, it doesn’t seem to be correctly parsed.
Is there a way I can remove the annoying {…} from the tags?
The part between the braces is the XML namespace, which is read from the xmlns attribute of the element (or inherited from an enclosing element that declares it). You can't get rid of it, because this is just how the ElementTree API, which lxml is based on, is defined: every tag name that belongs to a namespace is prefixed with that namespace in curly braces. Some notion of namespacing is mandatory for a well-behaved XML parser to resolve ambiguities, which arise because the same tag name can appear in different namespaces with different meanings, and a single document can contain tags from multiple namespaces.
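To illustrate the ambiguity, here is a small sketch. The document and both namespace URIs are made up for this example, and it uses the standard library's xml.etree.ElementTree, whose tag naming lxml follows:

```python
import xml.etree.ElementTree as ET

# Made-up document: two <title> elements from two hypothetical namespaces.
DOC = (
    '<catalog xmlns:book="http://example.com/book" '
    'xmlns:person="http://example.com/person">'
    '<book:title>Dune</book:title>'
    '<person:title>Dr.</person:title>'
    '</catalog>'
)

root = ET.fromstring(DOC)

# The local name is "title" in both cases; only the namespace tells them apart.
tags = [child.tag for child in root]
print(tags)
# ['{http://example.com/book}title', '{http://example.com/person}title']
```

Without the part in braces, both children would simply report "title" and your code could not distinguish them.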
Your document is parsed correctly; you simply have to take the namespace into account in your program. That's it.
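As a rough sketch of what taking the namespace into account can look like: the sample response below is a trimmed-down, hypothetical fragment (only the namespace URI is taken from your question), and the "aws" prefix and the local_name helper are names I made up. I use the standard library's xml.etree.ElementTree here; the calls shown should work the same way with lxml.etree.

```python
import xml.etree.ElementTree as ET  # lxml.etree offers the same calls used here

# Hypothetical, trimmed-down response; only the namespace URI is from the question.
SAMPLE = (
    '<ItemSearchResponse '
    'xmlns="http://webservices.amazon.com/AWSECommerceService/2005-10-05">'
    '<OperationRequest>'
    '<RequestProcessingTime>0.05</RequestProcessingTime>'
    '</OperationRequest>'
    '</ItemSearchResponse>'
)

# The prefix "aws" is an arbitrary choice; only the URI has to match.
NS = {"aws": "http://webservices.amazon.com/AWSECommerceService/2005-10-05"}


def local_name(tag):
    """Return the tag without its '{namespace}' prefix, if it has one."""
    return tag.rpartition("}")[2]


root = ET.fromstring(SAMPLE)

# Option 1: address elements through a prefix-to-URI mapping.
elem = root.find("aws:OperationRequest/aws:RequestProcessingTime", NS)

# Option 2: strip the namespace only when displaying or comparing names.
print(local_name(root.tag))             # ItemSearchResponse
print(local_name(elem.tag), elem.text)  # RequestProcessingTime 0.05
```

Either way the parsed tree keeps its namespaces; you only choose how to refer to the elements. If you are on lxml, etree.QName(element).localname gives you the same local name without the string handling.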