I have an XML document which reads like this:
<xml>
<web:Web>
<web:Total>4000</web:Total>
<web:Offset>0</web:Offset>
</web:Web>
</xml>
My question is: how do I access these values using a library like BeautifulSoup in Python?
xmlDom.web["Web"].Total does not work.
BeautifulSoup isn’t a DOM library per se (it doesn’t implement the DOM APIs). To make matters more complicated, you’re using namespaces in that xml fragment. To parse that specific piece of XML, you’d use BeautifulSoup as follows:
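A minimal sketch of that lookup, assuming Beautiful Soup 4 (the bs4 package) with Python's built-in html.parser backend; the original answer may have targeted an older release, and other parsers can treat prefixes differently. The variable names doc, total and offset are only illustrative. With this parser, tag names are lowercased and the web: prefix stays part of the name, so the full prefixed name goes to find():

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 is installed

xml = """<xml>
<web:Web>
<web:Total>4000</web:Total>
<web:Offset>0</web:Offset>
</web:Web>
</xml>"""

# html.parser lowercases tag names and keeps "web:" as part of each name,
# so the tags are stored as "web:web", "web:total" and "web:offset".
doc = BeautifulSoup(xml, "html.parser")

# Attribute-style access (doc.web:total) is not valid Python,
# so look the tags up by their full string name instead.
total = doc.find("web:total").string
offset = doc.find("web:offset").string

print(total, offset)
```

The values come back as strings, so wrap them in int() if you need numbers.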
If you weren’t using namespaces, the code could look like this:
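For comparison, here is a sketch of the same document with the prefixes removed, again assuming Beautiful Soup 4 with html.parser (an assumption, not necessarily the version the answer was written for). Because the lowercased tag names are now valid Python identifiers, dotted attribute access works:

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 is installed

xml = """<xml>
<Web>
<Total>4000</Total>
<Offset>0</Offset>
</Web>
</xml>"""

doc = BeautifulSoup(xml, "html.parser")

# Each attribute lookup is shorthand for find() on the lowercased tag name.
total = doc.xml.web.total.string
offset = doc.xml.web.offset.string

print(total, offset)
```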
The key here is that BeautifulSoup doesn’t know (or care) anything about namespaces. Thus web:Web is treated like a web:web tag instead of as a Web tag belonging to the web namespace. While BeautifulSoup adds web:web to the XML element dictionary, Python syntax doesn’t recognize web:web as a single identifier. You can learn more about it by reading the documentation.