I would like to include the encoding tag in an XML document using BeautifulSoup.BeautifulStoneSoup, but I’m not sure how!
<?xml version="1.0" encoding="UTF-8"?>
<mytag>stuff</mytag>
It outputs the encoding tag when I read a document that already has it, but I’m making a new soup.
Thanks!
Edit: I’ll give an example of what I’m currently doing.
from BeautifulSoup import BeautifulStoneSoup, Tag
soup = BeautifulStoneSoup()
mytag = Tag(soup, 'mytag')
soup.append(mytag)
str(soup)
# '<mytag></mytag>'
soup.prettify() # No encoding given
# '<mytag>\n</mytag>'
soup.prettify(encoding='UTF-8')
# '<mytag>\n</mytag>' # Where's the encoding?
Even if I create the soup like BeautifulStoneSoup(fromEncoding='UTF-8'), there is still no <?xml?> tag.
Is there another way to get that tag without creating and passing the tag as a string directly, or is that the only way?
Do you mean something like this?
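A minimal sketch of the string-based approach, assuming it is acceptable to prepend the declaration to whatever the soup serializes to. The helper name `with_xml_declaration` is hypothetical and not part of BeautifulSoup; it is written against plain strings so it runs without BeautifulSoup 3 installed.

```python
def with_xml_declaration(markup, encoding="UTF-8", version="1.0"):
    """Return markup with an XML declaration prepended (hypothetical helper)."""
    # Leave the markup alone if it already starts with a declaration.
    if markup.lstrip().startswith("<?xml"):
        return markup
    declaration = '<?xml version="%s" encoding="%s"?>' % (version, encoding)
    return declaration + "\n" + markup


# With BeautifulSoup 3 the argument would be str(soup); a literal stands in here.
document = with_xml_declaration("<mytag>stuff</mytag>")
print(document)
# <?xml version="1.0" encoding="UTF-8"?>
# <mytag>stuff</mytag>
```

The declaration is only text at this point, so it is up to the caller to make sure the bytes actually written match the encoding it names.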
Or,
From the BeautifulSoup documentation:
Note item #2, which I read as: if you don’t explicitly specify an encoding with the fromEncoding argument, BeautifulSoup will automatically use the one given in the XML declaration. YMMV.
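To make that reading concrete, here is a rough sketch of the precedence as I understand it: an explicit encoding wins, otherwise the one named in the XML declaration is used, otherwise a default. This is a simplification and not BeautifulSoup's actual detection code; `pick_encoding` and its `default` parameter are made up for illustration.

```python
import re

# Matches the encoding pseudo-attribute of a leading XML declaration.
_DECLARATION = re.compile(rb'^\s*<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']')


def pick_encoding(data, from_encoding=None, default="utf-8"):
    """Choose an encoding for raw XML bytes (hypothetical, simplified)."""
    # 1. An explicitly supplied encoding takes priority.
    if from_encoding:
        return from_encoding
    # 2. Otherwise fall back to the encoding named in the declaration, if any.
    match = _DECLARATION.match(data)
    if match:
        return match.group(1).decode("ascii")
    # 3. Otherwise use the assumed default.
    return default


raw = b'<?xml version="1.0" encoding="ISO-8859-1"?><mytag>stuff</mytag>'
print(pick_encoding(raw))                         # ISO-8859-1
print(pick_encoding(raw, from_encoding="UTF-8"))  # UTF-8
print(pick_encoding(b"<mytag>stuff</mytag>"))     # utf-8
```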
The documentation referenced above also has other potentially useful Unicode-related examples.
Edit: @TorelTwiddler, if there is another way to add an XML declaration using BeautifulSoup without passing it in as a string directly, I am not aware of it.
That said, consider the following:
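One option outside BeautifulSoup, offered as a sketch rather than anything its documentation recommends: the standard library's `xml.dom.minidom` writes the declaration itself when a document is serialized with an explicit encoding. The tag name and text are taken from the question; swapping in minidom for BeautifulStoneSoup is my assumption about what would be acceptable.

```python
from xml.dom.minidom import Document

# Build the same one-element document as in the question.
doc = Document()
mytag = doc.createElement("mytag")
mytag.appendChild(doc.createTextNode("stuff"))
doc.appendChild(mytag)

# Passing an encoding makes toxml() return bytes that begin with a
# declaration naming that encoding.
xml_bytes = doc.toxml(encoding="UTF-8")
print(xml_bytes.decode("utf-8"))
# <?xml version="1.0" encoding="UTF-8"?><mytag>stuff</mytag>
```

The result could then be handed to BeautifulStoneSoup as a string if the rest of the code still needs a soup, since the question notes that the declaration is output when the parsed document already has one.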
Perhaps that’ll help you get where you want to go.