When the body of some articles on my site is displayed in an RSS reader, the images are shown at full width. I want to add a width attribute to every image in the RSS feed, preferably with a filter, since I'm using a template to arrange the body among some other elements.
As a test, I wrote the following method:
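```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError, ErrorString

try:
    # A Text node's toxml() escapes the markup (< becomes &lt;),
    # so parseString() sees plain text with no root element.
    _parser = minidom.Text()
    _parser.data = obj.body
    _xml = _parser.toxml(encoding='UTF-8')
    _return = minidom.parseString(_xml)
    _images = _return.getElementsByTagName('img')
    print("============= This is what I found: =============")
    #print(_images)
except ExpatError as e:
    print("============= This is what I found: =============")
    print(ErrorString(e.code))
```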
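This is likely why every article fails, even one whose body happens to be well-formed XML: a minidom.Text node holds character data, so toxml() escapes the tags, and parseString() then receives text with no root element. A small demonstration, assuming Python 3 and a made-up body string:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical article body that is already well-formed XML.
body = '<p><img src="a.png"/></p>'

# Wrapping it in a Text node turns the markup into escaped text.
node = minidom.Text()
node.data = body
escaped = node.toxml()
print(escaped)  # tags come out escaped, starting with &lt;p

# Parsing the escaped text fails: there is no root element.
try:
    minidom.parseString(escaped)
    parse_failed = False
except ExpatError as e:
    parse_failed = True
    print("parse failed:", e)

# Parsing the body directly works, but only because it is well-formed XML.
doc = minidom.parseString(body)
print(len(doc.getElementsByTagName('img')))
```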
But the output looks like this:
```
============= This is what I found: =============
syntax error
============= This is what I found: =============
not well-formed (invalid token)
============= This is what I found: =============
syntax error
============= This is what I found: =============
syntax error
============= This is what I found: =============
syntax error
============= This is what I found: =============
syntax error
```
(and so on; not a single article parses successfully)
So perhaps my approach is completely wrong. I hope someone can help.
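I don’t think you can parse all valid HTML with an XML parser. HTML permits markup that XML rejects, such as void elements like <img> written without a closing tag and named entities like &nbsp; that XML does not predefine.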
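A quick way to see this with the same standard-library parser the question uses (expat via minidom), assuming Python 3; is_well_formed_xml is a hypothetical helper name:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed_xml(text):
    """Return True if expat (via minidom) accepts text as an XML document."""
    try:
        minidom.parseString(text)
    except ExpatError:
        return False
    return True


# Valid HTML, but an <img> with no closing tag is a mismatched tag in XML.
print(is_well_formed_xml('<p><img src="a.png"></p>'))
# Valid HTML, but &nbsp; is an undefined entity in plain XML.
print(is_well_formed_xml('<p>Hello&nbsp;world</p>'))
# The XHTML-style self-closing form is accepted.
print(is_well_formed_xml('<p><img src="a.png"/></p>'))
```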
Please look at Python HTML parsing for the various ways to parse HTML.
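As one example, here is a minimal sketch of the filter itself built on the standard-library html.parser module, which tolerates HTML that expat rejects. It assumes Python 3; the names ImgWidthRewriter and add_img_width are hypothetical, and registering add_img_width as a template filter depends on your framework, so that part is not shown.

```python
from html import escape
from html.parser import HTMLParser


class ImgWidthRewriter(HTMLParser):
    """Copy HTML through, but give every <img> tag a width attribute."""

    def __init__(self, width):
        # convert_charrefs=False keeps entities such as &nbsp; untouched.
        super().__init__(convert_charrefs=False)
        self.width = str(width)
        self.out = []

    def _img(self, attrs, self_closing):
        # Drop any existing width, then append the one we want.
        kept = [(k, v) for k, v in attrs if k != "width"]
        kept.append(("width", self.width))
        parts = []
        for name, value in kept:
            if value is None:
                parts.append(name)  # boolean attribute, e.g. ismap
            else:
                # The parser unescapes attribute values, so re-escape them.
                parts.append('%s="%s"' % (name, escape(value, quote=True)))
        return "<img %s%s>" % (" ".join(parts), " /" if self_closing else "")

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.out.append(self._img(attrs, False))
        else:
            self.out.append(self.get_starttag_text())  # original text as-is

    def handle_startendtag(self, tag, attrs):
        if tag == "img":
            self.out.append(self._img(attrs, True))
        else:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def add_img_width(html, width):
    """Hypothetical filter body: return html with width set on every <img>."""
    parser = ImgWidthRewriter(width)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, add_img_width(obj.body, 300) would rewrite <img src="a.png"> as <img src="a.png" width="300">. Non-image start tags are copied verbatim via get_starttag_text(), so only the <img> tags are rebuilt; end tags come out lowercased, and processing instructions (rare in article bodies) are dropped because no handler is defined for them.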