I need a way to find only the rendered IMG tags in an HTML snippet. I can't just run a regex over the snippet to find all IMG tags, because that would also match IMG tags that are displayed as text in the HTML (not rendered).
I’m using Python on AppEngine.
Any ideas?
Thanks, Ivan
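Sounds like a job for BeautifulSoup: parse the snippet with it and search the resulting tree for IMG tags, rather than running a regex over the raw text.
BeautifulSoup is smart enough to ignore comments and displayed HTML (escaped markup such as &lt;img&gt; that shows up as text), so only real IMG tags are returned.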
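The underlying idea is that a real HTML parser only reports an IMG tag when it is actual markup, never when it sits inside a comment or is escaped as text. Here is a minimal sketch of that idea, using html.parser from the Python standard library instead of BeautifulSoup so that it runs without extra dependencies. The class name RenderedImgCollector, the helper rendered_img_sources, and the sample snippet with its file names are all made up for illustration.

```python
from html.parser import HTMLParser


class RenderedImgCollector(HTMLParser):
    """Collect the src attribute of every <img> that is real markup."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # The parser calls this only for actual tags. Comments go to
        # handle_comment and escaped text goes to handle_data, so neither
        # ends up here. Self-closing tags (<img ... />) are routed here too.
        if tag == "img":
            self.sources.append(dict(attrs).get("src"))


def rendered_img_sources(html):
    """Return the src values of the <img> tags a browser would render."""
    parser = RenderedImgCollector()
    parser.feed(html)
    parser.close()
    return parser.sources


# Hypothetical snippet: two real images, one commented out, one escaped.
snippet = """
<p>Real image: <img src="shown.png" alt="shown"></p>
<!-- <img src="commented-out.png"> -->
<p>Shown as text: &lt;img src="escaped.png"&gt;</p>
<p>Self-closing: <img src="also-shown.png" /></p>
"""

print(rendered_img_sources(snippet))  # ['shown.png', 'also-shown.png']

# With BeautifulSoup (the bs4 package) installed, the equivalent lookup is:
#   from bs4 import BeautifulSoup
#   BeautifulSoup(snippet, "html.parser").find_all("img")
```

Note that this only works if the real images arrive as actual tags. If every IMG tag in the input is escaped, no parser can tell which ones were meant to be rendered.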
EDIT: I’m not sure what you mean by the RSS feed escaping ALL images, though. I wouldn’t expect BeautifulSoup to figure out which are meant to be shown if they are all escaped. Can you clarify?