I want to write an application that consumes RSS. I want to be able to show some of the HTML in the item description of the RSS feed, such as images, links, br, etc. However, I don’t want any embedded scripts to run, unruly CSS to be applied, etc. I don’t want to re-invent the wheel either. Are there any libraries that strip out just the right level of HTML?
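For illustration, the kind of filtering asked about here is usually done with an allowlist: keep a short list of known-safe tags and attributes, and drop everything else. Below is a minimal sketch in Python using only the standard library. The names `ALLOWED_TAGS`, `AllowlistSanitizer` and `sanitize` are made up for this example, and the tag list is an assumption about what "the right level" means. It is not a hardened sanitizer; for real use a maintained library (HTML Purifier is one for PHP) is the safer choice.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: tag name -> attributes that may be kept.
ALLOWED_TAGS = {"a": {"href"}, "br": set(), "img": {"src", "alt"},
                "p": set(), "b": set(), "i": set()}
VOID_TAGS = {"br", "img"}              # tags that never get a closing tag
DROP_CONTENT = {"script", "style"}     # drop these tags and their contents
SAFE_URL_PREFIXES = ("http://", "https://")


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_TAGS[tag] or value is None:
                continue
            # Only plain http(s) URLs survive; javascript: and friends are dropped.
            if name in ("href", "src") and not value.strip().lower().startswith(SAFE_URL_PREFIXES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is re-escaped, so text that was escaped in the input stays text.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Calling `sanitize()` on a description keeps br, links and images, removes script blocks and event-handler attributes, and leaves already-escaped text escaped.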
The issue that I am running into is that I’m generating an RSS feed from phpBB, so the posts do have br and a (link) tags already. However, a user can paste a script tag in a post and it will be encoded properly to display as text on the page.
However, when I look at the post in an RSS reader, all HTML in the post is encoded as &lt; and &gt;, etc. This blurs the distinction between the br tag and the (less than)script(greater than) tag, as they both appear with &lt; and &gt;.
I feel like this should be easier, and I’m just missing something obvious…I hope.
I figured it out. I was using an RSS script that was causing the HTML-encoded angle brackets to be ‘mixed in’ with the real HTML in the RSS feed.
This is what the source looked like in phpBB:
But in my RSS feed, it was being generated as: (notice no distinction between the escaped HTML and the non-escaped HTML)
I made a change to the rss.php file so it turned it into this:
That way it was displayed in the RSS feed properly.
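For anyone hitting the same thing, here is a small sketch of the general idea, written in Python rather than as the actual change to rss.php. The post body, `naive_encode` and `encode_description` are hypothetical, and the "naive" version is only an assumption about how this kind of mix-up can happen: if only the angle brackets are encoded, the real tags and the already-escaped text end up looking identical, whereas escaping the ampersands as well keeps them distinct after the reader decodes the description once.

```python
from html import escape, unescape

# Hypothetical post body: a real <br> tag plus a script tag the board already escaped.
post_html = 'Hello<br>&lt;script&gt;alert(1)&lt;/script&gt;'


def naive_encode(post):
    # Assumed buggy approach: encode only the angle brackets.
    # The real <br> becomes &lt;br&gt;, indistinguishable from the escaped script text.
    return post.replace('<', '&lt;').replace('>', '&gt;')


def encode_description(post):
    # Escape &, < and > exactly once, so already-escaped text becomes
    # &amp;lt;script&amp;gt; while real tags become &lt;br&gt;.
    return escape(post, quote=False)


# What a reader sees after decoding the description once:
print(unescape(naive_encode(post_html)))        # the script text has turned into a real tag
print(unescape(encode_description(post_html)))  # identical to the original post body
```

Wrapping the description in a CDATA section is another common way to get the same result, as long as the content cannot contain the closing `]]>` sequence.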
Thanks!