My site stores HTML that is generated by the user, and this data is then rendered on a web page. What are the best practices for rendering the HTML and avoiding XSS attacks? Is stripping <script> and <iframe> tags enough? Will this cover all browsers? I have heard of old browsers rendering HTML from unusual encodings; how can I handle this?
I would like a general answer, not tied to any particular language or technology.
You could use a library such as Jsoup, in particular its whitelist-based sanitizer, to prevent XSS.
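In general, I think it is better and safer to use a whitelist than to filter out blacklisted tags. Stripping only <script> and <iframe> is not enough, because script can also run from event-handler attributes such as onerror and from javascript: URLs. Besides, accepting HTML should be avoided in the first place. Instead, a simple markup language such as Markdown should be used.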
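To illustrate the whitelist idea, here is a minimal sketch in Python using only the standard library. The policy (ALLOWED_TAGS, ALLOWED_SCHEMES) and the names AllowlistSanitizer and sanitize are made up for this example. It is meant to show the principle, not to replace a maintained sanitizer such as Jsoup.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical policy for this sketch: tag name -> attributes permitted on it.
ALLOWED_TAGS = {
    "p": set(), "b": set(), "i": set(), "em": set(), "strong": set(),
    "ul": set(), "ol": set(), "li": set(), "br": set(),
    "a": {"href"},
}
VOID_TAGS = {"br"}                      # tags that never get a closing tag
ALLOWED_SCHEMES = {"http", "https", "mailto"}
DROP_CONTENT = {"script", "style"}      # drop these tags together with their text


def is_safe_url(url):
    """Accept only absolute URLs whose scheme is on the allow list."""
    # Remove whitespace and control characters, which browsers may ignore.
    cleaned = "".join(ch for ch in url if ch > " ")
    try:
        return urlsplit(cleaned).scheme.lower() in ALLOWED_SCHEMES
    except ValueError:
        return False


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0             # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return                      # unknown tag: drop the tag, keep its text
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_TAGS[tag] or value is None:
                continue                # e.g. onclick, onerror, style
            if name == "href" and not is_safe_url(value):
                continue                # e.g. javascript: URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Text is always re-escaped, so it can never become markup.
            self.out.append(escape(data, quote=False))


def sanitize(html):
    parser = AllowlistSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(sanitize('<p>Hello <b>world</b></p>'))
    print(sanitize('<a href="javascript:alert(1)" onclick="x()">hi</a>'))
```

Note that an input like `<img src=x onerror=alert(1)>` contains neither <script> nor <iframe>, so a blacklist of those two tags would let it through, while the whitelist drops it because img is simply not listed. This sketch does not balance unclosed tags or handle encoding issues. Serving pages with an explicit charset (for example UTF-8 in the Content-Type header) helps with the old-browser encoding concern.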