Let's assume we have a user form that generates HTML input. The following is an example of what might get POSTed to PHP:
<p>Hello</p>
<p><strong>World</strong></p>
Later on, this markup gets injected into the HTML output, inside some DIV.
What I'd like to prevent is something like the following being entered:
</div>
<p>Hello</p>
<p><strong>World</strong></p>
<div>
Or even something like:
</div>
<script> someScript(); </script>
<iframe src="http://www.example.com">......
<p>Hello</p>
<p><strong>World</strong></p>
<div>
How can I use PHP to make sure that this input will not break the document, include bad iframes, or run scripts? The most important part is that I still want that information. I'm not throwing it out, but it needs to be included as harmless text of some sort.
Using alternative markup is not an option; it needs to be HTML.
What you need is HTML Purifier.
Not only does it output HTML that conforms to the standards, it also cleans the posted code of XSS vulnerabilities.
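A minimal sketch of how that might look, assuming HTML Purifier was installed through Composer (package ezyang/htmlpurifier). The function name purifyUserHtml and the list of allowed tags are made up for illustration. Core.EscapeInvalidTags is the setting that should write disallowed tags back as plain text instead of dropping them, which seems to be what the question asks for, but test it against your own input before relying on it.

```php
<?php
// Assumption: HTML Purifier was installed with Composer (ezyang/htmlpurifier).
if (is_file(__DIR__ . '/vendor/autoload.php')) {
    require_once __DIR__ . '/vendor/autoload.php';
}

/**
 * Hypothetical helper: clean user-submitted HTML before echoing it into a DIV.
 */
function purifyUserHtml(string $dirty): string
{
    $config = HTMLPurifier_Config::createDefault();
    // Illustrative whitelist; adjust it to whatever the form may produce.
    $config->set('HTML.Allowed', 'p,strong,em,ul,ol,li,a[href]');
    // Write disallowed tags back as escaped text rather than dropping them.
    $config->set('Core.EscapeInvalidTags', true);

    $purifier = new HTMLPurifier($config);
    return $purifier->purify($dirty);
}
```

You would then output something like echo '<div>' . purifyUserHtml($_POST['content']) . '</div>'; where content is a placeholder for your own field name.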
Edit 1: You should also check out the comparison; it's interesting :)
Edit 2: You can also check out htmlspecialchars and htmlentities,
but IMO HTML Purifier is far better and much more customizable when it comes to more complex things like yours.
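For comparison, here is a rough sketch of the htmlspecialchars route. The helper name escapeForHtml is made up, and the sample input is taken from the question. Keep in mind that this escapes everything, including the legitimate <p> and <strong> tags, so it only fits if none of the input should be rendered as HTML.

```php
<?php
/**
 * Hypothetical helper: turn every character with special meaning in HTML
 * into an entity, so the browser displays the markup instead of parsing it.
 */
function escapeForHtml(string $input): string
{
    return htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Sample input from the question; all of it ends up as visible text.
$posted = "</div>\n<script> someScript(); </script>\n<p>Hello</p>\n<div>";
echo '<div>' . escapeForHtml($posted) . '</div>';
```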