I’ve seen this question asked a few times on Stack Overflow, with no resoundingly wonderful answer.
The answer always seems to be “don’t use regex,” without any examples of a better alternative.
For my purposes, this is not for validation but for after-the-fact stripping.
I need to strip out all script tags, including any content between them.
Any suggestions on the best regex for this?
EDIT: Preemptive response: I can’t use HTML Purifier or PHP’s DOMXPath.
Regex for HTML is considered evil because it can usually be broken easily, forcing you to rethink your pattern again and again. If, for instance, you’re matching:
it could easily be broken with:
If you use:
it can also be easily broken with:
There’s no end to this. If you can’t use any of the methods you’ve ruled out, you could try
`strip_tags`, but it takes a whitelist of allowed tags as a parameter, not a blacklist, meaning you’ll need to manually allow every single tag you want to keep. It also removes only the tags themselves, so the code between `<script>` and `</script>` would remain as text. If all else fails, you could resort to regex; what I came up with is this:
But I bet someone around here could come along and break that too.
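The answer’s own pattern isn’t reproduced here, so as an illustration only, here is a minimal sketch of the kind of pattern under discussion. It is written in Python rather than PHP so it runs standalone; the same pattern should carry over to PHP’s `preg_replace` with the `i` and `s` modifiers. The names `SCRIPT_RE`, `NAIVE_RE` and `strip_scripts` are hypothetical. The sketch shows a greedy pattern swallowing the markup between two scripts, a non-greedy pattern that tolerates case, attributes and `</script >`, a repeat-until-stable loop against the split-tag trick, and a commented-out `<script>` that still breaks it.

```python
import re

# Hypothetical names; an illustration, not the pattern from the answer above.
# <script\b    : opening tag (case-insensitive); \b rejects names like <scripts>
# [^>]*>       : any attributes (type=, src=, ...) up to the first ">"
# .*?          : the script body, non-greedy so each element matches separately
# </script\s*> : closing tag, tolerating whitespace before ">"
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)

# Naive greedy pattern for comparison: ".*" runs to the LAST </script>.
NAIVE_RE = re.compile(r"<script>.*</script>", re.DOTALL)


def strip_scripts(html: str) -> str:
    """Remove <script> elements and their content, repeating until stable.

    A single pass is not enough: removing an inner element can join the
    surrounding fragments into a brand-new <script> tag.
    """
    while True:
        stripped = SCRIPT_RE.sub("", html)
        if stripped == html:
            return stripped
        html = stripped


page = "<p>a</p><script>x()</script><p>b</p><script>y()</script><p>c</p>"
print(NAIVE_RE.sub("", page))  # <p>a</p><p>c</p>  (the greedy match ate <p>b</p>)
print(strip_scripts(page))     # <p>a</p><p>b</p><p>c</p>

print(strip_scripts('<SCRIPT type="text/javascript">\nalert(1);\n</script >ok'))  # ok

# Split-tag trick: one pass would leave "<script>alert(1)</script>" behind.
print(strip_scripts("<scr<script></script>ipt>alert(1)</script>"))  # empty line

# Still breakable: a "<script>" inside an HTML comment starts the match early,
# so the visible <p>keep</p> is deleted along with the real script.
print(strip_scripts("<!-- <script> --><p>keep</p><script>x()</script>"))  # prints "<!-- "
```

The comment case is the kind of input that keeps defeating regex approaches: handling it properly means tracking comments, CDATA and attribute quoting, at which point you are effectively writing an HTML tokenizer.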