I am trying to write a regex that will just find and remove script tags (it's the only tag I want removed, since I think it is the only one that can cause damage).
Anyway, I know there are many ways to write a script tag that is still valid. Will this catch them?
<\s*script\s*>.*?<\s*\/script\s*>
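A quick check suggests the answer is no. The sketch below uses Python's `re` module. That is an assumption, since the question doesn't name a language, and regex flavors differ slightly. It applies the pattern unchanged to a few variants. `strip_scripts` is a hypothetical helper that only wraps the pattern.

```python
import re

# The pattern from the question, unchanged and compiled with no flags.
SCRIPT_RE = re.compile(r"<\s*script\s*>.*?<\s*\/script\s*>")


def strip_scripts(text):
    """Hypothetical helper: delete every match of the question's pattern."""
    return SCRIPT_RE.sub("", text)


samples = [
    "<script>alert(1)</script>",                          # removed
    "<SCRIPT>alert(1)</SCRIPT>",                          # kept: match is case-sensitive
    '<script type="text/javascript">alert(1)</script>',   # kept: no room for attributes before '>'
    "<script>\nalert(1)\n</script>",                      # kept: '.' does not cross newlines
    "<scr<script></script>ipt>alert(1)</script>",         # one pass leaves <script>alert(1)</script>
]

for s in samples:
    print(repr(strip_scripts(s)))
```

Adding `re.IGNORECASE | re.DOTALL` would catch the second and fourth cases. It would not catch the attribute case or the last one, where removing the inner match reassembles a working tag.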
Edit:
Or would it be better to convert them into safe text instead, i.e. HTML-encode the tags? That would have to apply only to script tags, though, since I still want to allow other HTML tags like <b> and so on.
In almost all cases where you want to filter this sort of thing, it is better to check for what you specifically want to allow rather than what you want to disallow. There are a zillion creative ways of hiding a <script> tag in HTML source, and you don't want to be stuck playing catch-up with every new one people invent. On the other hand, you can quite easily create a list of acceptable tags and let people use only those.
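The sketch below is one minimal allowlist filter. It uses Python's standard-library `html.parser` and a hypothetical `ALLOWED_TAGS` set. Allowed tags are rebuilt from their name alone, so attributes (which can also carry script, e.g. event handlers) are dropped. Every other tag, including `<script>`, is HTML-encoded rather than deleted, which also covers the encoding idea from the question's edit. Text is re-escaped, and comments and declarations are discarded. This is only a sketch under those assumptions, not a hardened sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist; adjust to the tags you actually want to permit.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "br"}


class AllowlistSanitizer(HTMLParser):
    """Re-emits allowlisted tags (attributes stripped) and escapes everything else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []  # output fragments, joined at the end

    def _open_tag(self, tag):
        # Allowed tags are rebuilt from the name only, so attributes never survive;
        # any other start tag is HTML-encoded exactly as it appeared in the source.
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")
        else:
            self.out.append(escape(self.get_starttag_text()))

    def handle_starttag(self, tag, attrs):
        self._open_tag(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>; emitted as a plain start tag.
        self._open_tag(tag)

    def handle_endtag(self, tag):
        # Closing tags are rebuilt from the (lowercased) tag name.
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")
        else:
            self.out.append(escape(f"</{tag}>"))

    def handle_data(self, data):
        # Text, including the body of a <script>, is escaped, never passed through raw.
        self.out.append(escape(data))


def sanitize(source):
    parser = AllowlistSanitizer()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)


print(sanitize('<b onclick="steal()">bold</b> <script>alert(1)</script>'))
# → <b>bold</b> &lt;script&gt;alert(1)&lt;/script&gt;
```

Because the filter only ever emits tags it names itself, new tricks for spelling `<script>` don't need to be anticipated. They simply fall through to the escaping branch.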