I am working on a comment system in CodeIgniter and would appreciate some advice on which validation rules I should use. I don’t want to allow images or any other HTML.
So far I only have trim and max_length set. I also run the content through htmlspecialchars before inserting it into the database, and I have XSS filtering enabled globally.
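The setup described so far could look roughly like the plain-PHP sketch below. In CodeIgniter the trim and max_length steps would normally be a validation rule string such as 'trim|max_length[1000]' rather than hand-written code. The function names prepare_comment() and escape_comment() and the 1000-character limit are hypothetical, and UTF-8 input is assumed.

```php
<?php
// Rough sketch of the setup described in the question, in plain PHP.
// Hypothetical names: prepare_comment(), escape_comment(); the
// 1000-character limit is an assumption.

// "trim" and "max_length" rules: returns the cleaned comment,
// or null if it is too long.
function prepare_comment(string $raw, int $maxLength = 1000): ?string
{
    $comment = trim($raw);

    // Count characters rather than bytes when mbstring is available.
    $length = function_exists('mb_strlen')
        ? mb_strlen($comment, 'UTF-8')
        : strlen($comment);

    return $length > $maxLength ? null : $comment;
}

// htmlspecialchars step: every HTML-special character becomes an entity,
// so <script> or <img> is shown as text instead of being parsed as a tag.
// ENT_QUOTES also encodes ' and ", which matters inside attribute values.
function escape_comment(string $comment): string
{
    return htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');
}

$comment = prepare_comment("  <script>alert('hi')</script>  ");
echo escape_comment($comment), "\n";
// prints &lt;script&gt;alert(&#039;hi&#039;)&lt;/script&gt;
```

When no HTML at all is allowed, escaping like this is generally considered sufficient for text placed in the body of a page. Keeping the escaping in its own function also makes it easy to call when the comment is rendered instead of before the insert.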
What other precautions should I take? Is htmlspecialchars enough to prevent JavaScript or other malicious code from being entered?
You should probably apply the usual form validation rules (required and max_length), and of course XSS filtering, before pushing anything to the database. If you want to allow a few formatting tags, though, htmlspecialchars should only be applied to the text outside those tags, so you can’t simply run it over the whole comment. You need to:
1 – strip the tag elements such as “<br/>” or “<b>” (and store them), but not their content, i.e. not whatever is between “<b>” and “</b>”. You can probably do this with preg_match_all or preg_split.
2 – run htmlentities on all the remaining text
3 – remove all unwanted tags from the stored set of tags
4 – then filter the remaining allowed tags for attributes and content. It’s not uncommon for attackers to hide malicious code inside the attributes or content of an otherwise allowed tag.
To guard against this, you’ll have to do a bit of extra work, probably with some regular expressions. If you want me to write more sample code, let me know.
6 – re-add the filtered tag elements to the document.
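The steps above can be sketched roughly as follows. This assumes only <b>, <i> and <br> are allowed and that every attribute is simply dropped rather than filtered. sanitize_comment() is a hypothetical name, and splitting on a regular expression is a simplification, not a full HTML parser.

```php
<?php
// Sketch of the tag-whitelist steps. Assumptions (not from the answer):
// only <b>, <i> and <br> are allowed, all attributes are dropped, and
// sanitize_comment() is a hypothetical name.

function sanitize_comment(string $input, array $allowed = ['b', 'i', 'br']): string
{
    // Step 1: split the input into text and tag-like tokens. With
    // PREG_SPLIT_DELIM_CAPTURE the tags land at odd indices and the
    // text between them at even indices.
    $parts = preg_split('#(</?[a-zA-Z][^>]*>)#', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Step 2: escape all plain text.
            $out .= htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
            continue;
        }

        // Step 3: read the tag name and drop tags not on the allowlist.
        preg_match('#^<(/?)([a-zA-Z][a-zA-Z0-9]*)#', $part, $m);
        $closing = $m[1];
        $name = strtolower($m[2]);
        if (!in_array($name, $allowed, true)) {
            continue; // unwanted tag: discard it, keep the escaped text around it
        }

        // Steps 4 and 6: re-add a clean tag with every attribute removed,
        // so onmouseover=..., style=..., etc. never reach the page.
        $out .= $name === 'br' ? '<br>' : '<' . $closing . $name . '>';
    }

    return $out;
}

echo sanitize_comment('<b onmouseover="alert(1)">hi</b><script>x()</script>'), "\n";
// prints <b>hi</b>x()
```

Dropping all attributes sidesteps most of the attribute filtering in step 4, at the cost of not supporting things like links. This sketch also does not check that the allowed tags are balanced, so an unclosed <b> can still make the rest of the page bold.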
I basically cooked this up just now. The algorithm could be made more efficient (e.g. by stripping the unwanted tags first and then filtering the HTML entities and tag contents), but I’ll leave that up to you.
These are the potential attacks I can see right now, but there may be other ways to exploit your input, so you might want to check what other comment systems use for validation, such as the phpBB forum software. Another option is to use phpBB’s square-bracket format (BBCode) for formatting, so users can’t enter ANY HTML tags whatsoever and instead use square-bracket tags that you control.
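The square-bracket approach reverses the order: escape everything first, then turn a small set of bracket tags back into HTML. A minimal sketch, assuming only [b] and [i] are supported; render_bbcode() is a hypothetical name and this is not phpBB’s actual parser.

```php
<?php
// Sketch of the square-bracket alternative. Assumptions: only [b] and [i]
// are supported, and render_bbcode() is a hypothetical name; this is not
// phpBB's own BBCode parser.

function render_bbcode(string $input): string
{
    // 1. Escape everything, so no user-supplied HTML can survive.
    $html = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // 2. Turn only complete [b]...[/b] and [i]...[/i] pairs into HTML.
    //    An unclosed [b] therefore stays as literal text.
    $rules = [
        '#\[b\](.*?)\[/b\]#is' => '<b>$1</b>',
        '#\[i\](.*?)\[/i\]#is' => '<i>$1</i>',
    ];
    foreach ($rules as $pattern => $replacement) {
        $html = preg_replace($pattern, $replacement, $html);
    }

    // 3. Keep the line breaks the user typed.
    return nl2br($html);
}

echo render_bbcode("[b]Hello[/b] <script>alert(1)</script>"), "\n";
// prints <b>Hello</b> &lt;script&gt;alert(1)&lt;/script&gt;
```

Because the only HTML in the output is what the replacement rules produce, there are no user-supplied attributes to filter at all.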
Does this answer your question?