I’m trying to come up with a validation expression to prevent users from entering HTML or JavaScript tags into a comment box on a web page.
The following works fine for a single line of text:
^(?!.*(<|>)).*$
...but it won’t allow any newline characters, because the dot (.) does not match them. If I go with something like this:
^(?!.*(<|>))(.|\s)*$
it will allow multiple lines, but the expression only checks for ‘<’ and ‘>’ on the first line. I need it to check every line.
This works fine:
^[-_\s\d\w"'\.,:;#/&\$\%\?!@\+\*\\(\)]{0,4000}$
but it’s ugly and I’m concerned that it’s going to break for some users because it’s a multi-lingual application.
Any ideas? Thanks!
Note that your RE prevents users from entering < and > in any context: “2 > 1”, for example. This is very undesirable.
Rather than trying to use regular expressions to match HTML (which they aren’t well suited to do), simply escape < and > by transforming them to &lt; and &gt;. Alternatively, find a package for your language of choice that implements whitelisting to allow a limited subset of HTML, or that supports its own markup language (I hear Markdown is nice).
As for “.” not matching newline characters, some regexp implementations support a flag to control this behavior (usually “m” for “multi-line” and “s” for “single line”; the latter causes “.” to match newlines).
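To make the two suggestions above concrete, here is a minimal sketch in Python (the question does not say which language the application uses, so Python is an assumption). The helper name escape_comment and the constant NO_ANGLE_BRACKETS are hypothetical, chosen only for illustration. The first part escapes the characters instead of rejecting them; the second shows the “s” flag, which Python spells re.DOTALL, applied to the original lookahead expression.

```python
import html
import re


def escape_comment(text: str) -> str:
    """Escape &, < and > so the comment is displayed as text, not parsed as markup."""
    # quote=False leaves " and ' untouched; keep the default (quote=True)
    # if the value will be placed inside an HTML attribute.
    return html.escape(text, quote=False)


# The lookahead expression from the question, with the "single line" flag
# (re.DOTALL in Python) so that "." also matches newline characters.
NO_ANGLE_BRACKETS = re.compile(r"^(?!.*[<>]).*$", re.DOTALL)

print(escape_comment("2 > 1"))
print(NO_ANGLE_BRACKETS.match("line one\nline two") is not None)
print(NO_ANGLE_BRACKETS.match("ok\n<script>") is not None)
```

With escaping, a comment such as “2 > 1” is stored or rendered safely rather than rejected, which avoids the usability problem described above.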
The first two are basically equivalent to /^[^<>]*$/, except that this one works on multi-line strings. Any reason why you didn’t write the RE that way?
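A short sketch of that negated character class, again assuming Python; the function name is_clean is hypothetical. Unlike the dot, [^<>] matches newline characters without any flag, and because it does not enumerate the allowed characters, text in any language passes.

```python
import re

# Any run of characters other than < and >, newlines included.
NO_TAGS = re.compile(r"^[^<>]*$")


def is_clean(comment: str) -> bool:
    """Return True if the comment contains neither '<' nor '>' on any line."""
    return NO_TAGS.match(comment) is not None


print(is_clean("first line\nsecond line"))
print(is_clean("first line\n<b>second</b>"))
print(is_clean("héllo wörld\n日本語"))
```

Note that this still rejects harmless input such as “2 > 1”, which is the limitation raised in the other answer.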