I am using Umbraco, where field validation is done with regular expressions. In one field I want to allow users to style their text using the rich text editor (TinyMCE), but I still want to limit the number of characters they can enter.
I’m currently using this regular expression, but it checks the total number of characters, so it includes the HTML markup.
^[\s\S]{0,250}$
Is there a regular expression that wouldn’t count the characters in HTML tags?
The short answer is no. At least, not with any sane regex, not without an advanced regex engine that allows recursion or balanced groups, and maybe not at all. A regex that can recognize and ignore HTML tags would have to parse the HTML to do it, and down that road lies madness.
However, you could use some sort of preprocessing, such as jQuery on the client side or something else on the server side, to parse the HTML and strip out the tags before you apply length validation.
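To make the preprocessing idea concrete, here is a minimal server-side sketch in Python. It is only an illustration, not Umbraco's own validation hook: the names `visible_length` and `is_within_limit` are made up for this example, and the 250 limit is taken from the regex in the question. It uses the standard library's `html.parser` to collect the text nodes and then measures only those.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes only; tags and their attributes are discarded."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into the
        # single character they represent before handle_data sees them.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Note: the contents of <script> and <style> also arrive here,
        # so they would be counted unless filtered out separately.
        self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def visible_length(html):
    """Return the number of characters left once the tags are stripped."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of input
    return len(parser.text())


def is_within_limit(html, limit=250):
    """Hypothetical validator: True if the visible text fits the limit."""
    return visible_length(html) <= limit
```

For example, `visible_length("<p><strong>Hello</strong> world</p>")` counts only the 11 characters of "Hello world". Whitespace between tags is counted as well, so you may want to normalize it first if the editor emits indented markup.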
Are you sure you want to do this, though? If you’re storing the styled input in a database, then those HTML tags are going to count against your column size just like everything else will. If you’re storing these in a varchar(250) column, you’re going to have to either count the HTML tags as part of that 250, or else strip them out and lose all the style information.