I’m using the HTML sanitizing whitelist code found here:
http://refactormycode.com/codes/333-sanitize-html
I needed to add the ‘font’ tag as an additional tag to match, so I tried adding this condition after the <img tag check:
    if (tagname.StartsWith("<font"))
    {
        // detailed <font> tag checking
        // Non-escaped expression (for testing in a Regex editor app)
        // ^<font(\s*size="\d{1}")?(\s*color="((#[0-9a-f]{6})|(#[0-9a-f]{3})|red|green|blue|black|white)")?(\s*face="(Arial|Courier New|Garamond|Georgia|Tahoma|Verdana)")?\s*?>$
        if (!IsMatch(tagname, @"<font
            (\s*size=""\d{1}"")?
            (\s*color=""((#[0-9a-f]{6})|(#[0-9a-f]{3})|red|green|blue|black|white)"")?
            (\s*face=""(Arial|Courier New|Garamond|Georgia|Tahoma|Verdana)"")?
            \s*?>"))
        {
            html = html.Remove(tag.Index, tag.Length);
        }
    }
Aside from the condition above, my code is almost identical to the code on the page I linked to. When I test this in C#, it throws an exception saying "Not enough )'s". I’ve counted the parentheses several times and run the expression through a few online JavaScript-based regex testers, and none of them report any problems.
Am I missing something in my regex that is causing a parenthesis to be escaped? What do I need to do to fix this?
UPDATE
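After a lot of trial and error, I remembered that the # sign starts a comment in a regex when pattern whitespace is ignored. The key to fixing this is to escape the # character. In case anyone else comes across the same problem, I’ve included my fix: the # signs are escaped as \#, and the space in "Courier New" is written as \s, since literal whitespace in the pattern is ignored as well.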
    if (tagname.StartsWith("<font"))
    {
        // detailed <font> tag checking
        // Non-escaped expression (for testing in a Regex editor app)
        // ^<font(\s*size="\d{1}")?(\s*color="((#[0-9a-f]{6})|(#[0-9a-f]{3})|red|green|blue|black|white)")?(\s*face="(Arial|Courier New|Garamond|Georgia|Tahoma|Verdana)")?\s*?>$
        if (!IsMatch(tagname, @"<font
            (\s*size=""\d{1}"")?
            (\s*color=""((\#[0-9a-f]{6})|(\#[0-9a-f]{3})|red|green|blue|black|white)"")?
            (\s*face=""(Arial|Courier\sNew|Garamond|Georgia|Tahoma|Verdana)"")?
            \s*?>"))
        {
            html = html.Remove(tag.Index, tag.Length);
        }
    }
Your IsMatch method uses the RegexOptions.IgnorePatternWhitespace option, which allows comments inside the regular expression. You therefore have to escape the # character; otherwise everything from the # to the end of that line of the pattern is interpreted as a comment, including the closing parentheses that follow it.
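To see the effect in isolation, here is a minimal sketch with a cut-down version of the pattern. The class name, the variable names, and the exact option set are assumptions for illustration only; the IsMatch helper in the linked code may combine IgnorePatternWhitespace with other options.

```csharp
using System;
using System.Text.RegularExpressions;

class HashCommentDemo
{
    static void Main()
    {
        // Assumed options: IgnorePatternWhitespace is the one that matters here.
        const RegexOptions opts =
            RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase;

        // Unescaped '#': the rest of that pattern line becomes a comment,
        // so the closing parentheses on it are never parsed.
        string broken = @"^<font
            (\s*color=""(#[0-9a-f]{6}|red)"")?
            \s*>$";

        // Escaped '\#': treated as a literal hash sign.
        string works = @"^<font
            (\s*color=""(\#[0-9a-f]{6}|red)"")?
            \s*>$";

        try
        {
            new Regex(broken, opts);
        }
        catch (ArgumentException ex)
        {
            // The regex parser rejects the pattern because of the unclosed groups.
            Console.WriteLine("Broken pattern rejected: " + ex.Message);
        }

        var regex = new Regex(works, opts);
        Console.WriteLine(regex.IsMatch("<font color=\"#ff0000\">")); // True
        Console.WriteLine(regex.IsMatch("<font color=\"purple\">"));  // False
    }
}
```

Writing the hash as a character class, [#], should work as well, because comment handling does not apply inside a character class. This also explains why the JavaScript-based testers reported nothing: JavaScript regexes have no equivalent of this option, so # is always a literal there.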