I’m working on a site that sanitizes output from the database so that some


Asked: May 24, 2026 (2026-05-24T22:15:56+00:00)


I’m working on a site that sanitizes output from the database so that some html tags are allowed. It’s using Regex to sanitize the data.

At the moment it allows a standard link (an href with no target), such as

<a href="http://www.google.com">Google</a>

but does not allow

<a href="http://www.google.com" target="_blank" title="Google">Google</a>

The code looks like this at the moment:

private static Regex _tags = new Regex("<[^>]*(>|$)",
RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled);
private static Regex _whitelist = new Regex(@"
^</?(b(lockquote)?|code|d(d|t|l|el)|em|h(1|2|3)|i|kbd|u|li|ol|p(re)?|s(ub|up|trong|trike)?|ul)>$|
^<(b|h)r\s?/?>$",
    RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);
private static Regex _whitelist_a = new Regex(@"
^<a\s
href=""(\#\d+|(https?|ftp)://[-a-z0-9+&@#/%?=~_|!:,.;\(\)]+)""
(\stitle=""[^""<>]+"")?\s?>$|
^</a>$",
    RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);
private static Regex _whitelist_img = new Regex(@"
^<img\s
src=""https?://[-a-z0-9+&@#/%?=~_|!:,.;\(\)]+""
(\swidth=""\d{1,3}"")?
(\sheight=""\d{1,3}"")?
(\salt=""[^""<>]*"")?
(\stitle=""[^""<>]*"")?
\s?/?>$",
    RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);


/// <summary>
/// sanitize any potentially dangerous tags from the provided raw HTML input using 
/// a whitelist based approach, leaving the "safe" HTML tags
/// CODESNIPPET:4100A61A-1711-4366-B0B0-144D1179A937
/// </summary>
public static string Sanitize(string html)
{
    if (String.IsNullOrEmpty(html)) return html;

    string tagname;
    Match tag;

    // match every HTML tag in the input
    MatchCollection tags = _tags.Matches(html);
    for (int i = tags.Count - 1; i > -1; i--)
    {
        tag = tags[i];
        tagname = tag.Value.ToLowerInvariant();

        if (!(_whitelist.IsMatch(tagname) || _whitelist_a.IsMatch(tagname) || _whitelist_img.IsMatch(tagname)))
        {
            html = html.Remove(tag.Index, tag.Length);

        }
    }

    return html;
}

I’d like to allow hrefs with targets as well.

Any help with this would be great, thanks.


1 Answer

Editorial Team · Answer 1 · 2026-05-24T22:15:57+00:00

Edited to include second request in comment.

Change:

private static Regex _whitelist_a = new Regex(@"
^<a\s
href=""(\#\d+|(https?|ftp)://[-a-z0-9+&@#/%?=~_|!:,.;\(\)]+)""
(\stitle=""[^""<>]+"")?\s?>$|
^</a>$",
RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);

to:

private static Regex _whitelist_a = new Regex(@"
^<a(\starget=""[^""<>]+"")?\s
href=""(\#\d+|(https?|ftp)://[-a-z0-9+&@#/%?=~_|!:,.;\(\)]+)""
(\starget=""[^""<>]+"")?(\stitle=""[^""<>]+"")?\s?>$|
^</a>$",
RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);

It’s not necessarily the perfect solution, but it allows a “target” attribute before the “href”, after it, in both positions, or not at all.
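To see which tags the modified pattern accepts, a small check like the following may help. It is only a sketch: the class name AnchorWhitelistCheck and the method IsAllowed are made up for illustration, and the pattern is copied from the modified _whitelist_a above so the snippet is self-contained. It lowercases the tag first, as Sanitize does.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for experimenting with the modified pattern; not part of the original code.
public static class AnchorWhitelistCheck
{
    // Same pattern as the modified _whitelist_a: optional target before href,
    // then href, then optional target, then optional title.
    private static readonly Regex WhitelistA = new Regex(@"
        ^<a(\starget=""[^""<>]+"")?\s
        href=""(\#\d+|(https?|ftp)://[-a-z0-9+&@#/%?=~_|!:,.;\(\)]+)""
        (\starget=""[^""<>]+"")?(\stitle=""[^""<>]+"")?\s?>$|
        ^</a>$",
        RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace);

    // Returns true when the tag would survive the whitelist check.
    public static bool IsAllowed(string tag)
    {
        // Sanitize lowercases each tag before matching, so do the same here.
        return WhitelistA.IsMatch(tag.ToLowerInvariant());
    }
}
```

With this helper, the tag from the question (href, then target, then title) is accepted, as is a tag with target before href. A tag that puts title before target is still rejected, because the pattern only permits title as the last attribute.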

You should be able to create a regex that is much more succinct, similar to this:

^<a(\s+(?:target|href|title)="[^"<>]+")*\s*>$|^</a>$

I’m not familiar with C# or .NET, so I don’t know exactly how you would write this in your code, but you could try the following:

private static Regex _whitelist_a = new Regex(
    @"^<a(\s+(?:target|href|title)=""[^""<>]+"")*\s*>$|^</a>$",
    RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace
);

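The advantage of this pattern over the ones above is that it allows href, target and title in any order, with any amount of whitespace between them. The trade-off is that it no longer restricts the href value to http, https or ftp URLs (any value without quotes or angle brackets is accepted, including javascript: URLs), and it also accepts repeated attributes or an <a> tag with no href at all.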
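Because the succinct pattern accepts any quoted value for href, one possible way to keep the any-order flexibility while still checking the URL is to validate the tag in two steps: first the overall shape, then each attribute on its own. The following is a minimal sketch under some assumptions: the class AnchorTagValidator and its members are hypothetical names, the href rule is copied from the original _whitelist_a, and restricting target to _blank, _self, _parent and _top is a choice made here for illustration, not something the original code requires.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical two-step validator; names and the target restriction are assumptions.
public static class AnchorTagValidator
{
    // Step 1: <a> followed by one or more name="value" attributes, in any order.
    private static readonly Regex Shape = new Regex(
        @"^<a(\s+(href|target|title)=""[^""<>]+"")+\s*>$");

    // Used to pull out each attribute once the shape is known to be valid.
    private static readonly Regex Attribute = new Regex(
        @"\s(?<name>href|target|title)=""(?<value>[^""<>]+)""");

    // Same href rule as the original _whitelist_a: #123 anchors or http/https/ftp URLs.
    private static readonly Regex SafeHref = new Regex(
        @"^(\#\d+|(https?|ftp)://[-a-z0-9+&@#/%?=~_|!:,.;\(\)]+)$");

    // Assumption: only the standard target keywords are allowed.
    private static readonly Regex SafeTarget = new Regex(@"^_(blank|self|parent|top)$");

    public static bool IsAllowed(string tag)
    {
        string lowered = tag.ToLowerInvariant();
        if (lowered == "</a>") return true;
        if (!Shape.IsMatch(lowered)) return false;

        var seen = new HashSet<string>();
        foreach (Match m in Attribute.Matches(lowered))
        {
            string name = m.Groups["name"].Value;
            string value = m.Groups["value"].Value;

            if (!seen.Add(name)) return false;                                  // repeated attribute
            if (name == "href" && !SafeHref.IsMatch(value)) return false;       // unsafe or unknown URL scheme
            if (name == "target" && !SafeTarget.IsMatch(value)) return false;   // unexpected target value
        }

        // An opening <a> tag without an href is not considered a valid link here.
        return seen.Contains("href");
    }
}
```

In Sanitize, a call such as AnchorTagValidator.IsAllowed(tagname) could stand in for _whitelist_a.IsMatch(tagname). The extra step costs a little more code than a single regex, but it keeps each rule readable and makes it harder for a loosened pattern to let an unsafe href through.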
