I’m trying to convert all instances of the > character to its HTML entity


Asked: May 11, 2026


I’m trying to convert all instances of the > character to its HTML entity equivalent, &gt;, within a string of HTML that contains HTML tags. The closest I’ve come to a solution is a regex.

Here’s what I have so far:

        public static readonly Regex HtmlAngleBracketNotPartOfTag = new Regex(@"(?:<[^>]*(?:>|$))(>)", RegexOptions.Compiled | RegexOptions.Singleline);

The main issue I’m having is isolating the single > characters that are not part of an HTML tag. I don’t want to convert any existing tags, because I need to preserve the HTML for rendering. If I don’t convert the > characters, I get malformed HTML, which causes rendering issues in the browser.

This is an example of a test string to parse:

Ok, now I've got the correct setting.<br/><br/>On 12/22/2008 3:45 PM, jproot@somedomain.com wrote:<br/><div class="quotedReply">> Ok, got it, hope the angle bracket quotes are there.<br/>><br/>> On 12/22/2008 3:45 PM, > sbartfast@somedomain.com wrote:<br/>>> Please someone, reply to this.<br/>>><br/>><br/></div>

In the above string, none of the > characters that are part of HTML tags should be converted to &gt;. So, this:

<div class="quotedReply">>

should become this:

<div class="quotedReply">&gt;

Another issue: the expression wraps the tag in a non-capturing group, so the whole match includes the tag and only the stray > lands in group 1. I’m not sure how to replace just group 1 while preserving the rest of the match. A MatchEvaluator doesn’t seem to do the trick, or perhaps I just can’t see how to use it right now.
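For what it’s worth, a MatchEvaluator can replace only group 1: it receives the whole Match, so it can rebuild the matched text and swap out just the captured slice using the group’s offset inside the match. A minimal sketch, assuming the question’s pattern unchanged; the class name GroupOneReplacer and its methods are hypothetical:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the pattern is the one from the question.
public static class GroupOneReplacer
{
    private static readonly Regex HtmlAngleBracketNotPartOfTag =
        new Regex(@"(?:<[^>]*(?:>|$))(>)", RegexOptions.Compiled | RegexOptions.Singleline);

    // Rebuilds each match, replacing only the text captured by group 1.
    private static string ReplaceGroupOne(Match m)
    {
        Group g = m.Groups[1];
        // Position of group 1 relative to the start of the whole match.
        int offset = g.Index - m.Index;
        return m.Value.Substring(0, offset)
             + "&gt;"
             + m.Value.Substring(offset + g.Length);
    }

    // Regex.Replace(string, MatchEvaluator) calls ReplaceGroupOne once per match.
    public static string Escape(string html) =>
        HtmlAngleBracketNotPartOfTag.Replace(html, ReplaceGroupOne);
}
```

With this, GroupOneReplacer.Escape("<br/>>x") returns "<br/>&gt;x". Note that the pattern itself only finds a > that immediately follows a tag: "<br/>>>x" becomes "<br/>&gt;>x", and a bare > in the middle of a line (as in "PM, > sbartfast") is never matched, so fixing the evaluator alone doesn’t solve the whole problem.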

I suspect my regex could do with some lovin’.

Anyone have any bright ideas?

1 Answer

score 0 · Answer 1 · 2026-05-11T01:06:30+00:00

The trick is to capture everything that isn’t the target, then plug it back in along with the changed text, like this:

        Regex.Replace(str, @"\G((?>[^<>]+|<[^>]*>)*)>", "$1&gt;");
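Here is the one-liner wrapped in a small, self-contained sketch with the pieces of the pattern commented. The class name TextNodeEscaper is hypothetical; the pattern and the "$1&gt;" replacement are the ones from this answer. The key parts are \G, which forces each match to start where the previous one ended so the scan walks the string left to right without skipping anything, and the atomic group, which consumes a whole text run or a whole tag as one unit.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the answer's pattern.
public static class TextNodeEscaper
{
    // \G        : match must begin exactly where the previous match ended.
    // (?>...)   : atomic group; a text run or a tag is taken as a unit.
    // [^<>]+    : plain text containing no angle brackets.
    // <[^>]*>   : anything that looks like a tag (up to its first '>').
    // (...)*    : group 1 collects all text and tags before the stray '>'.
    // >         : the first '>' that is not the end of a tag.
    private static readonly Regex StrayGreaterThan =
        new Regex(@"\G((?>[^<>]+|<[^>]*>)*)>");

    // "$1" writes back everything that preceded the stray '>', then the entity.
    public static string Escape(string html) =>
        StrayGreaterThan.Replace(html, "$1&gt;");
}
```

For example, TextNodeEscaper.Escape("3 > 2 <b>ok</b>") returns "3 &gt; 2 <b>ok</b>". Once no further stray > remains, the \G-anchored attempt fails and every later starting position fails at \G too, so trailing text and tags are left untouched.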

But Anthony’s right: a bare > in a text node is legal HTML and shouldn’t cause any rendering problems. Also, matching HTML with regexes is tricky; for example, comments and CDATA sections can contain practically anything, including >, so a robust regex would have to match them specifically.
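One way to act on that remark is to give comments and CDATA sections their own alternatives, listed before the generic tag alternative so they are tried first. This is only a sketch under that assumption (the class name is hypothetical); it is still not an HTML parser and does not handle, for example, a > inside a quoted attribute value or inside script/style content.

```csharp
using System.Text.RegularExpressions;

// Hypothetical variant of the answer's pattern that skips over
// comments and CDATA sections as whole units.
public static class TextNodeEscaperWithComments
{
    // <!--.*?-->              : a comment, up to the first "-->".
    // <!\[CDATA\[.*?\]\]>     : a CDATA section, up to the first "]]>".
    // Singleline lets '.' cross line breaks inside comments and CDATA.
    private static readonly Regex StrayGreaterThan = new Regex(
        @"\G((?>[^<>]+|<!--.*?-->|<!\[CDATA\[.*?\]\]>|<[^>]*>)*)>",
        RegexOptions.Singleline);

    public static string Escape(string html) =>
        StrayGreaterThan.Replace(html, "$1&gt;");
}
```

For instance, Escape("<!-- a > b -->c > d") returns "<!-- a > b -->c &gt; d". With the original one-line pattern, <[^>]*> would stop at the first > inside the comment, and the same input would come out as "<!-- a > b --&gt;c &gt; d", breaking the comment.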
