Edit: To be clear, please understand that I am not using Regex to parse

Asked: May 14, 2026

Edit: To be clear, please understand that I am not using regex to parse the HTML, that’s crazy talk! I simply want to clean up a messy string of HTML so that it will parse.

Edit #2: I should also point out that the control character I’m using is a special Unicode character. It’s not something that would ever be used in a proper tag under any normal circumstances.

Suppose I have a string of HTML that contains a bunch of control characters, and I want to remove the control characters from inside tags only, leaving the ones outside the tags alone.

For example

Here the control character is the numeral “1”.

Input

The quick 1<strong>orange</strong> lemming <sp11a1n 1class1='jumpe111r'11>jumps over</span> 1the idle 1frog

Desired Output

The quick 1<strong>orange</strong> lemming <span class='jumper'>jumps over</span> 1the idle 1frog

So far I can match tags which contain the control character, but I can’t remove the control characters from them in a single regex. I guess I could run a second regex on my matches, but I’d really like to know if there’s a better way.

My regex

Bear in mind this one only matches tags which contain the control character.

<(([^>])*?`([^>])*?)*?>

Thanks very much for your time and consideration.

Iain Fraser


1 Answer

Editorial Team · Answer 1 · 2026-05-14T19:41:03+00:00

Regex isn’t the right tool for this, but you can use a lookbehind and a lookahead to match a 1 that sits inside a tag. Here it is in Java, with the lookbehind bounded to a finite length (Java doesn’t support unbounded-length lookbehind).

    String s = "123 <o123o></o1o1> <oo 11='11x'> x11 <msg136='I <3 Johnny!11'>";
    System.out.println(
        // lookbehind: a '<' followed by up to 999 non-bracket chars; lookahead: non-bracket chars up to '>'
        s.replaceAll("(?<=<[^<>]{0,999})(?=[^<>]+>)1", "")
    ); // prints "123 <o23o></oo> <oo ='x'> x11 <msg136='I <3 Johnny!'>"

There are many cases where this will fail (for example, the 1 in msg136 above survives because a stray < appears before the tag’s closing >), but it should get you started.
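
For comparison, the two-step idea mentioned in the question (match the tags first, then clean each match) can be sketched without any lookbehind. This is only a rough sketch, not a robust HTML cleaner: the class name `TagCleaner`, the method `stripInsideTags`, and the tag pattern `<[^<>]*>` are assumptions made for illustration, and it shares the same weakness around stray `<` or `>` characters inside attribute values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagCleaner {
    // Assumed definition of a "tag": '<', any run of non-angle-bracket characters, then '>'.
    private static final Pattern TAG = Pattern.compile("<[^<>]*>");

    // Removes every occurrence of `control` inside tag-like spans and leaves other text untouched.
    public static String stripInsideTags(String html, String control) {
        Matcher m = TAG.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(html, last, m.start());           // text before the tag, copied as-is
            out.append(m.group().replace(control, "")); // the tag, with the control character removed
            last = m.end();
        }
        out.append(html, last, html.length());          // trailing text after the last tag
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "The quick 1<strong>orange</strong> lemming "
                + "<sp11a1n 1class1='jumpe111r'11>jumps over</span> 1the idle 1frog";
        System.out.println(stripInsideTags(input, "1"));
    }
}
```

Because the replacement inside each match is a plain `String.replace`, the control character does not need regex escaping, which may matter if it is an unusual Unicode character.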
