I know using a regex to parse html is normally a non-starter but I

Editorial Team

Asked: May 28, 2026 (2026-05-28T04:00:16+00:00)

I know using a regex to parse HTML is normally a non-starter, but I don’t want anything that clever…

Take this example:

<div><!--<b>Test</b>-->Test</div>
<div><!--<b>Test2</b>-->Test2</div>

I’d like to strip out ANYTHING that isn’t between <!-- and --> to get:

<b>Test</b><b>Test2</b>

Tags are guaranteed to be correctly matched (no unclosed/nested comments).

What regex do I need to use?

1 Answer

Editorial Team · Answer 1 · 2026-05-28T04:00:16+00:00

Replace the pattern:

(?s)((?!-->).)*<!--|-->((?!<!--).)*

with an empty string.

A short explanation:

(?s)              # enable DOT-ALL
((?!-->).)*<!--   # match anything except '-->' ending with '<!--'
|                 # OR
-->((?!<!--).)*   # match '-->' followed by anything except '<!--'
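
To see which alternative fires where, it can help to list the individual matches instead of replacing them. A minimal sketch in JavaScript, assuming the sample input from the question; the variable names are illustrative, and [\s\S] stands in for the dot under DOT-ALL:

```javascript
// Sample input from the question.
const input =
  "<div><!--<b>Test</b>-->Test</div>\n<div><!--<b>Test2</b>-->Test2</div>";

// Same pattern as above, written for JavaScript.
const pattern = /((?!-->)[\s\S])*<!--|-->((?!<!--)[\s\S])*/g;

// With the g flag, match() returns every full match in order.
const matches = input.match(pattern);

console.log(JSON.stringify(matches));
// ["<div><!--","-->Test</div>\n<div>","<!--","-->Test2</div>"]
```

The first and third matches come from the left alternative (text up to and including '<!--'), the second and fourth from the right one ('-->' plus the text that follows). Whatever is not matched is what survives the replacement.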

Be careful when processing (X)HTML with regex. Whenever the comment delimiters <!-- or --> occur inside tag attributes or CDATA blocks, the pattern treats them as real comment boundaries and the result is wrong.
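
A small sketch of how this goes wrong, in JavaScript. The input is made up for illustration: an attribute value that happens to contain '-->'.

```javascript
// Hypothetical input: the title attribute contains a literal "-->".
const tricky = '<a title="-->">x</a><!--<b>T</b>-->';

const cleaned = tricky.replace(
  /((?!-->)[\s\S])*<!--|-->((?!<!--)[\s\S])*/g,
  ""
);

console.log(cleaned);
// <a title="<b>T</b>
```

The first alternative cannot cross the stray '-->' to reach a '<!--', so the text in front of it is never matched and leaks into the output.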

EDIT

Seeing your most active tag is JavaScript, here’s a JS demo:

console.log(
  "<div><!--<b>Test</b>-->Test</div>\n<div><!--<b>Test2</b>-->Test2</div>"
  .replace(
    /((?!-->)[\s\S])*<!--|-->((?!<!--)[\s\S])*/g,
    ""
  )
);

which prints:

<b>Test</b><b>Test2</b>

Note that since JS does not support the inline (?s) modifier, I used the equivalent [\s\S], which matches any character (including line break chars).

Test it on Ideone here: http://ideone.com/6yQaK
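
On the (?s) note above: engines that implement ES2018 also accept the s (dotAll) flag, which makes the dot match line breaks. A sketch assuming such an engine (current Node.js and browsers); the variable names are illustrative:

```javascript
const sample =
  "<div><!--<b>Test</b>-->Test</div>\n<div><!--<b>Test2</b>-->Test2</div>";

// The s flag plays the role of (?s), so '.' can be used directly.
const result = sample.replace(/((?!-->).)*<!--|-->((?!<!--).)*/gs, "");

console.log(result);
// <b>Test</b><b>Test2</b>
```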

EDIT II

And a PHP demo would look like:

<?php
$s = "<div><!--<b>Test</b>-->Test</div>\n<div><!--<b>Test2</b>-->Test2</div>";
echo preg_replace('/(?s)((?!-->).)*<!--|-->((?!<!--).)*/', '', $s);
?>

which also prints:

<b>Test</b><b>Test2</b>

as can be seen on Ideone: http://ideone.com/Bm2uJ
