I know using a regex to parse HTML is normally a non-starter, but I don’t want anything that clever…
Taking this example:
<div><!--<b>Test</b>-->Test</div>
<div><!--<b>Test2</b>-->Test2</div>
I’d like to strip out ANYTHING that isn’t between <!-- and --> to get:
<b>Test</b><b>Test2</b>
Tags are guaranteed to be correctly matched (no unclosed/nested comments).
What regex do I need to use?
Replace the pattern:
with an empty string.
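As a minimal sketch of that replacement in JavaScript: the pattern below is just one way to write it, and the helper name `keepCommentBodies` is made up for illustration. It assumes, as the question guarantees, that every `<!--` has a matching `-->` and that comments are never nested.

```javascript
// Hypothetical helper: keeps only the text found inside <!-- ... --> comments.
function keepCommentBodies(html) {
  // Three alternatives, each deleting a stretch that lies outside a comment:
  //   -->[\s\S]*?<!--   from the end of one comment to the start of the next
  //   ^[\s\S]*?<!--     from the start of the input through the first <!--
  //   -->[\s\S]*$       from the last --> to the end of the input
  const outsideComments = /-->[\s\S]*?<!--|^[\s\S]*?<!--|-->[\s\S]*$/g;
  return html.replace(outsideComments, "");
}

const input =
  "<div><!--<b>Test</b>-->Test</div>\n" +
  "<div><!--<b>Test2</b>-->Test2</div>";

console.log(keepCommentBodies(input)); // <b>Test</b><b>Test2</b>
```

Note that input containing no comment at all is returned unchanged by this sketch, since none of the three alternatives can match.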
A short explanation:
Be careful when processing (X)HTML with regex. Whenever comment delimiters (<!-- or -->) occur inside tag attributes or CDATA blocks, things go wrong.
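To make that caveat concrete, here is a small sketch of how a purely textual replacement gets fooled. The pattern and the helper name `stripOutsideComments` are illustrative assumptions, not a definitive implementation; the point is only that a regex cannot tell a real comment from comment-like text inside an attribute value.

```javascript
// Hypothetical helper: deletes everything that is not between <!-- and -->.
function stripOutsideComments(html) {
  return html.replace(/-->[\s\S]*?<!--|^[\s\S]*?<!--|-->[\s\S]*$/g, "");
}

// The title attribute merely contains text that looks like a comment.
const tricky =
  '<a title="<!-- not a comment -->">link</a><!--<b>Real</b>-->';

const result = stripOutsideComments(tricky);

// The attribute text leaks into the output next to the real comment body.
console.log(JSON.stringify(result)); // " not a comment <b>Real</b>"
```

If inputs like this are possible, a real HTML parser is the safer choice.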
EDIT
Seeing your most active tag is JavaScript, here’s a JS demo:
which prints:
Note that since JS does not support the (?s) flag, I used the equivalent [\s\S], which matches any character (including line break chars).
Test it on Ideone here: http://ideone.com/6yQaK
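The difference between `.` and `[\s\S]` is easy to check with a comment that spans two lines. This is only a sketch; the sample string and variable names are made up for illustration. The last check relies on the `s` (dotAll) flag on the regex literal, which exists in ES2018 and later engines, so treat it as an assumption about your runtime.

```javascript
// A comment whose body spans two lines.
const multiline = "<!--line one\nline two-->";

// Without any flag, "." does not match line breaks, so the comment is missed.
const plainDot = /<!--.*?-->/.test(multiline);

// [\s\S] matches any character, line breaks included.
const charClass = /<!--[\s\S]*?-->/.test(multiline);

// With the "s" flag (ES2018+), "." matches line breaks as well.
const dotAllFlag = /<!--.*?-->/s.test(multiline);

console.log(plainDot, charClass, dotAllFlag); // false true true
```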
EDIT II
And a PHP demo would look like:
which also prints:
as can be seen on Ideone: http://ideone.com/Bm2uJ