haystack: <h2 >a  · · · </h2> <div class=indent> aaaa </div> <h2 >b  · · · </h2> <div

Question

Asked: June 5, 2026 (2026-06-05T15:37:18+00:00)

haystack:

<h2 >a&nbsp; &middot;&nbsp;&middot;&nbsp;&middot;
</h2>
<div class="indent">
aaaa
</div>
<h2 >b&nbsp; &middot;&nbsp;&middot;&nbsp;&middot;
</h2>
<div class="indent">
bbbb
</div>

pattern I used:

#<h2[^>]*>(a|b)(?!</h2>)[\s\S]*</h2><div class="indent">((?!</div>)[\s\S]+)</div>#

This pattern matches only the start of the first h2 (e.g. a  · · ·) and the content of the last div (e.g. bbbb).

But I want it to match the content of every h2 and its div so that I get a one-to-one map (e.g. a  · · · => aaaa, b  · · · => bbbb). How do I do this?

1 Answer

Editorial Team · Answer 1 · 2026-06-05T15:37:20+00:00

[\s\S]* and [\s\S]+ are greedy, meaning they will match as many characters as possible. Try changing them to [\s\S]*? and [\s\S]+?.

With your current regex, if you were to put your [\s\S]* into a capturing group you would see that it matches the following:

&nbsp; &middot;&nbsp;&middot;&nbsp;&middot;
</h2>
<div class="indent">
aaaa
</div>
<h2 >b&nbsp; &middot;&nbsp;&middot;&nbsp;&middot;

Adding the ? at the end makes this lazy, so instead of matching as much as possible it will match as few characters as possible, so it will stop at the first </h2> like you want. The same reasoning applies to the [\s\S]+ later in your regex.
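
To see the difference concretely, here is a minimal sketch in Python (my choice for illustration; the question does not say which language is in use, and the # delimiters are dropped because Python's re module does not take them). It runs a simplified version of the heading part of the pattern against the sample haystack, once greedy and once lazy.

```python
import re

# Sample haystack from the question.
haystack = """<h2 >a&nbsp; &middot;&nbsp;&middot;&nbsp;&middot;
</h2>
<div class="indent">
aaaa
</div>
<h2 >b&nbsp; &middot;&nbsp;&middot;&nbsp;&middot;
</h2>
<div class="indent">
bbbb
</div>
"""

# Greedy: [\s\S]* runs to the end of the input, then backtracks to the LAST </h2>.
greedy = re.search(r"<h2[^>]*>([\s\S]*)</h2>", haystack).group(1)

# Lazy: [\s\S]*? expands one character at a time and stops at the FIRST </h2>.
lazy = re.search(r"<h2[^>]*>([\s\S]*?)</h2>", haystack).group(1)

print(repr(greedy))  # spans the first heading, the first div and the second heading
print(repr(lazy))    # only the text of the first heading
```

The greedy capture swallows the first div along with the second heading, which is why the original pattern pairs the first h2 with the last div.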

It also looks like your pattern should fail on the sample string: the regex contains </h2><div... with nothing in between, but in your sample text there is always a newline between the closing </h2> and the <div>. You should probably change that part to </h2>\s*<div.... End result:

#<h2[^>]*>(a|b)(?!</h2>)[\s\S]*?</h2>\s*<div class="indent">((?!</div>)[\s\S]+?)</div>#
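
As a quick check, the following Python sketch applies that pattern (without the # delimiters, which Python's re module does not use) to the sample haystack and builds the one-to-one map. Python and the variable names here are assumptions made for illustration only. Note that group 1 is (a|b), so it captures just the leading letter of each heading, not the trailing dots.

```python
import re

# Sample haystack from the question.
haystack = """<h2 >a&nbsp; &middot;&nbsp;&middot;&nbsp;&middot;
</h2>
<div class="indent">
aaaa
</div>
<h2 >b&nbsp; &middot;&nbsp;&middot;&nbsp;&middot;
</h2>
<div class="indent">
bbbb
</div>
"""

# The corrected pattern: lazy quantifiers plus \s* between </h2> and <div>.
pattern = re.compile(
    r'<h2[^>]*>(a|b)(?!</h2>)[\s\S]*?</h2>\s*'
    r'<div class="indent">((?!</div>)[\s\S]+?)</div>'
)

# One match per h2/div pair; strip the newlines around the div content.
mapping = {m.group(1): m.group(2).strip() for m in pattern.finditer(haystack)}

print(mapping)  # {'a': 'aaaa', 'b': 'bbbb'}
```

If you are in PHP, preg_match_all with the PREG_SET_ORDER flag should give you the same pairs, one array per match.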

But don’t parse HTML with regex!
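
For comparison, here is a rough sketch of the same mapping done with an HTML parser instead of a regex, using Python's standard html.parser module. The class name HeadingDivMapper and its attributes are made up for this example, and it assumes each h2 is followed by a single, non-nested div with class "indent", as in the sample.

```python
from html.parser import HTMLParser

haystack = """<h2 >a&nbsp; &middot;&nbsp;&middot;&nbsp;&middot;
</h2>
<div class="indent">
aaaa
</div>
<h2 >b&nbsp; &middot;&nbsp;&middot;&nbsp;&middot;
</h2>
<div class="indent">
bbbb
</div>
"""

class HeadingDivMapper(HTMLParser):
    """Pairs each <h2> text with the text of the next <div class="indent">."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &nbsp;, &middot; etc.
        self.pairs = {}
        self._current = None  # tag currently being collected, if any
        self._heading = None  # last heading seen, waiting for its div
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2" or (tag == "div" and dict(attrs).get("class") == "indent"):
            self._current = tag
            self._buf = []

    def handle_data(self, data):
        if self._current:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return
        text = "".join(self._buf).strip()
        if tag == "h2":
            self._heading = text
        elif self._heading is not None:
            self.pairs[self._heading] = text
            self._heading = None
        self._current = None

parser = HeadingDivMapper()
parser.feed(haystack)
parser.close()
print(parser.pairs)
```

Unlike the regex, this keeps the whole heading text as the key (with entities decoded) and does not depend on the exact whitespace between tags.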
