Article segmentation has two kinds of cases: 1. < p > the first

Question

Asked: June 8, 2026 (2026-06-08T06:08:51+00:00)

The article segmentation has two kinds of cases:

 1. < p > the first paragraph < / p > < p > the second paragraph < / p >...
 2. < p > the first paragraph < br / > < br / > the second paragraph < br / > < br / > the third paragraph < / p >

I wrote the following code:

$body_arr = preg_split('/\<\/?p\>/', $body, -1, PREG_SPLIT_NO_EMPTY);
echo count($body_arr);
if (count($body_arr) < 4)
{
    $body_arr = preg_split('/(\<br\/?\>)\s*\\1/', $body, -1, PREG_SPLIT_NO_EMPTY);
    $body1 = $body2 = $body3 = '';
    $total = count($body_arr);
    $maxed = max(floor($total / 2), 3);
    foreach ($body_arr as $k => $v)
    {
        if ($k == 0)
        {
            $body1 = $v . "<br><br>";
        }
        else if ($k < $maxed)
        {
            $body2 .= $v . "<br><br>";
        }
        else
        {
            $body3 .= $v . "<br><br>";
        }
    }
}

The input is of the second kind, and the result is wrong.

1 Answer

Editorial Team · Answer 1 · 2026-06-08T06:08:54+00:00

You can split the text with a single regex using nested groups. The text starts with an opening p tag, followed by paragraphs that each end in one of three ways: a close/open p tag pair, a pair of br tags, or a final close p tag.

The close/open p tag pair can be represented with the following:

<\s*/\s*p\s*>\s*<\s*p\s*>

The double br tag can be represented with the following:

<\s*br\s*/?\s*>\s*<\s*br\s*/?\s*>

And the close p tag can be represented with the following:

<\s*/\s*p\s*>

Note that I'm allowing for whitespace inside and between tags because your example had it; remove the \s* if it isn't necessary. Stitch those together using some nested groups and you end up with something like this:

<\s*p\s*>((?<Paragraph>[^<]*)((<\s*/\s*p\s*>\s*<\s*p\s*>)|(<\s*br\s*/?\s*>\s*<\s*br\s*/?\s*>)|(<\s*/\s*p\s*>)))*
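One caveat if you run the combined pattern from PHP: PCRE keeps only the last iteration of a repeated capture group, so the Paragraph group will not hold every paragraph after a single preg_match. The sketch below illustrates this. It writes the pattern with \s* for whitespace and /? for the optional slash in br, uses ~ as the delimiter so / needs no escaping, and runs against a made-up sample string (the sample text and the variable names are assumptions, not from the question).

```php
<?php
// Hypothetical sample input in the second form from the question.
$body = '<p>first<br /><br />second<br /><br />third</p>';

// The combined pattern; "~" is the delimiter, so "/" needs no escaping.
$pattern = '~<\s*p\s*>((?<Paragraph>[^<]*)('
    . '(<\s*/\s*p\s*>\s*<\s*p\s*>)'              // close/open p pair
    . '|(<\s*br\s*/?\s*>\s*<\s*br\s*/?\s*>)'     // double br
    . '|(<\s*/\s*p\s*>)'                         // final close p
    . '))*~';

preg_match($pattern, $body, $m);

// The whole body is matched, but the repeated named group
// only retains the capture from its last successful iteration.
$lastParagraph = $m['Paragraph'];
echo $lastParagraph, "\n"; // prints "third"
```

So in PHP the pattern is useful for validating the overall shape, while collecting every paragraph is easier with a split on the separators.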

I tested that with your examples and it works. From the example I'm assuming that there are no tags inside the paragraphs. If there are, [^<]* (anything that is not the start of a tag) will stop at the first inner tag, and you'll need something more elaborate to capture the paragraph text.
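If paragraphs may contain inline tags, one option in PHP is to split on the separators instead of capturing the text between them, so the paragraph content is never inspected. This is only a sketch under assumptions: split_paragraphs is a hypothetical helper name, the p tags are assumed to carry no attributes, and a single br is treated as part of the paragraph rather than as a break.

```php
<?php
/**
 * Split an article body into paragraphs (hypothetical helper).
 * Assumes p tags have no attributes; a single <br> stays inside its paragraph.
 */
function split_paragraphs(string $body): array
{
    $separator = '~'
        . '<\s*/\s*p\s*>\s*<\s*p\s*>'            // close/open p pair
        . '|<\s*br\s*/?\s*>\s*<\s*br\s*/?\s*>'   // double br
        . '|<\s*/?\s*p\s*>'                      // a lone opening or closing p
        . '~i';

    $parts = preg_split($separator, $body, -1, PREG_SPLIT_NO_EMPTY);

    // Trim each piece and drop anything that was only whitespace.
    $parts = array_map('trim', $parts);
    return array_values(array_filter($parts, function ($p) {
        return $p !== '';
    }));
}

// Made-up inputs modelled on the two cases in the question.
$case1 = '< p > the first paragraph < / p > < p > the second paragraph < / p >';
$case2 = '<p>the <b>first</b> paragraph<br /><br />the second<br>line two<br /><br />the third</p>';

print_r(split_paragraphs($case1));
print_r(split_paragraphs($case2));
```

Because both input forms go through the same separator pattern, there is no need to count the pieces first and then choose between a p-based and a br-based split.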
