I am trying to parse some fairly flat HTML and group everything from one


Asked: May 26, 2026


I am trying to parse some fairly flat HTML and group everything from one h1 tag to the next. For example, I have the following HTML:

<h1> Heading 1 </h1>
<p> Paragraph 1.1 </p>
<p> Paragraph 1.2 </p>
<p> Paragraph 1.3 </p>
<h1> Heading 2 </h1>
<p> Paragraph 2.1 </p>
<p> Paragraph 2.2 </p>
<h1> Heading 3 </h1>
<p> Paragraph 3.1 </p>
<p> Paragraph 3.2 </p>
<p> Paragraph 3.3 </p>

I want the result to look like this:

<div id='1'>
    <h1> Heading 1 </h1>
    <p> Paragraph 1.1 </p>
    <p> Paragraph 1.2 </p>
    <p> Paragraph 1.3 </p>
</div>
<div id='2'>
    <h1> Heading 2 </h1>
    <p> Paragraph 2.1 </p>
    <p> Paragraph 2.2 </p>
</div>
<div id='3'>
    <h1> Heading 3 </h1>
    <p> Paragraph 3.1 </p>
    <p> Paragraph 3.2 </p>
    <p> Paragraph 3.3 </p>
</div>

It is probably not even worth posting the code I have so far, as it just turned into a mess. I was running an XPath query for '//h1', creating a new DIV element as a parent for each match, copying the h1 DOM node into that DIV, and then looping over nextSibling until I hit the next h1 tag. As mentioned, it got messy.
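The nextSibling approach described above can be kept tidy if each heading's siblings are moved in one small loop. What follows is a minimal sketch, assuming the headings are siblings of the paragraphs (flat markup) and that the input parses as XML once wrapped in a single root. The function name `groupByH1`, the `<root>` wrapper and the shortened sample are hypothetical, for illustration only.

```php
<?php
// Hypothetical helper: for each <h1>, insert a <div id="N"> before it, then
// move the <h1> and its following siblings into that div until the next <h1>.
// Assumes flat markup where headings and paragraphs share one parent.
function groupByH1(DOMDocument $doc): void
{
    $xp = new DOMXPath($doc);
    $id = 0;
    // XPath results are a static snapshot, so mutating the tree is safe.
    foreach ($xp->query('//h1') as $h1) {
        $div = $doc->createElement('div');
        $div->setAttribute('id', (string) ++$id);
        $h1->parentNode->insertBefore($div, $h1);

        // Move the <h1> and every following sibling up to the next <h1>.
        $node = $h1;
        while ($node !== null) {
            $next = $node->nextSibling; // read before moving the node
            $div->appendChild($node);   // appendChild() detaches and moves it
            $node = $next;
            if ($node instanceof DOMElement && $node->tagName === 'h1') {
                break;
            }
        }
    }
}

// Shortened sample input (hypothetical), wrapped in a <root> element so
// that loadXML() sees exactly one document element.
$html = '<h1> Heading 1 </h1><p> Paragraph 1.1 </p><p> Paragraph 1.2 </p>'
      . '<h1> Heading 2 </h1><p> Paragraph 2.1 </p>';

$doc = new DOMDocument();
$doc->loadXML('<root>' . $html . '</root>');
groupByH1($doc);
echo $doc->saveXML($doc->documentElement), "\n";
```

Reading nextSibling before calling appendChild() is the step that keeps this from getting messy: once a node has been moved into the div it is the div's last child, so its own nextSibling is null.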

Could someone point me in a better direction here?


1 Answer

Editorial Team · Answer 1 · 2026-05-26T21:01:18+00:00

Iterate over all nodes that are on the same level (in my example I wrapped them in a hint node called platau). Whenever you run across an <h1>, insert a div before it and keep a reference to that div.

For the <h1> itself and every node after it, if the reference exists, remove the node and append it as a child of the referenced div.

Example:

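// assumes $xml holds the question's markup wrapped in a <platau> root element
$doc = new DOMDocument();
$doc->loadXML($xml);
$xp = new DOMXPath($doc);

$current = NULL; // the <div> that is currently collecting nodes
$id = 0;
// XPath results are a static snapshot, so moving nodes inside the loop is safe
foreach($xp->query('/platau/node()') as $sort)
{
    if ($sort instanceof DOMElement && $sort->tagName === 'h1')
    {
        // start a new group: insert <div id="N"> right before this <h1>
        $current = $doc->createElement('div');
        $current->setAttribute('id', (string) ++$id);
        $current = $sort->parentNode->insertBefore($current, $sort);
    }
    if (!$current) continue; // nodes before the first <h1> stay where they are

    // detach the node and append it to the current group
    $sort->parentNode->removeChild($sort);
    $current->appendChild($sort);
}

echo $doc->saveXML($doc->documentElement);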
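The snippet above uses loadXML(), which only accepts well-formed XML and needs the platau hint root. If the source is real HTML, one option is to parse it with loadHTML() instead. That parser wraps a fragment in implied <html> and <body> elements, so <body> can play the role of the hint node. The sketch below assumes that behaviour and otherwise applies the same loop to the full input from the question; it is an illustration, not the answerer's code.

```php
<?php
// Same grouping loop as the answer, but for HTML input parsed with
// loadHTML(). The implied <body> element serves as the common parent.
$html = <<<HTML
<h1> Heading 1 </h1>
<p> Paragraph 1.1 </p>
<p> Paragraph 1.2 </p>
<p> Paragraph 1.3 </p>
<h1> Heading 2 </h1>
<p> Paragraph 2.1 </p>
<p> Paragraph 2.2 </p>
<h1> Heading 3 </h1>
<p> Paragraph 3.1 </p>
<p> Paragraph 3.2 </p>
<p> Paragraph 3.3 </p>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($doc);
$current = null; // the <div> currently collecting nodes
$id = 0;

// XPath results are a static snapshot, so moving nodes here is safe.
foreach ($xp->query('/html/body/node()') as $node) {
    if ($node instanceof DOMElement && $node->tagName === 'h1') {
        $div = $doc->createElement('div');
        $div->setAttribute('id', (string) ++$id);
        $current = $node->parentNode->insertBefore($div, $node);
    }
    if ($current === null) {
        continue; // leave anything before the first <h1> in place
    }
    $current->appendChild($node); // moves the node into the current group
}

// Serialize only the <body> subtree.
$body = $xp->query('/html/body')->item(0);
echo $doc->saveHTML($body), "\n";
```

Passing a node to saveHTML() serializes just that subtree, so the output contains only the <body> with its three <div> groups rather than the whole implied document.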
