I am trying to parse some fairly flat HTML and group everything from one h1 tag to the next. For example, I have the following HTML:
<h1> Heading 1 </h1>
<p> Paragraph 1.1 </p>
<p> Paragraph 1.2 </p>
<p> Paragraph 1.3 </p>
<h1> Heading 2 </h1>
<p> Paragraph 2.1 </p>
<p> Paragraph 2.2 </p>
<h1> Heading 3 </h1>
<p> Paragraph 3.1 </p>
<p> Paragraph 3.2 </p>
<p> Paragraph 3.3 </p>
I basically want it to look like:
<div id='1'>
<h1> Heading 1 </h1>
<p> Paragraph 1.1 </p>
<p> Paragraph 1.2 </p>
<p> Paragraph 1.3 </p>
</div>
<div id='2'>
<h1> Heading 2 </h1>
<p> Paragraph 2.1 </p>
<p> Paragraph 2.2 </p>
</div>
<div id='3'>
<h1> Heading 3 </h1>
<p> Paragraph 3.1 </p>
<p> Paragraph 3.2 </p>
<p> Paragraph 3.3 </p>
</div>
It is probably not worth posting the code I have so far, as it turned into a mess. I was running an XPath query for ‘//h1’, creating new DIV elements as parent nodes, copying the h1 DOM node into the first DIV, and then looping over nextSibling until I hit another h1 tag. As mentioned, it got messy.
Could someone point me in a better direction here?
Iterate over all nodes that are on the same level (I created a hint node called platau in my example). Whenever you run across an <h1>, insert a new div before it and keep a reference to that div. Then, for the <h1> and every node after it, if the reference exists, remove the node from its current position and add it as a child of the referenced div.
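The steps described here can be sketched roughly as follows. The question does not say which language or DOM library is in use, so this is only an illustration in Python with the standard library's xml.dom.minidom, which offers the same insertBefore / appendChild operations. The function name group_by_h1 is hypothetical, the platau wrapper element just mirrors the hint node mentioned above, and the sketch assumes the input is well-formed enough to parse as XML.

```python
from xml.dom import minidom


def group_by_h1(html_fragment: str) -> str:
    """Wrap each <h1> and the siblings that follow it in a numbered <div>.

    Hypothetical helper: assumes the fragment is flat and parses as XML.
    """
    # minidom needs a single root element, so wrap the flat fragment in a
    # <platau> hint node that holds all siblings on one level.
    doc = minidom.parseString(f"<platau>{html_fragment}</platau>")
    root = doc.documentElement

    current_div = None  # reference to the most recently inserted <div>
    count = 0

    # Take a snapshot of the children, because the loop mutates the tree.
    for node in list(root.childNodes):
        is_h1 = node.nodeType == node.ELEMENT_NODE and node.tagName == "h1"
        if is_h1:
            # New heading: insert a fresh <div> before it and remember it.
            count += 1
            current_div = doc.createElement("div")
            current_div.setAttribute("id", str(count))
            root.insertBefore(current_div, node)
        if current_div is not None:
            # appendChild detaches the node from its old parent first,
            # so this both removes it from the root and re-adds it.
            current_div.appendChild(node)

    # Serialize the children only, leaving out the <platau> wrapper.
    return "".join(child.toxml() for child in root.childNodes)


if __name__ == "__main__":
    sample = "<h1> Heading 1 </h1><p> Paragraph 1.1 </p><h1> Heading 2 </h1><p> Paragraph 2.1 </p>"
    print(group_by_h1(sample))
```

Nodes that appear before the first <h1> stay where they are, because no div reference exists yet at that point. Whitespace text nodes between elements are moved into the current div along with everything else.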