I have a text string structured like this:
= Some Heading (1) Some text == Some Sub-Heading (2) Some more text === Some Sub-sub-heading (3) Some details here = Some other Heading (4)
I want to extract the content of the second heading, including any subsections. I do not know the depth of the second heading beforehand, so I need to match from that heading up to the next heading of the same depth or shallower, or to the end of the string.
In the example above, this would yield:
== Some Sub-Heading (2) Some more text === Some Sub-sub-heading (3) Some details here
This is where I get stuck. How can I use the sub-expression that matched the opening of the second heading as part of the sub-expression that closes the section?
I’d skip trying to use a complex regex. Instead, write a simple parser and build up a tree.
Here’s a rough-and-ready implementation. It’s optimized only for lazy coding. You may want to use libraries from CPAN to build your parser and your tree nodes.
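The parse-into-a-tree idea can be sketched roughly as follows. This is an illustrative Python version, not Perl code built on CPAN modules, and every name in it (`Node`, `parse`, `walk`, `render`, `nth_section`) is hypothetical. It assumes a heading marker is any run of `=` that is preceded by whitespace or the start of the string and followed by whitespace, so a literal ` = ` inside body text would be misread as a heading.

```python
import re
from dataclasses import dataclass, field

# Assumption: a heading marker is a run of '=' that starts the string or
# follows whitespace, and is itself followed by whitespace.
MARKER = re.compile(r"(?<!\S)(=+)\s+")


@dataclass
class Node:
    depth: int                      # number of '=' characters in the marker
    text: str                       # heading title plus body, up to the next marker
    children: list = field(default_factory=list)


def parse(source):
    """Build a tree of sections; the returned root is a depth-0 placeholder."""
    root = Node(0, "")
    stack = [root]                  # path from the root to the current section
    matches = list(MARKER.finditer(source))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(source)
        node = Node(len(m.group(1)), source[m.end():end].strip())
        # A heading of the same depth or shallower closes the open sections.
        while stack[-1].depth >= node.depth:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


def walk(node):
    """Yield every section below node in document order."""
    for child in node.children:
        yield child
        yield from walk(child)


def render(node):
    """Turn a section and all of its subsections back into text."""
    parts = ["=" * node.depth + " " + node.text]
    parts.extend(render(child) for child in node.children)
    return " ".join(parts)


def nth_section(source, n):
    """Return the n-th heading (1-based) with its subsections, or None."""
    for i, node in enumerate(walk(parse(source)), start=1):
        if i == n:
            return render(node)
    return None


example = (
    "= Some Heading (1) Some text == Some Sub-Heading (2) Some more text "
    "=== Some Sub-sub-heading (3) Some details here = Some other Heading (4)"
)
print(nth_section(example, 2))
```

Because each node already owns its subsections, "up to the next heading of the same depth or shallower" needs no backreference: it falls out of how the stack is popped while the tree is built.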