Possible Duplicate:
Matching Nested Structures With Regular Expressions in Python
I can’t wrap my head around this problem. I have a string like the following one:
Lorem ipsum dolor sit amet [@a xxx yyy [@b xxx yyy [@c xxx yyy]]] lorem ipsum sit amet
My task is to extract the commands (they always start with [@ and end with ]) and their subcommands. A result like
[
[@a xxx yyy [@b xxx yyy [@c xxx yyy]]], # the outermost
[@b xxx yyy [@c xxx yyy]], # the middle one
[@c xxx yyy] # the innermost
]
would be highly appreciated. The problem is that these kinds of commands can occur in very long text messages, so a “performant” solution would be nice.
I was toying around with some regex patterns, most of the time something like
(\[@.*?\]\s) # for the outer one
but I have had no luck matching the middle and innermost ones. To make it more complicated, the number of nested commands is variable…
Might some special regex be the solution? I have read about lookaheads and lookbehinds, but I have no idea how to use them in this special case.
Thanks a bunch!
UPDATE
@Cyborgx37 pointed me to another post that uses the pyparsing package. It would be nice to have a solution without an external package or library. But pyparsing definitely solves the problem!
C# can match nested constructs in RegEx (via balancing groups); I don’t believe Python’s built-in re module can. You could re-run the RegEx search on the previous results, but that is probably less efficient (given the overhead of RegEx for such a simple search) than just writing a custom parser. The text you’re searching for (“[@” and “]”) isn’t very complex.
Here’s a custom parser (in JavaScript) that would do the job.
It loops through all the characters of the text only once and uses a stack and an if…else if…else condition to push, pop and modify the values in that stack respectively.
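Since the question is about Python, here is a minimal Python sketch of the same single-pass, stack-based idea. It is not a port of the JavaScript parser. The function name extract_commands and its opener/closer parameters are made up for illustration. It assumes that every “]” inside a command closes a command, so a stray “[” that is not followed by “@” would throw off the pairing, and it simply drops commands that are never closed.

```python
def extract_commands(text, opener="[@", closer="]"):
    """Return every opener...closer command in text, outermost first.

    Hypothetical helper: assumes each closer inside a command really
    closes a command (no unrelated brackets in the command text).
    """
    stack = []   # start indices of commands that are still open
    found = []   # (start, end) index pairs of completed commands
    i = 0
    while i < len(text):
        if text.startswith(opener, i):
            # A new (possibly nested) command begins here.
            stack.append(i)
            i += len(opener)
        elif stack and text.startswith(closer, i):
            # Close the most recently opened command.
            start = stack.pop()
            found.append((start, i + len(closer)))
            i += len(closer)
        else:
            # Ordinary character, or a closer with nothing open.
            i += 1
    # Inner commands complete first; sorting by start index puts
    # each outer command before the commands nested inside it.
    found.sort()
    return [text[start:end] for start, end in found]


sample = ("Lorem ipsum dolor sit amet "
          "[@a xxx yyy [@b xxx yyy [@c xxx yyy]]] lorem ipsum sit amet")
for command in extract_commands(sample):
    print(command)
```

Every character is visited once and no regex is involved, so the running time grows linearly with the length of the text, apart from the final sort over the commands found and the cost of slicing them out.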