Possible Duplicate: Matching Nested Structures With Regular Expressions in Python I can’t wrap my

Question


Editorial Team

Asked: June 18, 2026, 07:44:21 UTC


Possible Duplicate:
Matching Nested Structures With Regular Expressions in Python

I can’t wrap my head around this problem. I have a string like the following one:

Lorem ipsum dolor sit amet [@a xxx yyy [@b xxx yyy [@c xxx yyy]]] lorem ipsum sit amet

My task is to extract the commands (they always start with [@ and end with ]) and their subcommands. A result like

[
    [@a xxx yyy [@b xxx yyy [@c xxx yyy]]], # the most outer
    [@b xxx yyy [@c xxx yyy]],              # the middle one
    [@c xxx yyy]                            # the inner most
]

would be highly appreciated. The problem is that these kinds of commands can occur in very long text messages, so a "performant" solution would be nice.

I have been toying around with some regex patterns, most of the time something like

(\[@.*?\]\s) # for the outer one

but I have had no luck matching the middle and inner ones. To make it more complicated, the number of nested commands is variable.
Might some special regex be the solution? I have read about lookaheads and lookbehinds but have no idea how to use them in this case.
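To see why a single lazy pattern cannot separate the levels, here is a small check written in JavaScript (the language used in the answer below). It only compares what the lazy pattern matches with and without the trailing `\s`; neither variant ever yields the middle or inner command on its own, because a plain regex has no way to count how many brackets are still open.

```javascript
const txt = "Lorem ipsum dolor sit amet [@a xxx yyy [@b xxx yyy [@c xxx yyy]]] lorem ipsum sit amet";

// Without \s: the lazy .*? stops at the first "]", which belongs to the innermost command,
// so the match starts at the outer "[@a" but ends inside it
const lazy = txt.match(/\[@.*?\]/)[0];

// With \s: the match must end at a "]" followed by whitespace; here that is
// the last of the three closing brackets, so the outer command (plus a space) is matched
const lazyWithSpace = txt.match(/\[@.*?\]\s/)[0];

console.log(JSON.stringify(lazy));          // "[@a xxx yyy [@b xxx yyy [@c xxx yyy]"
console.log(JSON.stringify(lazyWithSpace)); // "[@a xxx yyy [@b xxx yyy [@c xxx yyy]]] "
```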

Thanks a bunch!

UPDATE

@Cyborgx37 pointed me to another post that uses the pyparsing package. It would be nice to have a solution without an external package or library, but pyparsing definitely solves the problem!


1 Answer

Editorial Team · Answer 1 · 2026-06-18T07:44:23+00:00

C# (.NET) regular expressions can match nested structures through balancing groups; I don't believe Python's built-in re module can. You could re-run the regex search on previous results, but this is probably less efficient than writing a small custom parser, because the overhead of a regex engine buys you little for such a simple search: the delimiters you're searching for, "[@" and "]", aren't very complex.
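For comparison, the "re-run the regex" idea could look roughly like the sketch below (JavaScript, like the parser in this answer). It is an illustration under stated assumptions, not a recommended implementation: the function name `parseByPeeling` is hypothetical, and it assumes the input never contains the character `\u0000`, which is used as a placeholder. Each pass matches only the innermost commands (no brackets inside), records them, and hides them behind a placeholder so that their parents become innermost on the next pass.

```javascript
// Hypothetical sketch: peel nested commands from the inside out with repeated regex passes.
// Assumes the input never contains "\u0000" (used here as a placeholder marker).
function parseByPeeling(s) {
    const innermost = /\[@[^\[\]]*\]/g;       // a command with no "[" or "]" inside it
    const placeholder = /\u0000(\d+)\u0000/g; // marks a command that was already extracted
    const found = [];                         // full command texts, innermost first
    let text = s;
    for (;;) {
        let matched = false;
        text = text.replace(innermost, (m) => {
            matched = true;
            // Restore the full text of any child commands hidden inside this match
            const full = m.replace(placeholder, (_, i) => found[Number(i)]);
            found.push(full);
            // Hide this command so its parent has no brackets inside on the next pass
            return "\u0000" + (found.length - 1) + "\u0000";
        });
        if (!matched) break; // no bracket-free command left to peel
    }
    return found;
}

const txt = "Lorem ipsum dolor sit amet [@a xxx yyy [@b xxx yyy [@c xxx yyy]]] lorem ipsum sit amet";
console.log(parseByPeeling(txt));
```

Every pass rescans the whole text, so the number of scans grows with the nesting depth; that is the extra cost compared with a single-pass parser.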

Here’s a custom parser (in JavaScript) that would do the job.

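const txt = "Lorem ipsum dolor sit amet [@a xxx yyy [@b xxx yyy [@c xxx yyy]]] lorem ipsum sit amet";

function parse(s) {
    const stack = [];   // text collected so far for every command that is still open
    const result = [];  // finished commands, in the order their closing "]" is reached
    for (let x = 0; x < s.length; x++) {
        const c = s.charAt(x);
        if (c === '[' && x + 1 < s.length && s.charAt(x + 1) === '@') {
            // A new command starts; "[@" is also part of every enclosing command
            for (let y = 0; y < stack.length; y++)
                stack[y] += "[@";
            stack.push("[@");
            x++; // skip the '@' that was just consumed
        } else if (c === ']' && stack.length > 0) {
            // "]" closes the innermost open command
            for (let y = 0; y < stack.length; y++)
                stack[y] += "]";
            result.push(stack.pop());
        } else {
            // Ordinary character: append it to every open command
            for (let y = 0; y < stack.length; y++)
                stack[y] += c;
        }
    }
    return result;
}
parse(txt); // returns the innermost command first: "[@c xxx yyy]", then the b command, then the a command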

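It loops through the characters of the text only once and uses a stack: the if / else if / else branches push, pop, and append to the stack entries respectively. Because a command is added to the result when its closing "]" is reached, the commands come out innermost first. Note also that every character is appended to every open command, so the work grows with the nesting depth.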
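A possible variant, sketched under the assumption that only the positions of "[@" and "]" matter: instead of appending every character to every open command, a global regex (`/\[@|\]/g`) jumps straight to the delimiters, a stack keeps the start index of each open command, and each finished command is sliced out of the original string once. Sorting the spans by start index returns the outermost command first, which is the order asked for in the question. The name `parseCommands` is hypothetical; commands that are never closed are simply dropped.

```javascript
// Hypothetical variant of the stack parser: visit only the delimiters and
// slice each command out of the original string when it closes.
function parseCommands(s) {
    const token = /\[@|\]/g;  // matches either "[@" or "]"
    const openStarts = [];    // start index of every command that is still open
    const spans = [];         // [start, end) of every completed command
    let m;
    while ((m = token.exec(s)) !== null) {
        if (m[0] === "[@") {
            openStarts.push(m.index);
        } else if (openStarts.length > 0) {
            // "]" closes the most recently opened command; a stray "]" is ignored
            spans.push([openStarts.pop(), m.index + 1]);
        }
    }
    // Sort by start position so enclosing commands come before the ones nested in them
    spans.sort((a, b) => a[0] - b[0]);
    return spans.map(([start, end]) => s.slice(start, end));
}

const txt = "Lorem ipsum dolor sit amet [@a xxx yyy [@b xxx yyy [@c xxx yyy]]] lorem ipsum sit amet";
console.log(parseCommands(txt));
```

Like the parser above, a plain "]" inside a command (for example from "[x]") closes the innermost open command, so the two versions agree on what counts as a command.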
