I’m supposed to capture everything inside a tag and the next lines after it


Asked: May 11, 2026 (2026-05-11T21:29:43+00:00)


I’m supposed to capture everything inside a tag and the lines that follow it, but the match is supposed to stop the next time it meets a bracket. What am I doing wrong?

import re #regex

regex = re.compile(r"""
         ^                    # Must start in a newline first
         \[\b(.*)\b\]         # Get what's enclosed in brackets 
         \n                   # only capture bracket if a newline is next
         (\b(?:.|\s)*(?!\[))  # should read: anyword that doesn't precede a bracket
       """, re.MULTILINE | re.VERBOSE)

haystack = """
[tab1]
this is captured
but this is suppose to be captured too!
@[this should be taken though as this is in the content]

[tab2]
help me
write a better RE
"""
m = regex.findall(haystack)
print(m)

What I’m trying to get is:
[('tab1', 'this is captured\nbut this is suppose to be captured too!\n@[this should be taken though as this is in the content]\n', '[tab2]', 'help me\nwrite a better RE\n')]

Edit:

regex = re.compile(r"""
             ^           # Must start in a newline first
             \[(.*?)\]   # Get what's enclosed in brackets 
             \n          # only capture bracket if a newline is next
             ([^\[]*)    # stop reading at opening bracket
        """, re.MULTILINE | re.VERBOSE)

This seems to work, but it also cuts the content off at any bracket inside it.


1 Answer

Editorial Team · Answer 1 · 2026-05-11T21:29:43+00:00

As far as I know, Python’s re module doesn’t support recursive patterns.

EDIT: but in your case this would work:

regex = re.compile(r"""
         ^           # Must start in a newline first
         \[(.*?)\]   # Get what's enclosed in brackets 
         \n          # only capture bracket if a newline is next
         ([^\[]*)    # stop reading at opening bracket
    """, re.MULTILINE | re.VERBOSE)

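EDIT 2: Yes, you’re right, it doesn’t work properly. The group ([^\[]*) stops at the first [ it meets, even one in the middle of the content.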
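A quick check may make the truncation visible. Only the pattern comes from the earlier edit; the `sample` string is made up for illustration.

```python
import re

# Pattern from the earlier edit: the content group stops at the first '['.
regex = re.compile(r"""
    ^           # tag must start a line
    \[(.*?)\]   # tag name inside brackets
    \n          # newline right after the tag
    ([^\[]*)    # content: anything up to the next '['
    """, re.MULTILINE | re.VERBOSE)

# Hypothetical sample: the content of tab1 contains an inline bracket.
sample = "[tab1]\nkeep me\n@[inline bracket]\n\n[tab2]\nhelp me\n"

matches = regex.findall(sample)
print(matches)
```

Everything in tab1 from the inline bracket onward is dropped. The pattern that follows instead stops only where a complete tag line begins.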

import re

regex = re.compile(r"""
    (?:^|\n)\[             # tag's opening bracket  
        ([^\]\n]*)         # 1. text between brackets
    \]\n                   # tag's closing bracket
    (.*?)                  # 2. text between the tags
    (?=\n\[[^\]\n]*\]\n|$) # until tag or end of string but don't consume it
    """, re.DOTALL | re.VERBOSE)

haystack = """[tag1]
this is captured [not a tag[
but this is suppose to be captured too!
[another non-tag

[tag2]
help me
write a better RE[[[]
"""

print(regex.findall(haystack))

I do agree with viraptor, though. Regexes are cool, but you can’t check your file for errors with them. A hybrid, perhaps? 😛

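# A tag is a whole line of the form [name]
tag_re = re.compile(r'^\[([^\]\n]*)\]$', re.MULTILINE)
tags = list(tag_re.finditer(haystack))

result = {}
# Each tag's content runs up to the next tag, or to the end of the string
# for the last tag; this also copes with zero or one tag in the input.
for i, mo in enumerate(tags):
    end = tags[i + 1].start() if i + 1 < len(tags) else len(haystack)
    result[mo.group(1)] = haystack[mo.end():end].strip()

print(result)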
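The error-checking idea can be pushed a little further by walking the text line by line, which leaves room to complain about malformed input. This is only a sketch: the name `parse_sections` and the two error conditions (content before the first tag, duplicate tags) are assumptions chosen for illustration, not requirements from the question.

```python
import re

# A tag line is a whole line of the form [name], with no ']' inside the name.
TAG_LINE = re.compile(r'^\[([^\]\n]*)\]$')

def parse_sections(text):
    """Split text into {tag: content}, raising ValueError on malformed input."""
    sections = {}
    current = None  # name of the tag whose content is being collected
    for lineno, line in enumerate(text.splitlines(), 1):
        match = TAG_LINE.match(line)
        if match:
            current = match.group(1)
            if current in sections:
                raise ValueError("line %d: duplicate tag %r" % (lineno, current))
            sections[current] = []
        elif current is None:
            # Blank lines before the first tag are tolerated; text is not.
            if line.strip():
                raise ValueError("line %d: content before first tag" % lineno)
        else:
            sections[current].append(line)
    # Join the collected lines and trim surrounding whitespace.
    return {tag: "\n".join(lines).strip() for tag, lines in sections.items()}

sample = "[tag1]\nfirst [not a tag[\n[another non-tag\n\n[tag2]\nhelp me\n"
parsed = parse_sections(sample)
print(parsed)
```

Lines that merely contain brackets stay in the content, because only a line that is a complete tag starts a new section.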

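EDIT 3: That’s because the ^ character negates only inside a character class such as [^\[]. Everywhere else it anchors the match to the start of the string (or of a line, with re.MULTILINE). Regex has no equally simple way to negate a whole string; negation works per character, and the nearest tool for strings is a negative lookahead, (?!...), which has to be tested position by position.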
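A small check of the two meanings of ^ may help here, together with one common workaround for string-level negation: a negative lookahead tested before each consumed character. The sample text and variable names below are made up for illustration.

```python
import re

text = "[tag1]\nsome [inline] text\n[tag2]\nmore\n"

# Inside a character class, ^ negates: runs of characters that are
# neither brackets nor newlines.
non_brackets = re.findall(r"[^\[\]\n]+", text)

# Outside a class, ^ anchors: with re.MULTILINE it matches at each line start,
# so only whole-line tags are found and [inline] is ignored.
line_start_tags = re.findall(r"^\[([^\]\n]*)\]$", text, re.MULTILINE)

# String-level negation via lookahead: consume one character at a time,
# as long as what follows is not a newline plus a complete tag line.
body = re.compile(r"^\[tag1\]\n((?:(?!\n\[[^\]\n]*\]\n).)*)", re.DOTALL)
content = body.search(text).group(1)

print(non_brackets)
print(line_start_tags)
print(content)
```

The lookahead form is slower and harder to read than a negated character class, which is one reason the hybrid approach can be easier to maintain.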
