I’m supposed to capture everything inside a tag and the lines that follow it, but the capture is supposed to stop the next time it meets a bracket. What am I doing wrong?
import re #regex
regex = re.compile(r"""
^ # Must start in a newline first
\[\b(.*)\b\] # Get what's enclosed in brackets
\n # only capture bracket if a newline is next
(\b(?:.|\s)*(?!\[)) # should read: anyword that doesn't precede a bracket
""", re.MULTILINE | re.VERBOSE)
haystack = """
[tab1]
this is captured
but this is suppose to be captured too!
@[this should be taken though as this is in the content]
[tab2]
help me
write a better RE
"""
m = regex.findall(haystack)
print(m)
What I’m trying to get is:
[('tab1', 'this is captured\nbut this is suppose to be captured too!\n@[this should be taken though as this is in the content]\n', '[tab2]', 'help me\nwrite a better RE\n')]
Edit:
regex = re.compile(r"""
^ # Must start in a newline first
\[(.*?)\] # Get what's enclosed in brackets
\n # only capture bracket if a newline is next
([^\[]*) # stop reading at opening bracket
""", re.MULTILINE | re.VERBOSE)
This seems to work, but it also cuts the content off at any bracket inside it.
Python’s re module doesn’t support recursion, AFAIK.
EDIT: but in your case this would work:
EDIT 2: yes, it doesn’t work properly.
I do agree with viraptor though. Regexes are cool, but you can’t check your file for errors with them. A hybrid perhaps? 😛
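A hybrid along those lines might look like the sketch below: a small regex only recognises header lines, and plain Python does the grouping. The names HEADER and parse_sections are made up for this illustration, and it assumes a header is a line containing nothing but [name].

```python
import re

# Hypothetical helper for illustration; assumes a header is a whole line "[name]".
HEADER = re.compile(r"^\[([^\[\]\n]+)\]$")

def parse_sections(text):
    """Return a list of (name, content) pairs, one per [name] header line."""
    sections = []
    name, lines = None, []
    for line in text.splitlines(True):  # True keeps the line endings
        m = HEADER.match(line.rstrip("\n"))
        if m:
            # A new header closes the section collected so far.
            if name is not None:
                sections.append((name, "".join(lines)))
            name, lines = m.group(1), []
        elif name is not None:
            lines.append(line)
        # Lines before the first header are ignored here; a stricter
        # parser could raise an error at this point instead.
    if name is not None:
        sections.append((name, "".join(lines)))
    return sections

haystack = """
[tab1]
this is captured
but this is suppose to be captured too!
@[this should be taken though as this is in the content]
[tab2]
help me
write a better RE
"""
print(parse_sections(haystack))
```

Because every line passes through ordinary code, this is also the place where you could report malformed input, which a single findall call can’t do.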
EDIT 3: That’s because the ^ character means negation only inside [^square brackets]. Everywhere else it means start of string (or start of line with re.MULTILINE). A character class gives you no good way to negate a whole string, only a single character.
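One workaround that stays within a single regex is a lookahead instead of a negated character class. This is only a sketch, assuming a header is a line that contains nothing but [name]; the name section_re is made up for this example. The content group is matched lazily and stops just before the next header line or at the end of the input, so brackets in the middle of a line are left alone.

```python
import re

# Sketch only; assumes a header is a whole line of the form "[name]".
section_re = re.compile(r"""
    ^\[([^\]\n]+)\]\n          # header line: [name] at the start of a line
    (.*?)                      # content, lazy; DOTALL lets . cross newlines
    (?=^\[[^\]\n]+\]$|\Z)      # stop before the next header line or at end of input
""", re.MULTILINE | re.DOTALL | re.VERBOSE)

haystack = """
[tab1]
this is captured
but this is suppose to be captured too!
@[this should be taken though as this is in the content]
[tab2]
help me
write a better RE
"""
result = section_re.findall(haystack)
print(result)
```

Note that findall returns one (name, content) pair per section here, so the result is a list of two 2-tuples rather than a single flat tuple.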