I have this regex:
a_list = re.compile(r'\(\d+\)\s*\n').split(content)
It's working great for matching lines that end with (number), but I need to get that number as well.
How do I do that?
Thanks.
As described in the Python regular expressions documentation, the split method of a compiled regex splits the string on every match of the pattern, and the matched text itself is discarded. Right now, your regex matches every "(number)" at the end of a line, together with any trailing whitespace and the newline, and splits the string there. So each element of a_list is the text between two such matches: the number, its surrounding parentheses, and the trailing whitespace are all dropped.
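To make the split behavior concrete, here is a short sketch using a made-up content string (the sample text is an assumption, not from the question). It also shows a related feature documented for re.split: if the pattern contains capturing groups, the text matched by those groups is included in the result list.

```python
import re

# Hypothetical sample input; the real `content` comes from the question's code.
content = "first line (1)\nsecond line (22)  \nthird line (333)\n"

# The question's pattern: split() throws away everything the regex matched,
# so each number and its parentheses are lost.
plain = re.compile(r'\(\d+\)\s*\n').split(content)
print(plain)
# ['first line ', 'second line ', 'third line ', '']

# Same pattern with \d+ wrapped in a capturing group: split() now also returns
# the captured text, interleaved with the pieces between matches.
grouped = re.compile(r'\((\d+)\)\s*\n').split(content)
print(grouped)
# ['first line ', '1', 'second line ', '22', 'third line ', '333', '']

# Every second element (starting at index 1) is a captured number.
numbers = grouped[1::2]
print(numbers)
# ['1', '22', '333']
```

The trailing empty string appears because the last match sits at the very end of the input.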
Assuming that you don't want to split on every newline (by just making your regex '\n'), you can use a positive lookbehind to match only where another pattern immediately precedes the match, without including that preceding text in the match itself. The format is (?<=x)y: y matches only if x immediately precedes it, and x is not consumed, so split() leaves it in the results. (A negative lookbehind, (?<!x)y, does the opposite: it matches y only where x does not precede it.) The only problem with using a lookbehind in this situation is that Python's re requires the lookbehind pattern to match a fixed number of characters, but you have \d+, which could be any number of characters. Fortunately, you can drop the + as well as the leading \(, so that the lookbehind only checks for \d\), a digit followed by a closing parenthesis. This works because we don't care whether the line ends with (10000) or (1).

Unfortunately, this would also match lines ending in something like (abc123), which your original \(\d+\) regex would not. If you need to ensure that lines end with parentheses containing only digits, you'll probably have to use multiple regex operations.

That still leaves the problem of \s*, which is also variable-width, so you have two options. If you know how many whitespace characters can appear at the end, you can write an alternation of separate fixed-width lookbehinds, e.g. (?:(?<=\d\))|(?<=\d\)\s))\n for zero or one; note that a single lookbehind whose alternatives have different widths, such as (?<=\d\)|\d\)\s), is rejected by Python's re. Or you can move the \s* out of the lookbehind and into the match together with the newline, which also removes any trailing whitespace.

Assuming you take the latter option, your regex would look like (?<=\d\))\s*\n, which will leave a_list containing each line that ends with a number, with the number itself (and its surrounding parentheses) still included.
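A minimal sketch of the lookbehind approach, assuming a small made-up content string (the sample text and the helper pattern number_re are hypothetical, not from the question). It splits with (?<=\d\))\s*\n, then uses a second regex to pull the number out of each piece, since the question ultimately asks for the number. It also shows the "or" option written as two fixed-width lookbehinds, and that Python's re refuses a variable-width lookbehind.

```python
import re

# Hypothetical sample input, used only for illustration.
content = "first line (1)\nsecond line (22)  \nthird line (333)\n"

# Positive lookbehind: split on optional whitespace plus a newline, but only
# where the two characters just before it are a digit and ")". The lookbehind
# is zero-width, so "(1)", "(22)", ... remain in the pieces.
splitter = re.compile(r'(?<=\d\))\s*\n')
a_list = splitter.split(content)
print(a_list)
# ['first line (1)', 'second line (22)', 'third line (333)', '']

# Second regex operation (number_re is a hypothetical helper): extract the
# number from each piece, skipping pieces that do not end with "(digits)".
number_re = re.compile(r'\((\d+)\)$')
pairs = []
for line in a_list:
    match = number_re.search(line)
    if match:
        pairs.append((line, int(match.group(1))))
print(pairs)
# [('first line (1)', 1), ('second line (22)', 22), ('third line (333)', 333)]

# The "or" option: two fixed-width lookbehinds joined by an alternation, for
# lines ending in ")" followed by at most one whitespace character. That
# whitespace is not consumed, so it stays in the piece.
alt_splitter = re.compile(r'(?:(?<=\d\))|(?<=\d\)\s))\n')
print(alt_splitter.split("a (1)\nb (2) \n"))
# ['a (1)', 'b (2) ', '']

# A variable-width lookbehind is rejected by Python's built-in re module.
variable_width_rejected = False
try:
    re.compile(r'(?<=\d+\))\n')
except re.error:
    variable_width_rejected = True
print(variable_width_rejected)
# True
```

Because number_re requires only digits inside the parentheses, it also filters out pieces such as "x (abc123)", which the \d\) lookbehind on its own lets through.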