I need to extract all letters after the + sign or at the beginning of a string like this:
formula = "X+BC+DAF"
I tried the following, but I do not want the + sign in the result. I want to see only ['X', 'B', 'D'].
>>> re.findall("^[A-Z]|[+][A-Z]", formula)
['X', '+B', '+D']
When I grouped with parentheses, I got this strange result:
>>> re.findall("^([A-Z])|[+]([A-Z])", formula)
[('X', ''), ('', 'B'), ('', 'D')]
Why does it create tuples when I try to group? How can I write the regexp so that it directly returns ['X', 'B', 'D']?
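If there are any capturing groups in the regular expression, then re.findall returns only the values captured by the groups. If there are no groups, the entire matched string is returned. With more than one group, each match is returned as a tuple with one item per group, and a group that did not take part in the match contributes an empty string. That is why your pattern with two groups produced tuples like ('X', '').
Instead of using a capturing group for the prefix, you can use a non-capturing group: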
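A minimal sketch of that approach, assuming the formula string from the question (the variable name letters is only illustrative). The prefix (?:^|\+) matches either the start of the string or a literal + without capturing it, so ([A-Z]) is the only capturing group and findall returns a flat list of strings:

```python
import re

formula = "X+BC+DAF"

# (?:^|\+) is non-capturing: it has to match, but it is not reported.
# ([A-Z]) is the only capturing group, so findall returns plain strings.
letters = re.findall(r"(?:^|\+)([A-Z])", formula)
print(letters)  # ['X', 'B', 'D']
```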
Or for this specific case you could try a simpler solution using a word boundary:
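A sketch of the word-boundary version, again using the formula from the question. It relies on the assumption that every term consists only of letters and that terms are separated by a non-word character such as +; a digit or underscore directly before a letter would not count as a boundary:

```python
import re

formula = "X+BC+DAF"

# \b matches at the start of each run of word characters, so
# \b[A-Z] picks the first uppercase letter of every term.
# There are no groups, so findall returns the whole match.
first_letters = re.findall(r"\b[A-Z]", formula)
print(first_letters)  # ['X', 'B', 'D']
```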
Or a solution using str.split that doesn't use regular expressions:
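A sketch of the split-based version, assuming every term between + signs is non-empty. An input such as "X++B" or a trailing + would produce an empty term, and indexing it with [0] would raise IndexError:

```python
formula = "X+BC+DAF"

# Split into terms on '+', then take the first character of each term.
# Assumes no term is empty; term[0] fails on an empty string.
initials = [term[0] for term in formula.split("+")]
print(initials)  # ['X', 'B', 'D']
```

Unlike the regex versions, this does not check that the first character is an uppercase letter; it simply takes whatever comes first in each term.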