I am using Python's re.findall method to find occurrences of certain substrings in an input string.
For example, when searching the string 'ABCdef', I have two requirements:
- Find substrings that start with a single capital letter.
- After step 1, find substrings that consist entirely of capital letters.
Example input strings and expected outputs:
'USA' -- output: ['USA']
'BObama' -- output: ['B', 'Obama']
'Institute20CSE' -- output: ['Institute', '20', 'CSE']
So I expect
>>> matched_value_list = re.findall('[A-Z][a-z]+|[A-Z]+', 'ABCdef')
to return ['AB', 'Cdef'].
But that does not seem to be happening. What I get is ['ABC'], because the second part of the regex ('[A-Z]+') matches the whole run of capital letters.
So is there any way to ignore matches that have already been found, so that once 'Cdef' is matched by '[A-Z][a-z]+', the second part of the regex (i.e. '[A-Z]+') only matches the remaining string 'AB'?
First you need to match AB, which is either followed by an uppercase letter and then a lowercase letter, or is at the end of the string. For that you can use a look-ahead.
Then you need to match an uppercase letter, C, followed by one or more lowercase letters, def.
So, you can use this pattern:
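A pattern matching that description might look like the sketch below. The exact spelling of the regex is an assumption reconstructed from the description above, and `split_caps` is a hypothetical helper name used only for illustration.

```python
import re

# Assumed reconstruction of the described pattern:
#   [A-Z]+(?=[A-Z][a-z]|$)  a run of capitals that stops just before a
#                           capital + lowercase pair, or runs to the end
#   [A-Z][a-z]+             one capital followed by lowercase letters
PATTERN = re.compile(r'[A-Z]+(?=[A-Z][a-z]|$)|[A-Z][a-z]+')


def split_caps(text):
    """Hypothetical helper: return all non-overlapping matches in text."""
    return PATTERN.findall(text)


print(split_caps('ABCdef'))  # ['AB', 'Cdef']
print(split_caps('USA'))     # ['USA']
print(split_caps('BObama'))  # ['B', 'Obama']
```

The look-ahead only checks what follows and does not consume it, so the capital that starts 'Cdef' is still available for the second alternative.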
As pointed out in a comment by @sotapme, you can also modify the above regex to:
This adds \d+, since you also want to match digits, as in one of your examples. He also removed the [a-z] part from the first alternative of the look-ahead. That works because the + quantifier on the [A-Z] outside the look-ahead is greedy by default, so it automatically matches the longest possible run and stops only just before the last uppercase letter.
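Putting the two changes described above together, the modified pattern could be sketched as follows. The order of the alternatives and the name `split_tokens` are assumptions for illustration, not necessarily what the comment used.

```python
import re

# Assumed form of the modified pattern:
#   [A-Z]+(?=[A-Z]|$)  run of capitals followed by another capital, or at the end
#   [A-Z][a-z]+        one capital followed by lowercase letters
#   \d+                a run of digits
PATTERN_WITH_DIGITS = re.compile(r'[A-Z]+(?=[A-Z]|$)|[A-Z][a-z]+|\d+')


def split_tokens(text):
    """Hypothetical helper: return all non-overlapping matches in text."""
    return PATTERN_WITH_DIGITS.findall(text)


print(split_tokens('ABCdef'))          # ['AB', 'Cdef']
print(split_tokens('BObama'))          # ['B', 'Obama']
print(split_tokens('Institute20CSE'))  # ['Institute', '20', 'CSE']
```

On 'ABCdef' the greedy [A-Z]+ first takes 'ABC', fails the look-ahead because 'd' follows, then backs off to 'AB', where the next character is the capital 'C'.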