I have a string and need to extract parts of it. The problem is that I can’t express a repetition within a repetition. Here is the code:
import re
f = "Makimak-cg_mk_Mokarmi"
pattern = r"([A-Za-z][A-Za-z0-9]+)((?:[-_]([a-z]{2}))+)"
mO = re.match(pattern, f)
print(mO.groups())
And the result will be:
('Makimak', '-cg_mk', 'mk')
But I would like to get tuple like this:
('Makimak', '-cg_mk', 'cg', 'mk')
So there is a group “-cg_mk” which contains a repetition of the two-character pattern. But there is no such thing as this:
[a-z]{2}+
The groups of the result contain only the last iteration of the repeated group expressed here:
([a-z]{2})
My thought was that there should be a “+” too like this:
([a-z]{2})+
It gives the same result. The match object is created; I simply can’t get the groups that I want.
You may need to do this in two steps: first match with a pattern that just captures the groups ('Makimak', '-cg_mk'), and then combine this with the result of splitting the second group on occurrences of - or _.
If you always knew the exact number of two-character patterns you could accomplish this with a lookahead, but it doesn’t seem like that is known up front, or you wouldn’t need the repetition.
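A minimal sketch of the two-step approach, assuming the same input string as in the question. The names parts and result are only illustrative, and the inner capturing group is dropped from the pattern because a repeated group keeps only its last iteration anyway:

```python
import re

f = "Makimak-cg_mk_Mokarmi"

# Step 1: capture the name and the whole repeated run.
# The inner group is non-capturing here, so groups() has exactly two items.
pattern = r"([A-Za-z][A-Za-z0-9]+)((?:[-_][a-z]{2})+)"
mO = re.match(pattern, f)

# Step 2: split the run on '-' or '_'.
# The run starts with a separator, so the first piece is empty and is filtered out.
parts = [p for p in re.split(r"[-_]", mO.group(2)) if p]

# Combine both steps into the tuple the question asks for.
result = mO.groups() + tuple(parts)
print(result)  # ('Makimak', '-cg_mk', 'cg', 'mk')
```

Instead of splitting, re.findall(r"[a-z]{2}", mO.group(2)) should give the same two-character pieces for this input.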