I have some text in the following format:
str1 = '{5723647 9 aqua\t \tfem nom/voc pl}{5723647 9 aqua\t \tfem dat sg}{5723647 9 aqua\t \tfem gen sg}'
str2 = '{27224035 2 equo_,equus#1\t \tmasc abl sg}{27224035 2 equo_,equus#1\t \tmasc dat sg}'
Here is what I want to get:
result1 = [('aqua', 'fem nom/voc pl'), ('aqua', 'fem dat sg'), ('aqua', 'fem gen sg')]
result2 = [('equus#1', 'masc abl sg'), ('equus#1', 'masc dat sg')]
As you can see, there are two possible variants:
- (anytext,)(word-I-need)\t \t(form-I-need).
- (anytext )(word-I-need)\t \t(form-I-need).
Here is the regex I've tried:
pattern = re.compile(r'\d* \d*(?:,| )(.*?)\t \t(.*?)}')
Here is what I get:
[('aqua', 'fem nom/voc pl'), ('aqua', 'fem dat sg'), ('aqua', 'fem gen sg')]
[('equo_,equus#1', 'masc abl sg'), ('equo_,equus#1', 'masc dat sg')]
However, the second result should be:
[('equus#1', 'masc abl sg'), ('equus#1', 'masc dat sg')]
What would you advise? Thanks!
1 Answer
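One possible approach, offered as a sketch rather than a definitive fix: in the attempted pattern, `(?:,| )` matches the space right after the second number, so the lazy `(.*?)` then swallows everything up to the tab, including `equo_,`. Instead of anchoring on the numbers, you could capture only the run of characters directly before the `\t \t` separator that contains no whitespace, commas, or braces. This assumes the wanted word itself never contains a space or a comma, which holds for the two samples but is not guaranteed by the question.

```python
import re

str1 = '{5723647 9 aqua\t \tfem nom/voc pl}{5723647 9 aqua\t \tfem dat sg}{5723647 9 aqua\t \tfem gen sg}'
str2 = '{27224035 2 equo_,equus#1\t \tmasc abl sg}{27224035 2 equo_,equus#1\t \tmasc dat sg}'

# Group 1: the token immediately before the tab-space-tab separator,
#          made only of characters that are not whitespace, ',', '{' or '}'.
#          (Assumption: the wanted word never contains a space or comma.)
# Group 2: everything after the separator up to the closing brace.
pattern = re.compile(r'([^\s,{}]+)\t \t([^}]*)\}')

result1 = pattern.findall(str1)
result2 = pattern.findall(str2)

print(result1)
print(result2)
```

Because the character class excludes both the comma and the space, the same pattern covers both variants: whatever precedes the last comma or space before the separator is simply skipped.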