I’m using Python to search a file for e-mail addresses, but I can’t figure out what the syntax for alternative regexps should be. Here’s the code:
addresses = []
pattern = '(\w+)@(\w+\.com)|(\w+)@(it.\w+\.com)'
for line in file:
    matches = re.findall(pattern,line)
    for m in matches:
        address = '%s@%s' % m
        addresses.append(address)
So I want to find addresses that look like john@company.com or john@it.company.com, but the above code doesn’t work because either the first two groups are empty or the last two groups are empty. What is the correct solution? I need to use groups to store the user name (before @) and server name (after @) separately.
EDIT: Matching email addresses is only an example. What I’m trying to find out is how to match different regexps that have only one thing in common – they match two groups.
(\w+)@((?:it\.)?\w+\.com)
You want to capture the part after the @ whether it’s example.com or it.example.com, so you put both options inside the same capture group. But since they share a similar format, you can condense (it\.\w+\.com|\w+\.com) to just ((it\.)?\w+\.com).
Writing the optional prefix as (?: ) makes that pair of parens a non-capturing group, so it won’t take part in your matched groups. There will be one group for the first (\w+), and one group for the whole ((?:it\.)?\w+\.com) after the @. That’s two groups total, plus the default group-0 match for the full string.
EDIT: To answer your new question, all you have to do is follow the grouping I used, but stop before you condense it.
If your test cases are:
1) example@abcdef
2) example@123456
You could write your regex as such:
(\w+)@([a-zA-Z]+|\d+)
This would always have the part before the @ in group 1, and the part after in group 2. Notice that there are only two pairs of parens, and the | (“or”) operator appears inside of the second parens group.
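For reference, here is a minimal sketch of how both patterns might be used with re.findall. The sample lines and the variable names (email_pattern, general_pattern, lines, pairs) are made up for illustration; in the original code the lines would come from the file instead.

```python
import re

# Condensed pattern: the optional "it." prefix is in a non-capturing group,
# so every match yields exactly two groups (user, server).
email_pattern = re.compile(r'(\w+)@((?:it\.)?\w+\.com)')

# Hypothetical sample input standing in for the lines of the file.
lines = [
    'contact john@company.com or jane@it.company.com',
    'no address here',
]

addresses = []
for line in lines:
    # With two capturing groups, findall returns a list of 2-tuples.
    for user, server in email_pattern.findall(line):
        addresses.append('%s@%s' % (user, server))

print(addresses)

# General case: keep the alternation inside the second group, so the
# part after the @ always lands in group 2 whichever alternative matched.
general_pattern = re.compile(r'(\w+)@([a-zA-Z]+|\d+)')

pairs = [general_pattern.findall(text)[0]
         for text in ('example@abcdef', 'example@123456')]
print(pairs)
```

Because each match is always a 2-tuple, the '%s@%s' formatting from the question works again without having to check which groups came back empty.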