I’m trying to use Python 2.7 regexes to retrieve data from sample web pages

Question

Asked: May 31, 2026 (2026-05-31T20:00:41+00:00)

I’m trying to use Python 2.7 regexes to retrieve data from sample web pages that were provided in a course I’m taking. The code I’m trying to get to work is:

email_patterns = ['(?P<lname>[\w+\.]*\w+ *)@(?P<domain> *\w+[\.\w+]*).(?P<tld>com)']

for pattern in email_patterns:
        # 'line' is a line of text in a sample web page
        matches = re.findall(pattern,line)
        for m in matches:
            print 'matches=', m
            email = '{}@{}.{}'.format(m.group('lname'), m.group('domain'),m.group('tld'))

Running this returns the following error:

email = '{}@{}.{}'.format(m.group('lname'), m.group('domain'), m.group('tld'))
AttributeError: 'tuple' object has no attribute 'group'.

I want to use named groups because the sequence of the groups can change depending on the text I’m matching. However, it doesn’t appear to work because the compiler doesn’t think that ‘m’ is a Group object.

What’s going on here, and how can I get this to work properly by using named groups?

1 Answer

Editorial Team · Answer 1 · 2026-05-31T20:00:42+00:00

You have two problems. First, as Ignacio hinted, you shouldn’t be parsing (X)HTML with regular expressions; they cannot handle the complexity of the markup. Second, you’re using findall() instead of finditer(). findall() returns the matches as a list of strings or, when the pattern contains more than one group, as a list of tuples. A tuple has no group() method, which is why you get the AttributeError.

finditer(), on the other hand, returns an iterator of MatchObject instances, each of which has a group() method that accepts a group name such as 'lname'.

From the Python documentation for re:

re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
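
To make the findall() behaviour concrete, here is a minimal sketch. The pattern is a simplified, hypothetical stand-in for the one in the question (it is not the asker's exact regex), and the sample line is made up for illustration. It shows that each result is a plain tuple of group values, so the group names are no longer available.

```python
import re

# Simplified, hypothetical pattern with three named groups
# (an assumption for illustration, not the asker's exact regex).
pattern = r'(?P<lname>[\w.]+)@(?P<domain>\w+(?:\.\w+)*)\.(?P<tld>com)'

# Made-up sample line standing in for a line of a web page.
line = 'Contact alice@example.com or bob.smith@cs.stanford.com today'

# With more than one group, findall() returns a list of tuples.
matches = re.findall(pattern, line)
first = matches[0]

# Tuples only support positional access; there is no group() method.
is_tuple = isinstance(first, tuple)
has_group_method = hasattr(first, 'group')
```

Positional indexing such as first[0] still works, but it depends on the order of the groups in the pattern, which is exactly what the question wants to avoid.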

re.finditer(pattern, string, flags=0)
Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.
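
Putting this together, the loop from the question could be rewritten along these lines. This is a sketch under assumptions: the pattern is a simplified, hypothetical version of the one in the question, and extract_emails is an illustrative helper name, not something from the course material.

```python
import re

# Simplified, hypothetical patterns with named groups
# (an assumption for illustration, not the asker's exact regex).
email_patterns = [
    r'(?P<lname>[\w.]+)@(?P<domain>\w+(?:\.\w+)*)\.(?P<tld>com)',
]


def extract_emails(line):
    """Return the email addresses found in one line of text."""
    emails = []
    for pattern in email_patterns:
        # finditer() yields match objects, so named groups stay accessible.
        for m in re.finditer(pattern, line):
            email = '{}@{}.{}'.format(
                m.group('lname'), m.group('domain'), m.group('tld'))
            emails.append(email)
    return emails
```

Because the groups are looked up by name, their order in the pattern can change without touching the formatting code. If all the named groups are needed at once, m.groupdict() returns them as a dictionary keyed by group name.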
