I have this piece of code that finds words that begin with @ or

Question


Asked: May 15, 2026


I have this piece of code that finds words that begin with @ or #,

p = re.findall(r'@\w+|#\w+', str)

Now what irks me about this is repeating \w+. I am sure there is a way to do something like

p = re.findall(r'(@|#)\w+', str)

I expected that to produce the same result, but it doesn't; it returns only the # and @ characters. How can that regex be changed so that I am not repeating the \w+? This code comes close,

p = re.findall(r'((@|#)\w+)', str)

But it returns [('@many', '@'), ('@this', '@'), ('#tweet', '#')] (notice the extra '@', '@', and '#').

Also, if I'm running this re.findall code 500,000 times, can the pattern be compiled first to make it faster?


1 Answer

Editorial Team · Answer 1 · 2026-05-15T04:40:44+00:00

The solution

You have two options:

Use a non-capturing group: (?:@|#)\w+
Or, even better, a character class: [@#]\w+
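A quick check that both patterns return whole words. The input string below is hypothetical, made up to contain the words from the output shown in the question, and is named `text` to avoid shadowing the built-in `str`.

```python
import re

# Hypothetical input containing the words from the question's output
text = "so @many people tweet @this #tweet"

non_capturing = re.findall(r'(?:@|#)\w+', text)
char_class = re.findall(r'[@#]\w+', text)

print(non_capturing)  # ['@many', '@this', '#tweet']
print(char_class)     # ['@many', '@this', '#tweet']
```

The bare word "tweet" is not matched because neither pattern allows the prefix to be missing.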

References

regular-expressions.info/Character Class and Groups

Understanding `findall`

The problem you were having comes from how findall shapes its return value depending on how many capturing groups the pattern contains.

Let’s take a closer look at this pattern (annotated to show the groups):

((@|#)\w+)
|\___/   |
|group 2 |     # Read about groups to understand
\________/     # how they're defined and numbered/named
 group 1
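A compiled pattern reports how many capturing groups it has through its `groups` attribute, and that count is what decides the shape of findall's result. A small sketch comparing the patterns discussed in this thread:

```python
import re

patterns = [
    r'((@|#)\w+)',   # two capturing groups (nested)
    r'(@|#)\w+',     # one capturing group
    r'(?:@|#)\w+',   # non-capturing group: zero capturing groups
    r'[@#]\w+',      # character class: no groups at all
]

# Pattern.groups is the number of capturing groups in the pattern
group_counts = {pat: re.compile(pat).groups for pat in patterns}

for pat, n in group_counts.items():
    print(f"{pat!r}: {n} capturing group(s)")
```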

Capturing groups let us save what each subpattern matched, separately from the match of the overall pattern.

import re

p = re.compile(r'((@|#)\w+)')
m = p.match('@tweet')
print(m.group(1))
# @tweet
print(m.group(2))
# @

Now let’s take a look at the Python documentation for the re module:

findall: Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
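To make the quoted rule concrete, here is a rough re-implementation built on `re.finditer`. `findall_like` is a hypothetical helper written only for illustration; it is not part of the `re` module and ignores details such as the `pos`/`endpos` arguments.

```python
import re

def findall_like(pattern, string):
    """Hypothetical helper mimicking the documented return shape of re.findall."""
    compiled = re.compile(pattern)
    results = []
    for m in compiled.finditer(string):
        # findall reports groups that did not participate as '', not None
        groups = m.groups(default='')
        if compiled.groups == 0:
            results.append(m.group(0))   # no groups: the whole match
        elif compiled.groups == 1:
            results.append(groups[0])    # one group: just that group's text
        else:
            results.append(groups)       # several groups: a tuple of all groups
    return results

text = 'lala @tweet boo #this &that @foo#bar'
for pat in [r'((@|#)\w+)', r'(@|#)\w+', r'[@#]\w+']:
    # Each shape agrees with the real re.findall on this input
    assert findall_like(pat, text) == re.findall(pat, text)
```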

This explains why you’re getting the following:

str = 'lala @tweet boo #this &that @foo#bar'

print(re.findall(r'((@|#)\w+)', str))
# [('@tweet', '@'), ('#this', '#'), ('@foo', '@'), ('#bar', '#')]

As specified, since the pattern has more than one group, findall returns a list of tuples, one for each match. Each tuple gives you what was captured by the groups for that match.
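If you would rather keep the nested pattern, one workaround is to unpack each tuple and keep only its first element, which is what group 1 (the whole word) captured. The sample string is named `text` here to avoid shadowing the built-in `str`.

```python
import re

text = 'lala @tweet boo #this &that @foo#bar'

# Each tuple is (group 1, group 2); group 1 wraps the whole word
words = [whole for whole, _prefix in re.findall(r'((@|#)\w+)', text)]
print(words)  # ['@tweet', '#this', '@foo', '#bar']
```

This works, but the non-capturing group or character class is simpler because findall then does the right thing on its own.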

The documentation also explains why you’re getting the following:

print(re.findall(r'(@|#)\w+', str))
# ['@', '#', '@', '#']

Now the pattern has only one group, so findall returns a list of what that group captured in each match.
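When you do need a capturing group (for example, to know which prefix each word had) and also want the whole match, `re.finditer` avoids findall's return rules, because each match object still exposes the full match as `group(0)`:

```python
import re

text = 'lala @tweet boo #this &that @foo#bar'

# group(0) is the entire match; group(1) is the captured prefix
pairs = [(m.group(0), m.group(1)) for m in re.finditer(r'(@|#)\w+', text)]
print(pairs)  # [('@tweet', '@'), ('#this', '#'), ('@foo', '@'), ('#bar', '#')]
```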

In contrast, the patterns given above as solutions don't have any capturing groups, which is why they work as you expected:

print(re.findall(r'(?:@|#)\w+', str))
# ['@tweet', '#this', '@foo', '#bar']

print(re.findall(r'[@#]\w+', str))
# ['@tweet', '#this', '@foo', '#bar']
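On the second part of the question (running this 500,000 times): you can compile the pattern once with `re.compile` and call `findall` on the compiled object. The `re` module also caches recently used patterns internally, so the saving over module-level `re.findall` is mostly the cache lookup per call and may be modest; it is worth measuring on your own data. A minimal sketch, where `TAG_RE` and the `lines` list are hypothetical stand-ins for the real pattern variable and input:

```python
import re
import timeit

# Compile once, then reuse the compiled object for every call
TAG_RE = re.compile(r'[@#]\w+')

# Hypothetical sample data standing in for the 500,000 strings
lines = ['lala @tweet boo #this &that @foo#bar'] * 1000

compiled_results = [TAG_RE.findall(line) for line in lines]
module_results = [re.findall(r'[@#]\w+', line) for line in lines]
assert compiled_results == module_results  # same output either way

# Rough timing comparison; numbers depend on machine and Python version
t_compiled = timeit.timeit(lambda: [TAG_RE.findall(s) for s in lines], number=10)
t_module = timeit.timeit(lambda: [re.findall(r'[@#]\w+', s) for s in lines], number=10)
print(f"compiled: {t_compiled:.4f}s  module-level: {t_module:.4f}s")
```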
