I am trying to create a regular expression in Python that matches #hashtags

Editorial Team

Asked: June 11, 2026

I am trying to create a regular expression in Python that matches #hashtags. My definition of a hashtag is:

It is a word that starts with a #
It can contain any character except a space, comma, or period ([ ,\.])
It can appear anywhere in the text

So in this text

#This string cont#ains #four, and #only four #hashtags.

The hashtags here are This, four, only, and hashtags (cont#ains is not one, because that word starts with c).
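As a reference point, the definition can be checked without a single clever regex: split the text on the separators and keep the words that start with #. This is a minimal sketch, not part of the original question; the function name hashtags_by_splitting is hypothetical, and it assumes, as the definition does, that only space, comma, and period separate words.

```python
import re

# Separators from the definition: space, comma and period (assumption:
# nothing else, e.g. newlines, counts as a separator).
SEPARATORS = re.compile(r"[ ,.]+")

def hashtags_by_splitting(text):
    """Hypothetical helper: split on separators, keep words starting with '#'."""
    words = SEPARATORS.split(text)
    # Drop the leading '#'; skip a lone '#' that has no name after it.
    return [w[1:] for w in words if w.startswith("#") and len(w) > 1]

text = "#This string cont#ains #four, and #only four #hashtags."
print(hashtags_by_splitting(text))  # ['This', 'four', 'only', 'hashtags']
```

Because cont#ains is one word that starts with c, it is dropped, which gives the expected list to compare any findall() pattern against.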

The problem I have is the optional check for the beginning of the line: the # must be preceded either by one of the separators or by the start of the line.

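[ \.,]+ won't do it, since it requires at least one separator and so never matches a hashtag at the start of the line.
[ \.,]? won't do it, since it makes the separator optional and so also matches a # in the middle of a word (cont#ains).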

Example with +

In []: import re
In []: re.findall(r'[ \.,]+#([^ \.,]+)', '#This string cont#ains #four, and #only four #hashtags.')
Out[]: ['four', 'only', 'hashtags']

Example with ?

In []: re.findall(r'[ \.,]?#([^ \.,]+)', '#This string cont#ains #four, and #only four #hashtags.')
Out[]: ['This', 'ains', 'four', 'only', 'hashtags']

How can I make the pattern optionally match the beginning of the line?

1 Answer

Editorial Team · Answer 1 · 2026-06-11T21:48:41+00:00

This seems to work:

>>> re.findall(r'\B#([^,\W]+)', '#This string cont#ains #four, and #only four #hashtags.')
['This', 'four', 'only', 'hashtags']
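For comparison, the question's first pattern can also be repaired directly by offering the start of the string as an alternative to the separators: (?:^|[ ,.]+). This is a sketch of that alternative, not the answerer's method; the name HASHTAG is hypothetical, and no flags are used, so ^ only matches at the very start of the string.

```python
import re

# Either the start of the string or one or more separators must come
# right before '#'. The separators are consumed but not captured, so
# findall() returns only the hashtag body from group 1.
HASHTAG = re.compile(r"(?:^|[ ,.]+)#([^ ,.]+)")

text = "#This string cont#ains #four, and #only four #hashtags."
print(HASHTAG.findall(text))  # ['This', 'four', 'only', 'hashtags']
```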

\B: Matches the empty string, but only when it is not at the beginning or end of a word. This means that r'py\B' matches 'python', 'py3', 'py2', but not 'py', 'py.', or 'py!'. \B is just the opposite of \b, so is also subject to the settings of LOCALE and UNICODE.
\W: When the LOCALE and UNICODE flags are not specified, matches any non-alphanumeric character; this is equivalent to the set [^a-zA-Z0-9_]. With LOCALE, it will match any character not in the set [0-9_], and not defined as alphanumeric for the current locale. If UNICODE is set, this will match anything other than [0-9_] plus characters classified as not alphanumeric in the Unicode character properties database.

(These descriptions come from the Python 2 documentation. In Python 3, str patterns use Unicode matching by default, so \W matches any character that is not a Unicode word character; it is equivalent to [^a-zA-Z0-9_] only with the re.ASCII flag.)
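Note that [^,\W] is the same as \w (a comma is never a word character), so the answer's pattern only captures word characters, which is narrower than the question's "anything except space, comma, or period". A negative lookbehind follows the definition more literally: (?<![^ ,.]) means "not preceded by a non-separator", which also holds at the start of the string, where there is no preceding character. This is a sketch of that alternative technique; the variable names are hypothetical.

```python
import re

text = "#This string cont#ains #four, and #only four #hashtags."

# The answer's pattern: '#' must not follow a word character, and the
# body is restricted to word characters ([^,\W] is equivalent to \w).
answer_pattern = re.compile(r"\B#([^,\W]+)")

# Negative lookbehind: '#' must not follow a character that is not a
# separator. The body is anything except space, comma or period.
lookbehind_pattern = re.compile(r"(?<![^ ,.])#([^ ,.]+)")

print(answer_pattern.findall(text))      # ['This', 'four', 'only', 'hashtags']
print(lookbehind_pattern.findall(text))  # ['This', 'four', 'only', 'hashtags']

# The two differ once a hashtag contains a non-word character.
print(answer_pattern.findall("#foo-bar"))      # ['foo']
print(lookbehind_pattern.findall("#foo-bar"))  # ['foo-bar']
```

Which one is preferable depends on whether characters such as - should be part of a hashtag.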
