I have to match all the alphanumeric words from a text.


Asked: May 27, 2026


I have to match all the alphanumeric words from a text.

>>> import re
>>> text = "hello world!! how are you?"
>>> final_list = re.findall(r"[a-zA-Z0-9]+", text)
>>> final_list
['hello', 'world', 'how', 'are', 'you']
>>>

This is fine, but I also have a few words to negate, i.e. words that shouldn't be in my final list.

>>> negate_words = ['world', 'other', 'words']

A bad way to do it

>>> negate_str = '|'.join(negate_words)
>>> # list() is needed on Python 3, where filter() returns a lazy iterator
>>> list(filter(lambda x: not re.match(negate_str, x), final_list))
['hello', 'how', 'are', 'you']

But I can save a loop if my very first regex pattern can be changed to take the negation of those words into account. I found negation of characters, but I have whole words to negate. I also found regex lookbehind in other questions, but that doesn't help either.

Can it be done using Python's re?

Update

My text can span a few hundred lines. Also, the list of negate_words can be lengthy too.

Considering this, is using a regex for such a task correct in the first place? Any suggestions?


1 Answer

Editorial Team · Answer 1 · 2026-05-27T03:17:56+00:00

I don't think there is a clean way to do this using regular expressions. The closest I could find was a bit ugly and not exactly what you wanted, because it leaves an empty string wherever a negated word was matched:

>>> re.findall(r"\b(?:world|other|words)|([a-zA-Z0-9]+)\b", text)
['hello', '', 'how', 'are', 'you']
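For comparison, here is a sketch of a single-pattern variant built on a negative lookahead, which skips the negated words instead of returning empty strings. The function name `find_words_excluding` is made up for illustration, and whether the resulting pattern counts as clean is a judgment call, especially when the list of negated words is long.

```python
import re

def find_words_excluding(text, negate_words):
    """Return alphanumeric words in text, skipping any word in negate_words.

    Hypothetical helper, not part of the original answer.
    """
    # Escape each word so regex metacharacters are treated literally.
    alternation = "|".join(re.escape(word) for word in negate_words)
    pattern = re.compile(
        # Only start a match at the beginning of an alphanumeric run.
        r"(?<![a-zA-Z0-9])"
        # Reject the run if it is exactly one of the negated words.
        r"(?!(?:" + alternation + r")(?![a-zA-Z0-9]))"
        r"[a-zA-Z0-9]+"
    )
    return pattern.findall(text)

negate_words = ["world", "other", "words"]
print(find_words_excluding("hello world!! how are you?", negate_words))
# ['hello', 'how', 'are', 'you']
```

The lookbehind matters here: without it, the engine would reject "world" at its first letter and then happily match "orld" one character later. Longer words that merely start with a negated word, such as "worldly", are still kept.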

Why not use Python's sets instead? They are very fast, although a set difference keeps neither the original word order nor repeated words:

>>> list(set(final_list) - set(negate_words))
['hello', 'how', 'are', 'you']
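Before relying on the set difference, it may be worth checking how it treats repeated words. The sketch below uses a made-up sentence with repeats to show that duplicates collapse and that the result has to be sorted to get a predictable order.

```python
import re

# Illustrative input (not from the question): "how" and "are" appear twice.
text = "how are you and how are they"
negate_words = ["and"]

words = re.findall(r"[a-zA-Z0-9]+", text)
unique_kept = set(words) - set(negate_words)

# Sets are unordered, so sort only to make the printed result deterministic.
print(sorted(unique_kept))            # ['are', 'how', 'they', 'you']
print(len(words), len(unique_kept))   # 7 4
```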

If order is important, see the reply from @glglgl below. His list comprehension version is very readable. Here’s a fast but less readable equivalent using itertools:

>>> import itertools
>>> negate_words_set = set(negate_words)
>>> # filterfalse is the Python 3 name; Python 2 called it ifilterfalse
>>> list(itertools.filterfalse(negate_words_set.__contains__, final_list))
['hello', 'how', 'are', 'you']
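For readers who want the order-preserving list comprehension mentioned above, a minimal sketch could look like the following. It is written from scratch for illustration and is not a quote of that reply.

```python
import re

text = "hello world!! how are you?"
negate_words = ["world", "other", "words"]

# Build the set once so each membership test is O(1) on average.
negate_words_set = set(negate_words)

final_list = re.findall(r"[a-zA-Z0-9]+", text)
# Keeps the original order and any repeated words.
kept = [word for word in final_list if word not in negate_words_set]
print(kept)  # ['hello', 'how', 'are', 'you']
```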

Another alternative is to build up the word list in a single pass using re.finditer:

>>> negate_words_set = set(negate_words)
>>> result = []
>>> for mo in re.finditer(r"[a-zA-Z0-9]+", text):
...     word = mo.group()
...     if word not in negate_words_set:
...         result.append(word)
...
>>> result
['hello', 'how', 'are', 'you']
