I’m trying to match a specific pattern using the re module in python. I

Question


Asked: June 7, 2026 (2026-06-07T06:37:14+00:00)

I'm trying to match a specific pattern using the re module in Python.
I want to match a full sentence (more precisely, a sequence of alphanumeric strings separated by spaces and/or punctuation).

E.g.

“This is a regular sentence.”
“this is also valid”
“so is This ONE”

I've tried various combinations of regular expressions, but I can't grasp how the patterns actually work: each expression gives me a different and inexplicable result (I admit I am a beginner, but still).

I tried:

"((\w+)(\s?))*"

To the best of my knowledge, this should greedily match one or more alphanumeric characters followed by at most one whitespace character, and then greedily repeat that whole pattern. That is not what it seems to do, so clearly I am wrong, but I would like to know why. (I expected it to return the entire sentence as the result.)
The result I get for the first sample string above is [('sentence', 'sentence', ''), ('', '', ''), ('', '', ''), ('', '', '')].

"(\w+ ?)*"

I'm not even sure how this one should work. The official documentation (help('re') in Python) says that *, + and ? match 0 or more, 1 or more, and 0 or 1 (greedy) repetitions of the preceding RE, respectively.
In that case, is the space alone the preceding RE for '?', or is '\w+ ' the preceding RE? And what is the preceding RE for the '*' operator? The output I get with this is ['sentence'].
Others, such as "(\w+\s?)+)" and "((\w*)(\s??))", are basically variations of the same idea: a sentence is a run of alphanumerics followed by a single (or finite number of) whitespace characters, and this pattern repeats over and over.

Can someone tell me where I am going wrong, and why the above expressions do not work the way I expected?

P.S. I eventually got "[ \w]+" to work for me, but with this I cannot limit the number of consecutive whitespace characters.

1 Answer

Editorial Team · Answer 1 · 2026-06-07T06:37:16+00:00

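Your reasoning about the regex is correct; the problem comes from using capturing groups with *. When a pattern contains groups, re.findall returns the groups rather than the whole match, and a repeated group only keeps the text of its last repetition. Here's an alternative: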

>>> s="This is a regular sentence."
>>> import re
>>> re.findall(r'\w+\s?', s)
['This ', 'is ', 'a ', 'regular ', 'sentence']
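
On the question of which part of the pattern each quantifier applies to: a quantifier binds to the single item immediately before it, which is one character, one escape such as \w, or one parenthesised group. A minimal sketch using the patterns from the question (the variable names are only for illustration):

```python
import re

# In r"\w+ ?", '+' applies to \w and '?' applies only to the space before it.
bare_word = re.fullmatch(r"\w+ ?", "word") is not None       # space is optional
word_space = re.fullmatch(r"\w+ ?", "word ") is not None     # one space is allowed
empty_plain = re.fullmatch(r"\w+ ?", "") is not None         # \w+ is still required

# In r"(\w+ ?)*", '*' applies to the whole parenthesised group,
# so zero repetitions (an empty match) are allowed too.
empty_group = re.fullmatch(r"(\w+ ?)*", "") is not None
two_words = re.fullmatch(r"(\w+ ?)*", "two words") is not None

print(bare_word, word_space, empty_plain)  # True True False
print(empty_group, two_words)              # True True
```

Because the starred group may match nothing at all, findall can also report empty matches at positions where no word starts.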

In this case it might make more sense for you to use \b to match word boundaries.

>>> re.findall(r'\w+\b', s)
['This', 'is', 'a', 'regular', 'sentence']
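
The capturing-group behaviour behind the original results can be seen directly. The following is a small sketch, not the only way to do it: it runs the question's first pattern on the first sample string, then swaps in a non-capturing group, (?:...), so that findall reports whole matches instead of groups.

```python
import re

s = "This is a regular sentence."

# With capturing groups, findall returns one tuple of groups per match,
# and a repeated group keeps only the text from its final repetition.
with_groups = re.findall(r"((\w+)(\s?))*", s)
print(with_groups[0])   # ('sentence', 'sentence', '')

# A non-capturing group leaves no groups to report, so findall returns
# the full text of each match. The '*' also allows empty matches,
# which are filtered out here.
whole = [m for m in re.findall(r"(?:\w+\s?)*", s) if m]
print(whole)            # ['This is a regular sentence']
```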

Alternatively, you can match the entire sentence with re.match and call group(0) on the returned match object to get the whole match:

>>> r = r"((\w+)(\s?))*"
>>> s = "This is a regular sentence."
>>> import re
>>> m = re.match(r, s)
>>> m.group(0)
'This is a regular sentence'
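
For the P.S. about limiting consecutive whitespace, one possible approach is to require a word after every separator rather than putting the space in a character class. The sketch below assumes words are separated by exactly one space and that the sentence may end with a single '.', '!' or '?'. Both the separator rule and the helper name is_sentence are illustrative choices, not something stated in the question.

```python
import re

# Assumption: exactly one space between words, optional final punctuation.
SENTENCE = re.compile(r"\w+(?: \w+)*[.!?]?")

def is_sentence(text):
    """Return True if the whole string fits the assumed sentence shape."""
    return SENTENCE.fullmatch(text) is not None

print(is_sentence("This is a regular sentence."))  # True
print(is_sentence("this is also valid"))           # True
print(is_sentence("so is This ONE"))               # True
print(is_sentence("two  spaces"))                  # False (double space)
```

Using fullmatch means the pattern has to cover the entire string, so a stray second space makes the match fail instead of silently cutting it short.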
