Let's say this is our text: text = ‘After 1992 , the winter and

Asked: June 14, 2026 (2026-06-14T05:44:08+00:00)

Let's say this is our text:

import re

text = 'After 1992 , the winter and summer Olympics will be held two years apart , with the revised schedule beginning with the winter games in 1994 and the summer games in 1996 . ) Now , Mr. Pilson -- a former college basketball player who says a good negotiator needs `` a level of focus and intellectual attention  similar to a good athlete-s is facing the consequences of his own aggressiveness . Next month , talks will begin on two coveted CBS contracts'
print(re.search(r'(\w+ |\W+ ){0,4}1992( \W+| \w+){4}', text).group(0))

Output: After 1992 , the winter and

But this one gives me:

print(re.search(r'(\w+ |\W+ ){0,4}1992( \W+| \w+){0,4}', text).group(0))

Output: After 1992 ,

This seems strange to me: why is the second regex not greedy?

This one is a bit stranger than the others:

print(re.search(r'(\w+ |\W+ ){0,4}summer( \W+| \w+){0,4}', text).group(0))

Output: , the winter and summer Olympics will be held

Questions

1- What is the difference between the first and the second one? I expected them to give the same text: the only difference is {0,4}, and if {4} gives the long string, {0,4} should give the same string because regexes are greedy.

2- The problem may be related to punctuation, because the third example gives the same result with both {0,4} and {4}.

I am confused.


1 Answer

Editorial Team · Answer 1 · 2026-06-14T05:44:10+00:00

No mystery here.

In your second example, ␣\W+ overmatched ␣,␣ (the blank ␣ is also part of the \W class), so the next repetition had no leading blank left and could not match ␣\w+ against the remaining the␣winter␣... However, the {0,4} constraint was already satisfied, so the engine had no need for those further matches. So far so good.

Coming back to your first example, the match above did not satisfy {4}, so the engine kept looking. It backtracked inside the ␣\W+ match and gave up the last blank ␣, so ␣\W+ matched only ␣,. Three subsequent matches for ␣\w+ could then be made against ␣the␣winter␣... and {4} was satisfied.
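To see the backtracking described above, the following minimal sketch runs both patterns on a shortened copy of the question's text and prints what the trailing group captured last. The variable names fixed and ranged are only illustrative, and repr() is used so the trailing blank is visible.

```python
import re

# Shortened copy of the question's text; the rest is not needed here.
text = "After 1992 , the winter and summer Olympics will be held"

fixed = re.search(r"(\w+ |\W+ ){0,4}1992( \W+| \w+){4}", text)
ranged = re.search(r"(\w+ |\W+ ){0,4}1992( \W+| \w+){0,4}", text)

# {0,4}: the first repetition swallows " , " and the engine stops there.
print(repr(ranged.group(0)))  # 'After 1992 , '
print(repr(ranged.group(2)))  # ' , '

# {4}: the engine is forced to give the blank back and keep going.
print(repr(fixed.group(0)))   # 'After 1992 , the winter and'
print(repr(fixed.group(2)))   # ' and'
```

Both patterns are greedy; the difference is only that {0,4} is allowed to stop after one repetition, while {4} is not.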

Change your regular expression to either ([^ ]+ +){0,4}my_word( +[^ ]+){0,4} (this maintains the spirit of your original expression: it treats spaces as separators and everything else, including punctuation, as words) or, maybe better, (\w+\W+){0,4}my_word(\W+\w+){0,4} to isolate up to 4 actual words on either side irrespective of punctuation.
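As a rough sketch of how the two suggested patterns differ, the hypothetical helper context() below builds either one for a given word. The function name and its n and words_only parameters are assumptions made for illustration, not part of the original answer.

```python
import re

text = "After 1992 , the winter and summer Olympics will be held"

def context(word, text, n=4, words_only=False):
    """Return `word` with up to `n` neighbours on each side (hypothetical helper)."""
    target = re.escape(word)
    if words_only:
        # Only runs of \w count as words; punctuation is a separator.
        pattern = rf"(\w+\W+){{0,{n}}}{target}(\W+\w+){{0,{n}}}"
    else:
        # Anything between blanks counts as a word, punctuation included.
        pattern = rf"([^ ]+ +){{0,{n}}}{target}( +[^ ]+){{0,{n}}}"
    match = re.search(pattern, text)
    return match.group(0) if match else None

print(context("1992", text))                   # After 1992 , the winter and
print(context("1992", text, words_only=True))  # After 1992 , the winter and summer
```

In the first call the comma uses up one of the four slots; in the second it is skipped as a separator, so one more real word fits.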

Later,

Hi vladr. The regular expression that you provided does not work with this text (the target word in this text is part):

The city ‘s Department of Consumer Affairs charged Newmark & Lewis Inc. with failing to deliver on its promise of lowering prices . In a civil suit commenced in state Supreme Court in New York , the agency alleged that the consumer-electronics and appliance discount-retailing chain engaged in deceptive advertising by claiming to have ” lowered every price on every item ” as part of an advertising campaign that began June 1 . The agency said it monitored Newmark & Lewis ‘s advertised prices before and after the ad campaign , and found that the prices of at least 50 different items either increased or stayed the same . In late May , Newmark & Lewis announced a plan to cut prices 5 % to 20 % and eliminate what it called a ” standard discount-retailing practice ” of negotiating individual deals with customers .”

Aha. It matched part in Department.

If you only want to match whole words, use (^|(\w+\W+){1,5})\W*my_word\W*((\W+\w+){1,5}|$); this should isolate the word between separators and/or line ends.
If you want to match part in Department, use (\w+\W+){0,5}\w*my_word\w*(\W*\w+){0,5}
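The sketch below compares the three patterns on a short made-up sentence (an assumption, loosely modelled on the text above) that contains part both inside Department and as a separate word. The variable names plain, whole and inside are illustrative only.

```python
import re

# Made-up sample: "part" occurs inside "Department" and as its own word.
sample = ("The Department of Consumer Affairs said the discount was part "
          "of an advertising campaign that began June 1 .")
word = re.escape("part")

plain = rf"(\w+\W+){{0,4}}{word}(\W+\w+){{0,4}}"                # earlier suggestion
whole = rf"(^|(\w+\W+){{1,5}})\W*{word}\W*((\W+\w+){{1,5}}|$)"  # whole words only
inside = rf"(\w+\W+){{0,5}}\w*{word}\w*(\W*\w+){{0,5}}"         # substrings allowed

# The earlier pattern stops at the "part" inside "Department".
print(re.search(plain, sample).group(0))
# part

print(re.search(whole, sample).group(0))
# Affairs said the discount was part of an advertising campaign that

print(re.search(inside, sample).group(0))
# The Department of Consumer Affairs said the
```

Note that the whole-word pattern, as written, needs at least one neighbouring word on each side unless the target sits at the very start or end of the string.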
