I have a string on which I try to create a regex mask that

Asked: May 15, 2026 (2026-05-15T22:37:13+00:00)

I have a string on which I try to create a regex mask that will show N number of words, given an offset. Let’s say I have the following string:

"The quick, brown fox jumps over the lazy dog."

I want to show 3 words at a time:

offset 0: "The quick, brown"
offset 1: "quick, brown fox"
offset 2: "brown fox jumps"
offset 3: "fox jumps over"
offset 4: "jumps over the"
offset 5: "over the lazy"
offset 6: "the lazy dog."

I’m using Python and I’ve been using the following simple regex to detect 3 words:

>>> import re
>>> s = "The quick, brown fox jumps over the lazy dog."
>>> re.search(r'(\w+\W*){3}', s).group()
'The quick, brown '

But I can’t figure out how to build a kind of mask that shows the next 3 words rather than the first ones. I need to keep the punctuation.

1 Answer

Editorial Team · Answer 1 · 2026-05-15T22:37:14+00:00

The prefix-matching option

You can make this work with a regex whose prefix varies with the offset: the prefix skips the first offset words, and a capturing group then collects the word triplet.

So something like this:

import re
s = "The quick, brown fox jumps over the lazy dog."

# The first quantifier ({0}, {1}, {2}) is the offset: how many words to skip.
# Each captured triplet keeps its trailing separator, here a space.
print(re.search(r'(?:\w+\W*){0}((?:\w+\W*){3})', s).group(1))
# The quick, brown
print(re.search(r'(?:\w+\W*){1}((?:\w+\W*){3})', s).group(1))
# quick, brown fox
print(re.search(r'(?:\w+\W*){2}((?:\w+\W*){3})', s).group(1))
# brown fox jumps

Let’s take a look at the pattern:

 _"word"_      _"word"_
/        \    /        \
(?:\w+\W*){2}((?:\w+\W*){3})
             \_____________/
                group 1

This does what it says: match 2 words without capturing them, then match 3 words and capture them into group 1.

The (?:...) constructs group the “word” pattern so that it can be repeated, but they are non-capturing, so the triplet is the only numbered group.
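Since only the first quantifier changes between offsets, the pattern can be built from the offset at run time. The following is a minimal sketch of that idea; the helper name words_at and its parameters are hypothetical and not part of the original answer, and it reuses the \w+\W* “word” pattern as-is.

```python
import re

WORD = r'\w+\W*'  # the "word" pattern used in the examples above

def words_at(text, offset, count=3):
    """Skip `offset` words, then return the next `count` words (or None)."""
    # Builds e.g. (?:\w+\W*){2}((?:\w+\W*){3}) for offset=2, count=3
    pattern = r'(?:%s){%d}((?:%s){%d})' % (WORD, offset, WORD, count)
    match = re.search(pattern, text)
    return match.group(1) if match else None

s = "The quick, brown fox jumps over the lazy dog."
for offset in range(7):
    # repr() makes the trailing separator visible
    print(offset, repr(words_at(s, offset)))
```

Each result keeps whatever non-word characters follow the last word, so calling .rstrip() on it drops the trailing space while keeping the punctuation.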

References

regular-expressions.info/Capturing Groups, Non-capturing Groups
- Repeating a Capturing Group vs Capturing a Repeated Group

Note on “word” pattern

It should be said that \w+\W* is a poor choice for a “word” pattern, as exhibited by the following example:

import re
s = "nothing"
print(re.search(r'(\w+\W*){3}', s).group())
# nothing

The string does not contain 3 words, but the regex matches anyway: because \W* can match the empty string, backtracking is free to split the single word into three shorter “words”.
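To see how the single word gets split, one can give each repetition its own capturing group. This is only an illustrative sketch: the pattern below writes the three repetitions out by hand, which matches the same text as {3} but exposes each piece.

```python
import re

# Three explicit copies of the "word" pattern, each in its own group
m = re.search(r'(\w+\W*)(\w+\W*)(\w+\W*)', "nothing")

print(m.group())   # nothing
print(m.groups())  # ('nothi', 'n', 'g')
```

The first group gives back characters until the other two can each match at least one word character.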

Perhaps a better pattern is something like:

\w+(?:\W+|$)

That is, a \w+ that is followed by either a \W+ or the end of the string $.
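A quick check of the stricter pattern, combined with the prefix approach, might look like the sketch below. The helper name window and its parameters are hypothetical; it returns None when the string holds fewer than offset + count words.

```python
import re

WORD = r'\w+(?:\W+|$)'  # a word followed by non-word characters or the end

def window(text, offset, count=3):
    """Skip `offset` words, then return the next `count` words (or None)."""
    pattern = r'(?:%s){%d}((?:%s){%d})' % (WORD, offset, WORD, count)
    match = re.search(pattern, text)
    return match.group(1) if match else None

s = "The quick, brown fox jumps over the lazy dog."

print(window("nothing", 0))   # None: a single word can no longer be split
print(repr(window(s, 2)))     # 'brown fox jumps '
print(repr(window(s, 6)))     # 'the lazy dog.'
print(window(s, 7))           # None: only two words remain after skipping 7
```

Because every “word” must now end at a non-word character or at the end of the string, a match cannot stop in the middle of a word.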

The capturing lookahead option

As suggested by Kobi in a comment, this option is simpler in that it needs only one static pattern. It uses re.findall to capture every triplet in a single pass:

import re
s = "The quick, brown fox jumps over the lazy dog."

# One static pattern; findall returns group 1 for every word start
triplets = re.findall(r"\b(?=((?:\w+(?:\W+|$)){3}))", s)

print(triplets)
# ['The quick, brown ', 'quick, brown fox ', 'brown fox jumps ',
#  'fox jumps over ', 'jumps over the ', 'over the lazy ', 'the lazy dog.']

# Index the list by offset
print(triplets[3])
# fox jumps over

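This works by matching at the zero-width word boundary \b and using a lookahead to capture 3 “words” into group 1. Because the lookahead consumes no characters, the search resumes at the next word boundary, so the captured triplets can overlap.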

    ______lookahead______
   /      ___"word"__    \
  /      /           \    \
\b(?=((?:\w+(?:\W+|$)){3}))
     \___________________/
           group 1
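The role of the lookahead becomes clearer when the same “word” pattern is run without it. The sketch below compares the two; the helper name windows and its count parameter are hypothetical additions for illustration.

```python
import re

WORD = r'\w+(?:\W+|$)'
s = "The quick, brown fox jumps over the lazy dog."

# Without the lookahead each match consumes its 3 words,
# so findall can only return non-overlapping triplets.
consuming = re.findall(r'(?:%s){3}' % WORD, s)
print(consuming)
# ['The quick, brown ', 'fox jumps over ', 'the lazy dog.']

def windows(text, count=3):
    """Return every overlapping run of `count` words, in order."""
    return re.findall(r'\b(?=((?:%s){%d}))' % (WORD, count), text)

print(len(windows(s)))              # 7
print(windows(s, 2)[0].rstrip())    # The quick,
```

The list index plays the role of the offset, and .rstrip() removes the trailing space while leaving punctuation in place.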

References

regular-expressions.info/Lookarounds
