I’m working on a little Python script that is supposed to match a series


Asked: June 5, 2026 (2026-06-05T14:17:05+00:00)

I’m working on a little Python script that is supposed to match a series of authors, and I’m using the re module for that. I came across something unexpected, which I have been able to reduce to the following very simple example:

>>> import re
>>> s = "$word1$, $word2$, $word3$, $word4$"
>>> word = r'\$(word\d)\$'
>>> m = re.match(word+'(?:, ' + word + r')*', s)
>>> m.groups()
('word1', 'word4')

So I’m defining a ‘basic’ regexp that matches the main parts of my input, with some recognizable features (in this case I used the $-signs), and then I try to match one word plus an optional list of additional words.

I’d have expected that m.groups() would’ve displayed:

>>> m.groups()
('word1', 'word2', 'word3', 'word4')

But apparently I’m doing something wrong. I’d like to know why this solution does not work and how to change it, such that I get the result I’m looking for. BTW, this is with Python 2.6.6 on a Linux machine, in case that matters.

1 Answer

Editorial Team · Answer 1 · 2026-06-05T14:17:07+00:00

Although your regex matches every $word#$, the second capture group is overwritten on each repetition, so it ends up holding only the last item matched.

Let’s take a look at the parse tree that re.DEBUG prints when the pattern is compiled:

>>> expr = r"\$(word\d)\$(?:, \$(word\d)\$)*"
>>> c = re.compile(expr, re.DEBUG)
literal 36
subpattern 1
  literal 119
  literal 111
  literal 114
  literal 100
  in
    category category_digit
literal 36
max_repeat 0 65535
  subpattern None
    literal 44
    literal 32
    literal 36
    subpattern 2
      literal 119
      literal 111
      literal 114
      literal 100
      in
        category category_digit
    literal 36

As you can see, there are only 2 capture groups: subpattern 1 and subpattern 2. Every time another $word#$ is found, subpattern 2 gets overwritten.
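The overwriting can also be checked directly on the match object. The following is a small sketch that reuses the question's own `s` and `word`; the name `pattern` is introduced here only for illustration. It shows that the number of groups is fixed by the pattern, not by how often the repeated part matches.

```python
import re

s = "$word1$, $word2$, $word3$, $word4$"
word = r'\$(word\d)\$'

# Same expression as in the question, compiled so we can inspect it.
pattern = re.compile(word + '(?:, ' + word + r')*')
m = pattern.match(s)

# The pattern contains exactly two capturing groups, however many
# times the (?:...)* part repeats.
print(pattern.groups)

# The overall match still covers the whole input string.
print(m.group(0) == s)

# Group 1 holds the first word; group 2 holds only the value from
# the final repetition, because each repetition replaces the last.
print(m.groups())

# The span of group 2 points at the last word in the string.
print(s[m.start(2):m.end(2)])
```

So the regex does consume all four words; it just has nowhere to store the intermediate ones.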

As for a potential solution, I would recommend using re.findall() instead of re.match():

>>> s = "$word1$, $word2$, $word3$, $word4$"
>>> authors = re.findall(r"\$(\w+)\$", s)
>>> authors
['word1', 'word2', 'word3', 'word4']
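One thing to keep in mind is that re.findall() on its own does not check that the whole string is a well-formed list; it simply collects every $...$ it can find. If that check matters, one possible approach is to validate the overall shape first and extract afterwards. This is only a sketch: `parse_authors`, `LIST_RE` and `ITEM_RE` are hypothetical names, and the ", " separator is assumed from the example input.

```python
import re

# Validation pattern: one $name$ followed by any number of ", $name$",
# anchored at the end with \Z. No capturing groups are needed here.
LIST_RE = re.compile(r'\$\w+\$(?:, \$\w+\$)*\Z')

# Extraction pattern: captures the text between each pair of $ signs.
ITEM_RE = re.compile(r'\$(\w+)\$')


def parse_authors(text):
    """Return the list of names, or None if text is not a valid list."""
    if LIST_RE.match(text) is None:
        return None
    return ITEM_RE.findall(text)


print(parse_authors("$word1$, $word2$, $word3$, $word4$"))
print(parse_authors("$word1$; $word2$"))
```

The first call returns all four names, while the second returns None because the separator does not fit the expected format.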
