Question

Asked: May 12, 2026 (2026-05-12T20:09:56+00:00)

I am trying to parse a list of items which satisfies the python regex

I am trying to parse a list of items which satisfies the python regex

r'\A(("[\w\s]+"|\w+)\s+)*\Z'

That is, it is a whitespace-separated list, except that spaces are allowed inside quoted strings. (As written, the pattern also requires whitespace after every item, including the last one.) I would like to get a list of the items, that is, of the substrings matched by the

r'("[\w\s]+"|\w+)'

part. For example:

>>> parse('foo "bar baz" "bob" ')
['foo', '"bar baz"', '"bob"']

Is there a nice way to do this with Python's re module?

Many things don’t quite work. For example

>>> re.match(r'\A(("[\w\s]+"|\w+)\s+)*\Z', 'foo "bar baz" "bob" ').group(2)
'"bob"'

returns only the last item that the group matched. On the other hand,

>>> re.findall(r'("[\w\s]+"|\w+)', 'foo "bar baz" "bob" ')
['foo', '"bar baz"', '"bob"']

returns the right items here, but it also accepts malformed input such as

>>> re.findall(r'("[\w\s]+"|\w+)', 'foo "bar b-&&az" "bob" ')
['foo', 'bar', 'b', 'az', '" "', 'bob']

So is there any way to use the original regex and get all of the items that matched group 2? Something like this hypothetical function:

>>> re.match_multigroup(r'\A(("[\w\s]+"|\w+)\s+)*\Z', 'foo "bar baz" "bob" ').group(2)
['foo', '"bar baz"', '"bob"']
>>> re.match_multigroup(r'\A(("[\w\s]+"|\w+)\s+)*\Z', 'foo "bar b-&&az" "bob" ')
None

Edit: It is important that I preserve the quotes in the output, thus I don’t want

>>> re.match_multigroup(r'\A(("[\w\s]+"|\w+)\s+)*\Z', 'foo "bar baz" "bob" ').group(2)
['foo', 'bar baz', 'bob']

because then I don’t know if bob was quoted or not.

1 Answer

Editorial Team · Answer 1 · 2026-05-12T20:09:57+00:00

Alright, I ended up deciding to do this in two steps.

First I check that the whole expression is syntactically valid, and then I break it into individual pieces:

import re

def parse(expr):
    # Validate the whole string first, then extract the items; returns None if malformed.
    if re.match(r'\A(("[\w\s]+"|\w+)\s+)*\Z', expr):
        return re.findall(r'("[\w\s]+"|\w+)', expr)

So:

>>> parse('foo "bar baz" "bob" ')
['foo', '"bar baz"', '"bob"']
>>> parse('foo "bar b-&&az" "bob" ')
>>> parse('foo "bar" ')
['foo', '"bar"']
>>> parse('"foo" bar ')
['"foo"', 'bar']
>>> parse('foo"bar baz" "bob" ')
>>> parse('&&')

I’m about 90% sure that this method works correctly for all strings, but I would still be interested if anyone has a more general solution, since this seems sort of kludgey to me.
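One possible way to avoid matching twice is to walk the string once, matching a single item at the current position each time. The following is only a sketch under that assumption, not a confirmed replacement for the code above; the names ITEM and parse_single_pass are made up for illustration. It relies on the fact that a compiled pattern's match() accepts a start position and only matches at exactly that position.

```python
import re

# One item (quoted string or bare word) followed by mandatory whitespace:
# the same unit that the original pattern repeats with "*".
ITEM = re.compile(r'("[\w\s]+"|\w+)\s+')

def parse_single_pass(expr):
    """Return the list of items, or None if expr is malformed."""
    items = []
    pos = 0
    while pos < len(expr):
        # match() is anchored at pos, so no characters can be skipped.
        m = ITEM.match(expr, pos)
        if m is None:
            return None  # something at pos is not a valid item
        items.append(m.group(1))  # keep the quotes, as in the question
        pos = m.end()
    return items

print(parse_single_pass('foo "bar baz" "bob" '))
print(parse_single_pass('foo "bar b-&&az" "bob" '))
```

Because every item has to start exactly where the previous one ended, validation and extraction happen in the same loop, and the quotes are preserved in the returned items.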

Thanks to SilentGhost and Alan Moore for the help. I did not know about Python's csv module or regex lookaheads before; both look worth learning about.
