Problem
There is a program file that contains the following code snippet at some point in the file.
...
food($apples$ , $oranges$ , $pears$ , $tomato$){
...
}
...
This function may contain any number of parameters but they must be strings separated by commas. All the parameter strings are lowercase words.
I want to parse out each of the parameters using a regular expression. For example, the resulting list in Python would be as follows:
["apples", "oranges", "pears", "tomato"]
Attempted Solution
Using Python's re module, I was able to achieve this by breaking the problem into two parts.
- Find the function in the code and extract the list of parameters.
  plist = re.search(r'food\((.*)\)', programString).group(1)
- Split the list using another regular expression.
  params = re.findall(r'[a-z]+', plist)
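For reference, the two steps can be wrapped into one small helper. This is only a sketch: the name extract_params and the sample string are made up for illustration, and it swaps the greedy `.*` for `[^)]*` (an assumption on my part) so the first match stops at the first closing parenthesis and does not overshoot if more parentheses follow on the same line.

```python
import re

def extract_params(program_string):
    """Return the lowercase parameter names of the first food(...) found.

    Hypothetical helper: step 1 isolates the text between the parentheses,
    step 2 pulls out every run of lowercase letters from that text.
    """
    # Step 1: capture everything up to the first closing parenthesis.
    match = re.search(r'food\(([^)]*)\)', program_string)
    if match is None:
        return []  # no food(...) in the input
    # Step 2: split the captured parameter list into individual words.
    return re.findall(r'[a-z]+', match.group(1))

# Made-up sample input that mirrors the snippet in the question.
sample = """
food($apples$ , $oranges$ , $pears$ , $tomato$){
    bar(1);
}
"""

params = extract_params(sample)
print(params)
```

Returning an empty list when nothing matches also avoids the AttributeError that calling .group(1) on a failed re.search would raise.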
Question
Is there any way I could achieve this with one regular expression instead of two?
Edit
Thanks to Tim Pietzcker’s answer I was able to find some related questions:
To answer your question “Can it be done in a single regex?”: Yes, but not in Python.
If you want to match and capture (individually) an unknown number of matches as in your example, using only a single regular expression, then you need a regex engine that supports captures (as opposed to capturing groups). Only .NET and Perl 6 do this currently.
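To make the distinction concrete, here is a small sketch of what happens in Python's re module when a capturing group sits inside a repetition. The pattern is my own illustration, not something from the answer: the whole call matches, but the group only remembers its final iteration.

```python
import re

line = "food($apples$ , $oranges$ , $pears$ , $tomato$){"

# Illustrative pattern (an assumption, not from the original post):
# one capturing group repeated once per parameter.
pattern = r'food\((?:\s*\$([a-z]+)\$\s*,?)+\)'

match = re.search(pattern, line)

# The overall match succeeds, but group(1) holds only the text captured
# by the last repetition; the earlier captures are overwritten.
last_only = match.group(1)
print(last_only)
```

With the sample line this prints tomato, which is why a second pass (or an engine that keeps every capture) is needed to recover all four names.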
So in Python, you either need to do it in two steps (find the entire food(...) function call, and then findall individual matches with a second regex, as suggested by Dingo), or use a parser like Paul McGuire’s pyparsing.