How can I build a regular expression in Python that matches all of the following?
Each line is a string (a-zA-Z), followed by a space, followed by one or more groups of four integers, where each group ends with a comma:
Example:
someotherstring 42 1 48 17,
somestring 363 1 46 17,363 1 34 17,401 3 8 14,
otherstring 42 1 48 17,363 1 34 17,
I have tried the following, since I need to capture each integer:
myRE=re.compile("(\s+) ((\d+) (\d+) (\d+) (\d+),)+")
But how can I find out how many groups of four integers I have, and how can I process each of them?
Thank you.
Here is a pyparsing processor for your input string:
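The original listing is not preserved here, so what follows is only a minimal sketch of what such a grammar could look like. The names `integer` and `quad` are assumptions made for illustration; `patt` is the name the text below uses. It relies on pyparsing's camelCase API, matching the `parseString` call mentioned in the answer.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphas, nums

# An integer is a run of digits (still a string at this point).
integer = Word(nums)

# Four integers followed by a comma. Suppress drops the comma from the
# results, and Group collects the four values into their own sublist.
# "quad" is a hypothetical name, not taken from the original answer.
quad = Group(integer + integer + integer + integer + Suppress(","))

# A word of letters, then one or more comma-terminated groups.
# pyparsing skips whitespace between tokens by default.
patt = Word(alphas) + OneOrMore(quad)
```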
Calling patt.parseString returns a pyparsing ParseResults object, which has some nice list/dict/object properties. First, just printing out the results as a list:
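As a hedged illustration of that step, the sketch below rebuilds an assumed version of the grammar (the names `integer`, `quad`, and `data` are hypothetical) and prints the parsed tokens with `asList()`.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphas, nums

# Assumed grammar: a word, then one or more groups of four integers,
# each group terminated by a comma.
integer = Word(nums)
quad = Group(integer + integer + integer + integer + Suppress(","))
patt = Word(alphas) + OneOrMore(quad)

data = "somestring 363 1 46 17,363 1 34 17,401 3 8 14,"
result = patt.parseString(data)

# asList() converts the ParseResults into plain nested Python lists.
print(result.asList())
# ['somestring', ['363', '1', '46', '17'], ['363', '1', '34', '17'], ['401', '3', '8', '14']]
```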
See how each of your groups is grouped as a sublist?
Now let’s have the parser do a bit more work for us. At parse time, we already know we are parsing valid integers: anything matching Word(nums) has to be an integer. So we can add a parse action to do this conversion at parse time.
Now we recreate our pattern, and parsing gives us groups of numbers instead of groups of strings:
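One way this could look, as a sketch rather than the answer's exact code: attach a parse action to the integer expression, then rebuild the pattern from it. The lambda and the names `integer` and `quad` are assumptions.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphas, nums

# The parse action runs each time Word(nums) matches and replaces the
# matched string with its int value.
integer = Word(nums).setParseAction(lambda tokens: int(tokens[0]))

# Rebuild the pattern on top of the converting integer expression.
quad = Group(integer + integer + integer + integer + Suppress(","))
patt = Word(alphas) + OneOrMore(quad)

result = patt.parseString("somestring 363 1 46 17,363 1 34 17,401 3 8 14,")
print(result.asList())
# ['somestring', [363, 1, 46, 17], [363, 1, 34, 17], [401, 3, 8, 14]]
```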
Lastly, we can also assign names to the bits parsed out of this input:
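A sketch of naming the parsed pieces, using pyparsing's call syntax as shorthand for `setResultsName`. The result names `desc` and `numgroups` are assumptions chosen for illustration.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphas, nums

integer = Word(nums).setParseAction(lambda tokens: int(tokens[0]))
quad = Group(integer + integer + integer + integer + Suppress(","))

# expr("name") is shorthand for expr.setResultsName("name").
# "desc" labels the leading word; "numgroups" labels the list of groups.
patt = Word(alphas)("desc") + OneOrMore(quad)("numgroups")
```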
The list of returned items is the same:
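To check this, a small sketch (with the assumed result names `desc` and `numgroups`) that parses the same input and prints the token list. Adding names does not change the list structure.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphas, nums

integer = Word(nums).setParseAction(lambda tokens: int(tokens[0]))
quad = Group(integer + integer + integer + integer + Suppress(","))
patt = Word(alphas)("desc") + OneOrMore(quad)("numgroups")

result = patt.parseString("somestring 363 1 46 17,363 1 34 17,401 3 8 14,")

# Result names are extra labels; the positional tokens are unchanged.
print(result.asList())
# ['somestring', [363, 1, 46, 17], [363, 1, 34, 17], [401, 3, 8, 14]]
```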
But if we dump() the results, we see what we can access by name:
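A minimal sketch of calling `dump()` on the results, again assuming the hypothetical names `desc` and `numgroups`. The exact layout of the dump text varies between pyparsing versions, so no expected output is shown.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphas, nums

integer = Word(nums).setParseAction(lambda tokens: int(tokens[0]))
quad = Group(integer + integer + integer + integer + Suppress(","))
patt = Word(alphas)("desc") + OneOrMore(quad)("numgroups")

result = patt.parseString("somestring 363 1 46 17,363 1 34 17,401 3 8 14,")

# dump() returns a string: the token list, followed by one entry per
# named result.
dumped = result.dump()
print(dumped)
```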
We can use those names for dict-like or attribute-like access. I’m partial to the attribute style myself:
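Both access styles could be sketched as follows. The names `desc` and `numgroups` are assumptions, and `len(result.numgroups)` is one way to answer the "how many groups" part of the question.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphas, nums

integer = Word(nums).setParseAction(lambda tokens: int(tokens[0]))
quad = Group(integer + integer + integer + integer + Suppress(","))
patt = Word(alphas)("desc") + OneOrMore(quad)("numgroups")

result = patt.parseString("somestring 363 1 46 17,363 1 34 17,401 3 8 14,")

# Dict-style and attribute-style access return the same value.
print(result["desc"], result.desc)

# Number of four-integer groups on this line.
print(len(result.numgroups))

# Each group unpacks into its four integers.
for a, b, c, d in result.numgroups:
    print(a, b, c, d)
```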
Here is the entire parser and output processor:
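Since the original listing is missing, the following is a reconstruction under stated assumptions, not the answer's exact code. It combines the pieces described above and runs them over the three sample lines from the question. The names `integer`, `quad`, `desc`, `numgroups`, and `lines` are all hypothetical.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphas, nums

# Convert each integer to int at parse time.
integer = Word(nums).setParseAction(lambda tokens: int(tokens[0]))

# Four integers and a trailing comma form one group.
quad = Group(integer + integer + integer + integer + Suppress(","))

# A leading word, then one or more groups, both with result names.
patt = Word(alphas)("desc") + OneOrMore(quad)("numgroups")

lines = [
    "someotherstring 42 1 48 17,",
    "somestring 363 1 46 17,363 1 34 17,401 3 8 14,",
    "otherstring 42 1 48 17,363 1 34 17,",
]

for line in lines:
    result = patt.parseString(line)
    # Report how many groups were found, then process each one.
    print(result.desc, "has", len(result.numgroups), "group(s)")
    for a, b, c, d in result.numgroups:
        print("   ", a, b, c, d)
```

Newer pyparsing releases also provide snake_case equivalents such as `parse_string`; the camelCase forms are used here to match the text.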