I am trying to parse a pattern with regular expressions in Ruby. The pattern is something like:
<number>? <comma>? <number>? <term>*
where:
number is one or more digits
comma is ","
term is of the form [.*] or [^.*]
I am trying to capture the numbers and all the terms. To clarify, here are some examples of valid patterns:
5,50[foo,bar]
5,[foo][^apples]
10,100[baseball][^basketball][^golf]
,55[coke][pepsi][^drpepper][somethingElse]
In the first, I’d like to capture 5, 50, and [foo,bar].
In the second, I’d like to capture 5, [foo], and [^apples], and so on.
The pattern I came up with is:
/(\d+)?,?(\d+)?(\[\^?[^\]]+\])+/
but this only captures the numbers and the last term. If I remove the + at the end, then it only captures the first term.
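For reference, a minimal reproduction of that behaviour. The names ORIGINAL, WITHOUT_PLUS and captures_for are made up for this illustration:

```ruby
# The pattern from the question, unchanged.
ORIGINAL = /(\d+)?,?(\d+)?(\[\^?[^\]]+\])+/
# The same pattern with the trailing + removed.
WITHOUT_PLUS = /(\d+)?,?(\d+)?(\[\^?[^\]]+\])/

# Illustrative helper: returns the capture groups, or nil if nothing matches.
def captures_for(pattern, input)
  m = pattern.match(input)
  m && m.captures
end

input = "10,100[baseball][^basketball][^golf]"
p captures_for(ORIGINAL, input)      # => ["10", "100", "[^golf]"]
p captures_for(WITHOUT_PLUS, input)  # => ["10", "100", "[baseball]"]
```

A repeated capture group only keeps the text from its final iteration, which is why the third group ends up holding just one term in both cases.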
The easiest solution I can think of, with minimal effort, would probably be to add another capture group by wrapping the existing group and its + in an extra pair of parentheses.
Also, you could probably simplify the \d expressions by just doing (\d*) instead of (\d+)?…
EDIT
Here’s the code used to test the above suggestions:
and the output:
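A minimal sketch of such a test, assuming the outer capture group described above is added to the pattern from the question. The names WITH_OUTER_GROUP, SAMPLES and numbers_and_terms are illustrative and not taken from the original snippet:

```ruby
# Pattern from the question with one extra pair of parentheses
# around the repeated term group (assumed form of the suggestion).
WITH_OUTER_GROUP = /(\d+)?,?(\d+)?((\[\^?[^\]]+\])+)/

# The four example inputs from the question.
SAMPLES = [
  "5,50[foo,bar]",
  "5,[foo][^apples]",
  "10,100[baseball][^basketball][^golf]",
  ",55[coke][pepsi][^drpepper][somethingElse]"
]

# Illustrative helper: first number, second number, and the whole run of terms.
def numbers_and_terms(input)
  m = WITH_OUTER_GROUP.match(input)
  m && [m[1], m[2], m[3]]
end

SAMPLES.each { |s| p numbers_and_terms(s) }
# ["5", "50", "[foo,bar]"]
# ["5", nil, "[foo][^apples]"]
# ["10", "100", "[baseball][^basketball][^golf]"]
# [nil, "55", "[coke][pepsi][^drpepper][somethingElse]"]
```

The third group now holds every term as one string; the inner group (group 4) still holds only the last term.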
Edit 2
If you want tokenization, as per J-_-L’s suggestion with the scan method, add in:
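A minimal sketch of that step, assuming the run of terms captured by the outer group is split with String#scan. The names FULL, TERM and tokenize are hypothetical:

```ruby
# Full pattern with the extra outer group around the repeated terms.
FULL = /(\d+)?,?(\d+)?((\[\^?[^\]]+\])+)/
# A single term: [ ... ] or [^ ... ]. No capture groups, so scan
# returns the matched strings themselves.
TERM = /\[\^?[^\]]+\]/

# Hypothetical helper: returns [first_number, second_number, terms],
# or nil when the input does not match at all.
def tokenize(input)
  m = FULL.match(input)
  return nil unless m
  [m[1], m[2], m[3].scan(TERM)]
end

p tokenize("5,[foo][^apples]")
# => ["5", nil, ["[foo]", "[^apples]"]]
p tokenize("10,100[baseball][^basketball][^golf]")
# => ["10", "100", ["[baseball]", "[^basketball]", "[^golf]"]]
```

Splitting in two passes keeps the main pattern simple: the first match pulls out the numbers and the whole term run, and scan then breaks that run into individual terms.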