In .NET, regex is not organizing captures as I would expect. (I won’t call this a bug, because obviously someone intended it. However, it’s not how I’d expect it to work nor do I find it helpful.)
This regex is for recipe ingredients (simplified for the sake of example):
(?<measurement> # begin group
\s* # optional beginning space or group separator
(
(?<integer>\d+)| # integer
(
(?<numtor>\d+) # numerator
/
(?<dentor>[1-9]\d*) # denominator. 0 not allowed
)
)
\s(?<unit>[a-zA-Z]+)
)+ # end group. can have multiple
My string: 3 tbsp 1/2 tsp
Resulting groups and captures:
[measurement][0]=3 tbsp
[measurement][1]= 1/2 tsp
[integer][0]=3
[numtor][0]=1
[dentor][0]=2
[unit][0]=tbsp
[unit][1]=tsp
Notice how, even though 1/2 tsp is in the 2nd Capture, its parts are in [0], since these slots were previously unused.
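For reference, here is a minimal C# sketch that should reproduce the listing above. The question does not show the calling code, so the use of RegexOptions.IgnorePatternWhitespace (needed for the # comments), the variable names, and the output format are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Pattern copied from the question. The inline # comments only work
// when RegexOptions.IgnorePatternWhitespace is set (assumed here).
string pattern = @"
(?<measurement>                 # begin group
    \s*                         # optional beginning space or group separator
    (
        (?<integer>\d+)|        # integer
        (
            (?<numtor>\d+)      # numerator
            /
            (?<dentor>[1-9]\d*) # denominator. 0 not allowed
        )
    )
    \s(?<unit>[a-zA-Z]+)
)+                              # end group. can have multiple
";

Match match = Regex.Match("3 tbsp 1/2 tsp", pattern,
                          RegexOptions.IgnorePatternWhitespace);

// Each named group keeps its own CaptureCollection, indexed from 0
// independently of the other groups.
var lines = new List<string>();
foreach (string name in new[] { "measurement", "integer", "numtor", "dentor", "unit" })
{
    CaptureCollection captures = match.Groups[name].Captures;
    for (int i = 0; i < captures.Count; i++)
    {
        lines.Add("[" + name + "][" + i + "]=" + captures[i].Value);
    }
}

foreach (string line in lines)
{
    Console.WriteLine(line);
}
```

Each group's captures are numbered on their own, which is why the fraction's parts end up at index 0 even though they belong to the second measurement.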
Is there any way to get all of the parts to have predictable useful indexes without having to re-run each group through the regex again?
Not with Captures. And if you’re going to perform multiple matches anyway, I suggest you remove the + and match each component of the measurement separately, using a pattern that begins with \G.
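The answer's original code and output did not survive in this copy, so what follows is a sketch of the idea, not the author's code. It assumes the question's pattern with the outer repeating group removed and \G added in front. The variable names and the way the amount is reassembled are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// One measurement per match: no outer (...)+, and \G pins each match
// to the position where the previous one ended.
var measurement = new Regex(@"
    \G\s*                       # start where the last match ended, skip separator
    (?:
        (?<integer>\d+)         # integer
      |
        (?<numtor>\d+)          # numerator
        /
        (?<dentor>[1-9]\d*)     # denominator. 0 not allowed
    )
    \s(?<unit>[a-zA-Z]+)
", RegexOptions.IgnorePatternWhitespace);

var parsed = new List<string>();
foreach (Match m in measurement.Matches("3 tbsp 1/2 tsp"))
{
    Group integer = m.Groups["integer"];
    Group numtor = m.Groups["numtor"];
    Group dentor = m.Groups["dentor"];
    Group unit = m.Groups["unit"];

    // Within a single match, each group holds at most one capture,
    // so Success tells us which alternative was taken.
    string amount = integer.Success
        ? integer.Value
        : numtor.Value + "/" + dentor.Value;

    parsed.Add(amount + " " + unit.Value);
}

foreach (string item in parsed)
{
    Console.WriteLine(item);
}
```

Because every Match now covers exactly one measurement, its groups belong to that measurement alone, and there is no need to line up capture indexes across groups.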
The \G at the beginning ensures that matches occur only at the point where the previous match ended (or at the beginning of the input if this is the first match attempt). You can also save the match-end position between calls, then use the two-argument Matches method to resume parsing at that same point (as if that were really the beginning of the input).
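Saving the position and resuming might look like the sketch below. It uses the instance methods Regex.Match(string) and Regex.Matches(string, int); the one-line pattern and the variable names are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same single-measurement pattern as before, written on one line.
var measurement = new Regex(
    @"\G\s*(?:(?<integer>\d+)|(?<numtor>\d+)/(?<dentor>[1-9]\d*))\s(?<unit>[a-zA-Z]+)");

string input = "3 tbsp 1/2 tsp";

// First call: consume one measurement and remember where it ended.
Match first = measurement.Match(input);
int resumeAt = first.Index + first.Length;

// Later call: pass the saved offset as the second argument.
// \G then anchors at that offset instead of at index 0.
var remaining = new List<string>();
foreach (Match m in measurement.Matches(input, resumeAt))
{
    remaining.Add(m.Value.Trim());
}

Console.WriteLine("first: " + first.Value + ", resumed at " + resumeAt);
foreach (string item in remaining)
{
    Console.WriteLine("next: " + item);
}
```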