I’m stuck on a RegEx problem that’s seemingly very simple and yet I can’t get it working.
Suppose I have input like this:
Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%
Some text %interestingbit% lots of random text OPTIONAL_THING lots and lots more %anotherinterestingbit%
Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%
There are many repeating blocks in the input. In each block I want to capture some things that are always there (%interestingbit% and %anotherinterestingbit%), but there is also a bit of text that may or may not occur in between them (OPTIONAL_THING), and I want to capture it if it’s there.
A RegEx like this matches only the blocks that contain OPTIONAL_THING (and the named capture works):
%interestingbit%.+?((?<OptionalCapture>OPTIONAL_THING)).+?%anotherinterestingbit%
So it seems like it’s just a matter of making the whole group optional, right? That’s what I tried:
%interestingbit%.+?((?<OptionalCapture>OPTIONAL_THING))?.+?%anotherinterestingbit%
But I find that although this matches all 3 blocks, the named capture (OptionalCapture) is empty in every one of them! How do I get this to work?
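For reference, the behaviour can be reproduced outside The Regulator. This is only a minimal sketch in Python rather than .NET, so it rests on two assumptions: Python's `re` module spells named groups `(?P<name>...)` instead of `(?<name>...)`, and `re.DOTALL` stands in for `RegexOptions.Singleline`. The names `TEXT` and `PATTERN` are made up for the illustration.

```python
import re

# Sample input from the question: three blocks, only the middle one
# contains OPTIONAL_THING.
TEXT = (
    "Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit% "
    "Some text %interestingbit% lots of random text OPTIONAL_THING lots and lots more %anotherinterestingbit% "
    "Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%"
)

# The second pattern from the question, with the named group rewritten
# in Python syntax (assumption: .NET's (?<name>...) becomes (?P<name>...)).
PATTERN = re.compile(
    r"%interestingbit%.+?((?P<OptionalCapture>OPTIONAL_THING))?.+?%anotherinterestingbit%",
    re.DOTALL,  # let '.' match newlines, like RegexOptions.Singleline
)

# One entry per matched block; None means the group did not take part in the match.
captures = [m.group("OptionalCapture") for m in PATTERN.finditer(TEXT)]
print(captures)
```

All three blocks match, but the group never participates: the first `.+?` consumes a single character, the optional group fails at that position and is skipped, and the second `.+?` then runs straight over OPTIONAL_THING.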
Note that there can be a lot of text within each block, including newlines, which is why I put in ‘.+?’ rather than something more specific. I’m using .NET regular expressions, testing with The Regulator.
My thoughts are along similar lines to Niko’s idea. However, I would suggest placing the 2nd .+? inside the optional group instead of the first, as follows:
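A pattern matching that description, reconstructed here as an assumption from the prose alone, would be `%interestingbit%.+?(?:(?<OptionalCapture>OPTIONAL_THING).+?)?%anotherinterestingbit%`. The sketch below checks that shape in Python instead of .NET, so the named group is written `(?P<name>...)` and `re.DOTALL` plays the role of `RegexOptions.Singleline`; `TEXT` and `FIXED` are names invented for the example.

```python
import re

# Sample input from the question: three blocks, only the middle one
# contains OPTIONAL_THING.
TEXT = (
    "Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit% "
    "Some text %interestingbit% lots of random text OPTIONAL_THING lots and lots more %anotherinterestingbit% "
    "Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%"
)

# Reconstructed pattern (an assumption): the second .+? sits inside the
# optional non-capturing group, right after the named capture.
FIXED = re.compile(
    r"%interestingbit%.+?(?:(?P<OptionalCapture>OPTIONAL_THING).+?)?%anotherinterestingbit%",
    re.DOTALL,  # let '.' match newlines, like RegexOptions.Singleline
)

# One entry per matched block; None means OPTIONAL_THING was absent.
captures = [m.group("OptionalCapture") for m in FIXED.finditer(TEXT)]
print(captures)
```

One caveat with this shape: because the inner `.+?` needs at least one character, a block where OPTIONAL_THING sits directly against %anotherinterestingbit% would still match but leave the capture empty. Using `.*?` inside the group would cover that case.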
This avoids unnecessary backtracking. If the first .+? is inside the optional group and OPTIONAL_THING does not exist in the search string, the regex won’t know this until it gets to the end of the string. It will then need to backtrack, perhaps quite a bit, to match %anotherinterestingbit%, which as you said will always exist.
Also, since OPTIONAL_THING, when it exists, always comes before %anotherinterestingbit%, the text after it is effectively optional as well and fits more naturally into the optional group.