I have a regex ‘simple’ that I’d like to use as a building block for another regex ‘complex’. The trouble is, the capture groups in ‘simple’ are interfering with ‘complex’. These low-level capture groups are details I don’t care about. I’d love to remove them before the regex is consumed.
The question is: how?
Put another way, in code, this isn’t working well:
simple = /(a)bc/
complex = /(#{simple}) - (#{simple})/
complex.match("abc - abc").captures # => ["abc", "a", "abc", "a"]
# but I need ["abc", "abc"]
I’d much rather write:
simple = /(a)bc/
complex = /(#{simple.without_capture}) - (#{simple.without_capture})/
complex.match("abc - abc").captures # => ["abc", "abc"]
I’m stuck on how to do this, but I’m betting it’s been done before. An implementation of Regexp#without_capture would of course need to account for non-capturing groups, lookahead/lookbehind, and so on, so simply removing every ( and ) isn’t enough. Finding the matching ) for each capture group also seems a little challenging.
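For what it's worth, here is a rough sketch of one way the general case might be approached. It sidesteps the matching-parenthesis problem by rewriting only the opening of each group to `(?:` and leaving every `)` alone. The method name `without_capture` as a standalone helper and the tokenizing regex are my own assumptions, not an existing API. It skips escaped characters and character classes, but it does not handle backreferences such as `\1`, parentheses inside `#` comments in extended mode, or a literal `]` placed first in a character class.

```ruby
# Hypothetical helper (not part of Ruby): returns a copy of +regexp+ in which
# plain and named capture groups have been turned into non-capturing groups.
def without_capture(regexp)
  depth = 0 # nesting level of character classes, e.g. [a-z] or [[:alpha:]]

  # Tokens of interest: an escaped character, a bracket, a named group
  # opener such as (?<name>, or a "(" that does not start a (?...) construct.
  token = /\\.|\[|\]|\(\?<[A-Za-z_]\w*>|\((?!\?)/m

  source = regexp.source.gsub(token) do |t|
    if t.start_with?('\\')
      t                      # escaped character: keep untouched
    elsif t == '['
      depth += 1
      t
    elsif t == ']'
      depth -= 1 if depth > 0
      t
    elsif depth > 0
      t                      # "(" inside a character class is a literal
    else
      '(?:'                  # capturing opener becomes non-capturing
    end
  end

  Regexp.new(source, regexp.options)
end

simple  = /(a)bc/
complex = /(#{without_capture(simple)}) - (#{without_capture(simple)})/
complex.match("abc - abc").captures # => ["abc", "abc"]
```

Lookahead and lookbehind openers such as `(?=`, `(?!`, `(?<=` and `(?<!` are never matched by the token pattern, so they pass through unchanged.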
Thoughts?
EDIT: I forgot to mention: I don’t want to manually create two versions of simple (one capturing and one non-capturing). In my actual case it would be impractical to maintain both versions. It’d be much better to be able to toggle the capturing dynamically.
This is harder than I thought. Rather than spin my wheels any longer, I changed one requirement, and everything became easy: instead of trying to replace every capture group, replace only named capture groups.
Thanks @JustinMorgan and @TimPietzcker for getting me this far.
This is what I’ve come up with:
Which passes this spec:
Recursion, escaping, and all the other junk just go away when the token is more complex than a single ‘(’. If I use named captures everywhere, I can use this method. If I don’t, things behave normally.
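As a minimal sketch of the named-capture idea (the method name `without_named_captures` and the exact pattern are assumptions for illustration, and this is not necessarily the code referred to above), the opener `(?<name>` is distinctive enough that a single substitution on the regex source covers the common cases. Escaped characters are skipped so that a literal `\(` is left alone. A `(?<name>` sequence appearing inside a character class would still be rewritten incorrectly.

```ruby
# Hypothetical extension of Regexp: rewrites every named group opener
# "(?<name>" to "(?:" so the group no longer captures.
class Regexp
  def without_named_captures
    # Match either an escaped character (returned unchanged) or a named
    # group opener. Lookbehind openers "(?<=" and "(?<!" do not match,
    # because a group name must start with a letter or underscore.
    rewritten = source.gsub(/\\.|\(\?<[A-Za-z_]\w*>/m) do |t|
      t.start_with?('\\') ? t : '(?:'
    end
    Regexp.new(rewritten, options)
  end
end

simple  = /(?<first>a)bc/
complex = /(#{simple.without_named_captures}) - (#{simple.without_named_captures})/
complex.match("abc - abc").captures # => ["abc", "abc"]
```

Used on its own, `simple` still exposes its named capture, so `simple.match("abc")[:first]` continues to return `"a"`.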
It’s late, so I don’t know if I’m missing anything, but I think this’ll work.
Thanks for the help everyone.