I have a regex ‘simple’ that I’d like to use as a building block

Question

Asked: May 25, 2026 (2026-05-25T18:10:25+00:00)

I have a regex ‘simple’ that I’d like to use as a building block for another regex ‘complex’. The trouble is that the capture groups in ‘simple’ interfere with ‘complex’. These low-level capture groups are details I don’t care about, and I’d love to remove them before the regex is consumed.

The question is: how?

Put another way, in code, this isn’t working well:

simple = /(a)bc/
complex = /(#{simple}) - (#{simple})/
complex.match("abc - abc").captures # => ["abc", "a", "abc", "a"]
# when I need ["abc", "abc"]

I’d much rather write:

simple = /(a)bc/
complex = /(#{simple.without_capture}) - (#{simple.without_capture})/
complex.match("abc - abc").captures # => ["abc", "abc"]

I’m stuck on how to do this, but I’m betting it’s been done before. The implementation of Regexp#without_capture would of course need to account for non-capturing groups, lookahead/lookbehind, etc., so simply removing all the () isn’t enough. Also, finding the matching ) for each capture group seems a little challenging.

Thoughts?

EDIT: I forgot to mention that I don’t want to manually create two versions of ‘simple’ (one capturing, one non-capturing). In my actual case it would be impractical to maintain both versions. It’d be much better to be able to toggle the capturing dynamically.

1 Answer

Editorial Team · Answer 1 · 2026-05-25T18:10:27+00:00

This is harder than I thought. Rather than spin my wheels any longer, I changed one requirement, and everything became easy: instead of trying to replace every capture group, replace only named capture groups.

Thanks @JustinMorgan and @TimPietzcker for getting me this far.

This is what I’ve come up with:

class Regexp
  # Replaces all named capture groups with non-capturing groups;
  # in other words, it rewrites every (?<name>...) as (?:...).
  # The (?![=!]) guard skips lookbehinds, which also start with (?<.
  def without_named_captures
    named_captures = %r{\(\?<(?![=!])[^>]+>}
    pattern = source.gsub(named_captures, "(?:")
    Regexp.new(pattern, options) # keep flags such as /i
  end
end

Which passes this spec:

describe "Regexp Extensions" do
  describe "#without_named_captures" do
    it "should replace named captures with non-captures" do
      p1 = /(?<a>.*) - (?<b>.*)/
      p2 = p1.without_named_captures

      p2.should == /(?:.*) - (?:.*)/

      # sanity check
      p1.match('abc - def').should have_exactly(3).items
      p2.match('abc - def').should have_exactly(1).items
    end
  end
end

Dealing with recursion, escaping, and all the other junk just goes away when the token is more complex than a single ‘(’. If I use named captures everywhere, I can use this method. If I don’t, well, things behave normally.
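To tie this back to the original ‘simple’/‘complex’ example, here is a minimal sketch of how the composition might look once ‘simple’ uses a named group. The group name `letter` and the local `quiet` are made up for illustration, and a version of the extension is included so the snippet runs on its own:

```ruby
class Regexp
  # Assumed variant of the extension: named groups become non-capturing,
  # lookbehinds are left alone, and the original flags are carried over.
  def without_named_captures
    Regexp.new(source.gsub(/\(\?<(?![=!])[^>]+>/, "(?:"), options)
  end
end

# 'letter' is a hypothetical group name chosen for this example.
simple  = /(?<letter>a)bc/
quiet   = simple.without_named_captures
complex = /(#{quiet}) - (#{quiet})/

p complex.match("abc - abc").captures # => ["abc", "abc"]
p simple.match("abc")[:letter]        # => "a"
```

The building block keeps its own capture when used alone, and only the outer groups of ‘complex’ show up in the combined match.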

It’s late, so I don’t know if I’m missing anything, but I think this’ll work.
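One case that may be worth checking is lookbehind syntax, since (?<= and (?<! also begin with (?<. The sketch below compares an unguarded opener pattern with one that assumes a negative lookahead guard; `strip_named_captures`, both constants, and the `price` example are hypothetical names used only for this comparison:

```ruby
# Unguarded: anything that starts with (?< and runs to the next >.
UNGUARDED_OPENER = %r{\(\?<[^>]+>}
# Guarded (an assumption, not from the thread): skip (?<= and (?<! openers.
GUARDED_OPENER = %r{\(\?<(?![=!])[^>]+>}

# Hypothetical helper: rewrite named-group openers as (?: and keep the flags.
def strip_named_captures(regexp, opener)
  Regexp.new(regexp.source.gsub(opener, "(?:"), regexp.options)
end

# Digits that are preceded by a dollar sign.
price = /(?<=\$)(?<amount>\d+)/

unguarded = strip_named_captures(price, UNGUARDED_OPENER)
guarded   = strip_named_captures(price, GUARDED_OPENER)

puts unguarded.source # => (?:\d+)
puts guarded.source   # => (?<=\$)(?:\d+)

# Flags survive because options are passed through to Regexp.new.
puts strip_named_captures(/(?<word>abc)/i, GUARDED_OPENER).match?("ABC") # => true
```

With the unguarded pattern the lookbehind is swallowed along with the named group, so the result would match digits that have no dollar sign in front of them.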

Thanks for the help everyone.
