I’m working on a Ruby-based lexer. To improve performance, I joined up all

Question

Asked: June 14, 2026 (2026-06-14T14:10:45+00:00)

I’m working on a Ruby-based lexer. To improve performance, I joined all the tokens’ regexps into one big regexp with named match groups. The resulting regexp looks like:

/\A(?<__anonymous_-1038694222803470993>(?-mix:\n+))|\A(?<__anonymous_-1394418499721420065>(?-mix:\/\/[^\n]*))|\A(?<__anonymous_3077187815313752157>(?-mix:include\s+"[^"]+"))|\A(?<LET>(?-mix:let\s))|\A(?<IN>(?-mix:in\s))|\A(?<CLASS>(?-mix:class\s))|\A(?<DEF>(?-mix:def\s))|\A(?<DEFM>(?-mix:defm\s))|\A(?<MULTICLASS>(?-mix:multiclass\s))|\A(?<FUNCNAME>(?-mix:![a-zA-Z_][a-zA-Z0-9_]*))|\A(?<ID>(?-mix:[a-zA-Z_][a-zA-Z0-9_]*))|\A(?<STRING>(?-mix:"[^"]*"))|\A(?<NUMBER>(?-mix:[0-9]+))/
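For readers who want to reproduce a regexp of this shape, here is a minimal sketch of how one might be assembled. The `TOKENS` table and the `BIG_REGEX` name are hypothetical (a small subset chosen for illustration), not the code from the repo. It relies on the fact that interpolating a Regexp into a string calls `Regexp#to_s`, which produces the `(?-mix:...)` form seen above.

```ruby
# Hypothetical token table: a small subset, for illustration only.
TOKENS = {
  LET:    /let\s/,
  ID:     /[a-zA-Z_][a-zA-Z0-9_]*/,
  NUMBER: /[0-9]+/
}.freeze

# Each alternative is anchored with \A and wrapped in a named group.
# "#{re}" calls Regexp#to_s, which yields the "(?-mix:...)" wrapper.
BIG_REGEX = Regexp.new(
  TOKENS.map { |name, re| "\\A(?<#{name}>#{re})" }.join("|")
)

m = BIG_REGEX.match("let x")
puts m.names.inspect   # the group names, in order of appearance
puts m[:LET].inspect   # the text matched by the LET group
```

Order matters in such a table: keywords like `LET` have to come before `ID`, otherwise `ID` would win the alternation.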

I match it against my input string, which produces a MatchData in which exactly one token group is set:

bigregex =~ "\n ... garbage"
puts $~.inspect

This outputs:

#<MatchData
 "\n"
 __anonymous_-1038694222803470993:"\n"
 __anonymous_-1394418499721420065:nil
 __anonymous_3077187815313752157:nil
 LET:nil
 IN:nil
 CLASS:nil
 DEF:nil
 DEFM:nil
 MULTICLASS:nil
 FUNCNAME:nil
 ID:nil
 STRING:nil
 NUMBER:nil>

So the regexp did match the “\n” part. Now I need to figure out which match group it belongs to. It is clearly visible from the #inspect output that it is __anonymous_-1038694222803470993, but I need to get it programmatically.

I could not find any option other than iterating over #names:

m.names.each do |n|
  if m[n]
    type = n.to_sym
    resolved_type = (n.start_with?('__anonymous_') ? nil : type)
    val = m[n]
    break
  end
end

which verifies that the match group did have a match.

The problem is that this is slow: I spend about 10% of the time in the loop, and another 8% taking @input[@pos..-1] so that \A matches at the start of the string as expected (I do not discard input; I just advance @pos within it).
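One possible way to avoid the slicing described above, sketched here under assumptions rather than taken from the repo: replace the `\A` anchors with `\G` and pass the current offset to `Regexp#match` as its second argument. `\G` anchors at the position where the match attempt starts, so no substring has to be created. The `SCAN_REGEX` constant, its three token groups, and the `tokenize` helper are hypothetical.

```ruby
# Hypothetical three-token regexp. Every alternative starts with \G,
# so a match can only begin exactly at the offset passed to #match.
SCAN_REGEX = /\G(?<ID>[a-zA-Z_][a-zA-Z0-9_]*)|\G(?<NUMBER>[0-9]+)|\G(?<WS>\s+)/

# Walks the input without slicing it; only the offset moves forward.
def tokenize(input)
  pos = 0
  tokens = []
  while pos < input.length
    m = SCAN_REGEX.match(input, pos)
    raise ArgumentError, "no token at offset #{pos}" if m.nil?

    name = m.names.find { |n| m[n] }   # the one group that is set
    tokens << [name.to_sym, m[0]]
    pos = m.end(0)                     # end offset within the full input
  end
  tokens
end

p tokenize("foo 42")   # => [[:ID, "foo"], [:WS, " "], [:NUMBER, "42"]]
```

Whether this actually recovers the 8% depends on the input, so it is worth profiling before and after.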

You can check the full code in the GitHub repo.

Any ideas on how to make it at least a bit faster? Is there an easier way to find the “successful” match group?

1 Answer

Editorial Team · Answer 1 · 2026-06-14T14:10:46+00:00

You can do this using the MatchData methods #captures and #names:

matching_string = "\n ...garbage"   # or whatever this really is in your code
match = bigregex.match(matching_string)   # bigregex = your regex; returns a MatchData (or nil)
the_name_you_want = nil   # declared outside the block so it is visible afterwards

match.captures.each_with_index do |value, index|
  the_name_you_want = match.names[index] unless value.nil?
end

Or, if you expect multiple successful values, collect them all:

success_names_arr = []
match.captures.each_with_index do |value, index|
  success_names_arr.push(match.names[index]) unless value.nil?
end

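This is close to your original idea, but #captures builds the array of group values once, so each group is checked by index rather than looked up by name with m[n]. Whether that is measurably faster depends on your input, so profile it before relying on it.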
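A slightly tighter variant of the same idea, as a sketch: compute the name list once, then use `Array#index` to find the first non-nil capture. The `TOKEN_REGEX` and `GROUP_NAMES` constants and the `matched_group` helper are hypothetical. It assumes every group in the regexp is named and no name is repeated, so that `captures` and `names` line up by position.

```ruby
# Hypothetical, shortened version of the big regexp.
TOKEN_REGEX = /\A(?<NEWLINE>\n+)|\A(?<ID>[a-zA-Z_][a-zA-Z0-9_]*)|\A(?<NUMBER>[0-9]+)/

# Computed once instead of calling MatchData#names on every token.
GROUP_NAMES = TOKEN_REGEX.names.map(&:to_sym).freeze

# Returns [group_name, matched_text], or nil if no group is set.
def matched_group(match)
  captures = match.captures
  index = captures.index { |value| !value.nil? }
  index && [GROUP_NAMES[index], captures[index]]
end

p matched_group(TOKEN_REGEX.match("\n ... garbage"))   # => [:NEWLINE, "\n"]
p matched_group(TOKEN_REGEX.match("42 apples"))        # => [:NUMBER, "42"]
```

This still scans the captures linearly, so any gain over the loops above would come from skipping the per-token name lookups and symbol conversions, not from a different algorithm.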
