Why do these 2 RegEx benchmarks differ so much?
They use the same RegEx, one written in place and one stored via qr//.
Results:
                       Rate rege1.FIND_AT_END rege2.FIND_AT_END
rege1.FIND_AT_END  661157/s                --              -85%
rege2.FIND_AT_END 4384042/s              563%                --

                  Rate rege1.NOFIND rege2.NOFIND
rege1.NOFIND  678702/s           --         -87%
rege2.NOFIND 5117707/s         654%           --

                         Rate rege1.FIND_AT_START rege2.FIND_AT_START
rege1.FIND_AT_START  657765/s                  --                -85%
rege2.FIND_AT_START 4268032/s                549%                  --
# Benchmark
use Benchmark qw(:all);
my $count = 10000000;
my $re = qr/abc/o;
my %tests = (
    "NOFIND "        => "cvxcvidgds.sdfpkisd[s",
    "FIND_AT_END "   => "cvxcvidgds.sdfpabcd[s",
    "FIND_AT_START " => "abccvidgds.sdfpkisd[s",
);
foreach my $type (keys %tests) {
    my $str = $tests{$type};
    cmpthese($count, {
        # rege1: pattern stored via qr//
        "rege1.$type" => sub { my $idx = ($str =~ $re); },
        # rege2: the same pattern written in place
        "rege2.$type" => sub { my $idx = ($str =~ /abc/o); },
    });
}
You are dealing with operations that are intrinsically very fast, so you need to run a few more tests to narrow down where the time is going. I have also switched the benchmark model from external repetition (letting cmpthese do it) to internal repetition (a for loop). This minimizes the overhead of the subroutine call and any work that cmpthese has to do. Finally, it is important to test whether the difference scales with magnitude (in this case it doesn't).

10:
                 Rate     $re fail   /$re/ fail  /$re/o fail  /abc/o fail
$re fail     106390/s           --          -8%         -72%         -74%
/$re/ fail   115814/s           9%           --         -70%         -71%
/$re/o fail  384635/s         262%         232%           --          -5%
/abc/o fail  403944/s         280%         249%           5%           --

                 Rate      $re end    /$re/ end   /$re/o end   /abc/o end
$re end      105527/s           --          -5%         -71%         -72%
/$re/ end    110902/s           5%           --         -69%         -71%
/$re/o end   362544/s         244%         227%           --          -5%
/abc/o end   382242/s         262%         245%           5%           --

                 Rate    $re start  /$re/ start /$re/o start /abc/o start
$re start    111002/s           --          -3%         -72%         -73%
/$re/ start  114094/s           3%           --         -71%         -73%
/$re/o start 390693/s         252%         242%           --          -6%
/abc/o start 417123/s         276%         266%           7%           --

100:
                 Rate   /$re/ fail     $re fail  /$re/o fail  /abc/o fail
/$re/ fail    12329/s           --          -4%         -77%         -79%
$re fail      12789/s           4%           --         -76%         -78%
/$re/o fail   53194/s         331%         316%           --          -9%
/abc/o fail   58377/s         373%         356%          10%           --

                 Rate      $re end    /$re/ end   /$re/o end   /abc/o end
$re end       12440/s           --          -1%         -75%         -77%
/$re/ end     12623/s           1%           --         -75%         -77%
/$re/o end    50127/s         303%         297%           --          -7%
/abc/o end    53941/s         334%         327%           8%           --

                 Rate    $re start  /$re/ start /$re/o start /abc/o start
$re start     12810/s           --          -3%         -76%         -78%
/$re/ start   13190/s           3%           --         -75%         -77%
/$re/o start  52512/s         310%         298%           --          -8%
/abc/o start  57045/s         345%         332%           9%           --

1000:
                 Rate     $re fail   /$re/ fail  /$re/o fail  /abc/o fail
$re fail       1248/s           --          -8%         -76%         -80%
/$re/ fail     1354/s           9%           --         -74%         -79%
/$re/o fail    5284/s         323%         290%           --         -16%
/abc/o fail    6311/s         406%         366%          19%           --

                 Rate      $re end    /$re/ end   /$re/o end   /abc/o end
$re end        1316/s           --          -1%         -74%         -77%
/$re/ end      1330/s           1%           --         -74%         -77%
/$re/o end     5119/s         289%         285%           --         -11%
/abc/o end     5757/s         338%         333%          12%           --

                 Rate  /$re/ start    $re start /$re/o start /abc/o start
/$re/ start    1283/s           --          -1%         -75%         -81%
$re start      1302/s           1%           --         -75%         -80%
/$re/o start   5119/s         299%         293%           --         -22%
/abc/o start   6595/s         414%         406%          29%           --

10000:
                 Rate   /$re/ fail     $re fail  /$re/o fail  /abc/o fail
/$re/ fail      130/s           --          -6%         -76%         -80%
$re fail        139/s           7%           --         -74%         -79%
/$re/o fail     543/s         317%         291%           --         -17%
/abc/o fail     651/s         400%         368%          20%           --

                 Rate    /$re/ end      $re end   /$re/o end   /abc/o end
/$re/ end       128/s           --          -3%         -76%         -79%
$re end         132/s           3%           --         -76%         -78%
/$re/o end      541/s         322%         311%           --         -10%
/abc/o end      598/s         366%         354%          11%           --

                 Rate  /$re/ start    $re start /$re/o start /abc/o start
/$re/ start     132/s           --          -1%         -77%         -80%
$re start       133/s           1%           --         -76%         -79%
/$re/o start    566/s         330%         325%           --         -13%
/abc/o start    650/s         394%         388%          15%           --

100000:
                 Rate   /$re/ fail     $re fail  /$re/o fail  /abc/o fail
/$re/ fail     13.2/s           --          -8%         -76%         -78%
$re fail       14.2/s           8%           --         -74%         -76%
/$re/o fail    55.9/s         325%         292%           --          -8%
/abc/o fail    60.5/s         360%         324%           8%           --

                 Rate    /$re/ end      $re end   /$re/o end   /abc/o end
/$re/ end      12.8/s           --          -3%         -75%         -79%
$re end        13.2/s           3%           --         -75%         -78%
/$re/o end     52.3/s         308%         297%           --         -12%
/abc/o end     59.7/s         365%         353%          14%           --

                 Rate    $re start  /$re/ start /$re/o start /abc/o start
$re start      13.4/s           --          -2%         -77%         -78%
/$re/ start    13.6/s           2%           --         -77%         -78%
/$re/o start   58.2/s         334%         328%           --          -6%
/abc/o start   62.2/s         364%         357%           7%           --

You can easily see that the tests fall into two categories: the ones with /.../o in the source, and the ones without. Since this is a syntactic difference, it suggests that the compiler is probably optimizing this case (or that the runtime is allowed to cache in some way), for example by skipping checks of the variables after they have been done once or by simplifying the stack. It is hard to say without looking at the source.

The results probably also depend on the version of perl being used. The above tests were run on v5.10.1.
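
The code behind the internal-loop benchmark is not shown here, so the following is only a minimal sketch of how such a test might be set up, not the original script. It reuses the pattern and test strings from the question; the helper name make_cases, the list of loop counts, and the 200_000 total are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumption: same pattern and test strings as the question's benchmark.
my $re    = qr/abc/;
my %tests = (
    fail  => 'cvxcvidgds.sdfpkisd[s',
    end   => 'cvxcvidgds.sdfpabcd[s',
    start => 'abccvidgds.sdfpkisd[s',
);

# make_cases is a hypothetical helper. It builds the four variants for one
# string. Each sub runs the match $n times in its own for loop, so a single
# sub call covers $n matches, and returns the number of successful matches.
sub make_cases {
    my ($str, $n, $type) = @_;
    return {
        "\$re $type" => sub {
            my $hits = 0;
            for (1 .. $n) { $hits++ if $str =~ $re }
            return $hits;
        },
        "/\$re/ $type" => sub {
            my $hits = 0;
            for (1 .. $n) { $hits++ if $str =~ /$re/ }
            return $hits;
        },
        "/\$re/o $type" => sub {
            my $hits = 0;
            for (1 .. $n) { $hits++ if $str =~ /$re/o }
            return $hits;
        },
        "/abc/o $type" => sub {
            my $hits = 0;
            for (1 .. $n) { $hits++ if $str =~ /abc/o }
            return $hits;
        },
    };
}

# Assumed loop counts. Every variant performs roughly $total matches per
# table, which keeps the run short; raise $total for steadier numbers.
my $total = 200_000;
for my $n (10, 100, 1000) {
    print "$n:\n";
    for my $type (sort keys %tests) {
        cmpthese(int($total / $n), make_cases($tests{$type}, $n, $type));
    }
}
```

With a total this small, Benchmark may warn about too few iterations; the point of the sketch is the shape of the test (loop inside the sub, several loop counts), not the absolute rates.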