Several times I’ve read that unpack() is faster than substr(), especially as the number of substrings increases. However, this benchmark suggests otherwise. Is my benchmark flawed, or is the alleged performance advantage of unpack() a holdover from older versions of Perl?
use strict;
use warnings;
use Benchmark;
my ($data, $format_string, $n_substrings);
my %methods = (
    unpack => sub { return unpack $format_string, $data },
    substr => sub { return map {substr $data, $_, 1} 0 .. $n_substrings - 1 },
);
for my $exp (1 .. 5){
    $n_substrings = 10 ** $exp;
    print $n_substrings, "\n";
    $format_string = 'a1' x $n_substrings;
    $data = 9 x $n_substrings;
    Benchmark::cmpthese -2, \%methods;
}
Output (on Windows):
10
           Rate unpack substr
unpack 131588/s     --   -52%
substr 276802/s   110%     --
100
          Rate unpack substr
unpack 13660/s     --   -57%
substr 31636/s   132%     --
1000
         Rate unpack substr
unpack 1027/s     --   -68%
substr 3166/s   208%     --
10000
         Rate unpack substr
unpack 84.4/s     --   -74%
substr  322/s   281%     --
100000
         Rate unpack substr
unpack 5.46/s     --   -82%
substr 30.1/s   452%     --
As pointed out in some answers, unpack() does poorly on Windows. Here’s the output on a Solaris machine. The gap is not nearly so decisive, but substr() still wins the foot race:
10
           Rate unpack substr
unpack 202274/s     --    -4%
substr 210818/s     4%     --
100
          Rate unpack substr
unpack 22015/s     --    -9%
substr 24322/s    10%     --
1000
         Rate unpack substr
unpack 2259/s     --    -9%
substr 2481/s    10%     --
10000
        Rate unpack substr
unpack 225/s     --    -9%
substr 247/s     9%     --
100000
         Rate unpack substr
unpack 22.0/s     --   -10%
substr 24.4/s    11%     --
Since asking this question, I have benchmarked substr against unpack several more times, under various conditions. Here are a few things I’ve learned:

- Do not set up the benchmark in a way that calls Perl functions in void context (as I did in my original question; see the helpful response from dlowe). Some Perl functions have optimizations when they are called in void context (and these optimizations appear to vary by OS), potentially skewing your benchmarking results.
- If your use of substr involves looping (for example, iterating over a list of column locations), unpack is always faster. However, the apparent slowness of substr in this situation is due to the overhead of the loop, not substr itself.
- If just a few fields are required, substr is generally faster than or as fast as unpack.
- If more than a few fields are required, head-to-head comparisons between unpack and an equivalent number of substr calls do not vary much as the number of fields increases: both approaches become slower at the same rate.
- Results can vary by operating system. On my Windows XP machine, unpack had a slight edge whenever more than a few fields were needed. On our Solaris machines at my workplace, substr was always faster, even into hundreds of fields.
Bottom line: the performance of unpack vs. substr is not a very big issue, regardless of the number of fields. Use whichever approach results in the clearest code. If you find yourself using substr in a looping construct, however, switching to unpack will result in a noteworthy speed boost.
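For anyone who wants to repeat the comparison, here is a minimal sketch of how the three cases above (unpack, substr in a loop, and substr written out by hand) might be benchmarked in list context. The record layout, the field width, and the sub names are made up for illustration and are not taken from my real data; the only point is that each benchmarked sub assigns its result to an array, so nothing runs in void context.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical fixed-width record: five 4-character fields (assumed layout).
my $record = 'AAAABBBBCCCCDDDDEEEE';
my $width  = 4;
my @starts = map { $_ * $width } 0 .. 4;    # offsets 0, 4, 8, 12, 16
my $format = "a$width" x @starts;           # 'a4a4a4a4a4'

# One unpack call returns every field.
sub fields_unpack { return unpack $format, $record }

# substr driven by a loop over the offsets.
sub fields_substr_loop { return map { substr $record, $_, $width } @starts }

# The same substr calls written out by hand, with no loop.
sub fields_substr_inline {
    return (
        substr($record, 0,  4),
        substr($record, 4,  4),
        substr($record, 8,  4),
        substr($record, 12, 4),
        substr($record, 16, 4),
    );
}

# Assigning to an array forces list context inside each timed sub,
# so no void-context optimization can skew the comparison.
cmpthese(-1, {
    unpack        => sub { my @f = fields_unpack() },
    substr_loop   => sub { my @f = fields_substr_loop() },
    substr_inline => sub { my @f = fields_substr_inline() },
});
```

The rates will differ by machine and Perl version, so treat the output as something to measure locally rather than as a fixed ranking.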