Possible duplicate: In Perl, is there a built-in way to compare two arrays for equality?
I need to compare arrays with a function that should return:
- true if all elements are equal when compared pairwise
- true if all elements are equal or the element in the first array is undefined when compared pairwise
- false in all other cases
In other words, if the sub is called “comp”:
@a = ('a', 'b', undef, 'c');
@b = ('a', 'b', 'f', 'c');
comp(@a, @b); # should return true
comp(@b, @a); # should return false
@a = ('a', 'b');
@b = ('a', 'b', 'f', 'c');
comp(@a, @b); # should return true
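For reference, here is a minimal sketch of the straightforward pairwise version of comp that satisfies the three examples above. One assumption differs from the calls as written: the sub takes array references. Without a prototype such as (\@\@), comp(@a, @b) flattens both arrays into one argument list. A position missing from the first array is treated as undef, which is what makes the third example return true.

```perl
use strict;
use warnings;

# Baseline pairwise comparison following the rules in the question.
# Takes two array references (see lead-in); an undef element in the
# first array matches anything in the second.
sub comp {
    my ($x, $y) = @_;
    for my $i (0 .. $#$x) {
        next unless defined $x->[$i];          # undef in first array: wildcard
        return 0 unless defined $y->[$i];      # defined vs undef: mismatch
        return 0 unless $x->[$i] eq $y->[$i];  # plain string comparison
    }
    return 1;                                  # every checked pair matched
}

my @a     = ('a', 'b', undef, 'c');
my @b     = ('a', 'b', 'f', 'c');
my @short = ('a', 'b');
printf "%s\n", comp(\@a, \@b)     ? 'true' : 'false';   # true
printf "%s\n", comp(\@b, \@a)     ? 'true' : 'false';   # false
printf "%s\n", comp(\@short, \@b) ? 'true' : 'false';   # true
```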
The obvious solution would be to do pairwise comparisons between the two arrays, but I’d like something faster than that: the comparisons are run many times over a large set of arrays, and the arrays may have many elements.
On the other hand, the contents of the arrays to be compared (i.e. all the possible @b’s) are predetermined and do not change. The elements of the arrays do not have a fixed length, and there is no guarantee as to what characters they might contain (tabs, commas, you name it).
Is there a faster way to do this than pairwise comparison? Smart match won’t cut it, as it returns true only if all elements are equal (and therefore not when one is undef).
Could packing and doing bitwise comparisons be a strategy? It looks promising when I browse the docs for pack/unpack and vec, but I’m somewhat out of my depth there.
Perl can compare lists of 10,000 pairwise elements in about 100ms on my MacBook, so the first thing I’ll say is: profile your code to make sure this is actually the problem.
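A rough way to check this on your own machine is to time one comparison of two 10,000-element lists. The sketch below uses the core Time::HiRes module. The helper name comp_pairwise and the generated data are hypothetical stand-ins for your real arrays. For a profile of the whole program, a dedicated profiler such as Devel::NYTProf from CPAN gives a fuller picture than a single timing.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);   # time() now returns fractional seconds

# Hypothetical plain pairwise comparison (undef in the first array matches anything).
sub comp_pairwise {
    my ($x, $y) = @_;
    for my $i (0 .. $#$x) {
        next unless defined $x->[$i];
        return 0 unless defined $y->[$i] && $x->[$i] eq $y->[$i];
    }
    return 1;
}

# Two identical 10,000-element lists, so every pair is actually compared.
my @x = map { "item$_" } 1 .. 10_000;
my @y = @x;

my $start   = time;
my $result  = comp_pairwise(\@x, \@y);
my $elapsed = time - $start;

printf "result=%d, elapsed=%.2f ms\n", $result, $elapsed * 1000;
```

A single run is noisy, so repeat it a few times (or loop over many arrays) before drawing conclusions.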
After some benchmarking, I found a few things you can do to speed things up.
Compare the array sizes first. If arrays of different lengths can never match in your data (note that the question’s third example treats a shorter first array as a match), return early when the sizes differ. This avoids checking that case over and over inside the loop, and assuming many of your comparisons don’t match, it will save HEAPS of time.
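A minimal sketch of the early size check. It assumes that arrays must have equal length to match. The question’s third example does not follow that rule, so this variant returns false there. The sub name comp_same_length is hypothetical. The point is that the size comparison happens once, before the loop.

```perl
use strict;
use warnings;

# Hypothetical variant: arrays of different lengths never match.
# The size check runs once up front instead of being rediscovered in the loop.
sub comp_same_length {
    my ($x, $y) = @_;
    return 0 if @$x != @$y;          # arrays in numeric context give their sizes
    for my $i (0 .. $#$x) {
        next unless defined $x->[$i];  # undef in first array: wildcard
        return 0 unless defined $y->[$i] && $x->[$i] eq $y->[$i];
    }
    return 1;
}

printf "%s\n", comp_same_length(['a', 'b'], ['a', 'b', 'f', 'c']) ? 'true' : 'false';  # false
printf "%s\n", comp_same_length(['a', undef], ['a', 'x'])         ? 'true' : 'false';  # true
```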
Iterating pairwise, you’d normally do something like
for( my $idx = 0; $idx <= $#a; $idx++ )
but iterating with foreach (for example foreach my $idx (0 .. $#a)) is faster than using a C-style for loop. This is a general Perl optimization trick: it’s more efficient to let perl do the work in its optimized C internals than to do it in Perl code. This gains you about 20%-30%, depending on how you micro-optimize it.
Since one side of each comparison is fixed, you can precompute an index of which of its elements are defined and loop over only those positions. This makes the iterator even simpler and faster.
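The loop styles and the precomputed index can be sketched as follows. All sub names are hypothetical. The index version assumes that the fixed array is the one that may contain undef (the first argument of comp). A closure finds its defined positions once, and every later comparison against that fixed array skips the wildcard slots entirely. Benchmark the variants on your own data before relying on the percentages quoted here.

```perl
use strict;
use warnings;

# C-style loop: index increment and bound check run as individual Perl ops.
sub comp_cstyle {
    my ($x, $y) = @_;
    for (my $i = 0; $i <= $#$x; $i++) {
        next unless defined $x->[$i];
        return 0 unless defined $y->[$i] && $x->[$i] eq $y->[$i];
    }
    return 1;
}

# foreach over a range: perl handles the counting internally.
sub comp_foreach {
    my ($x, $y) = @_;
    for my $i (0 .. $#$x) {
        next unless defined $x->[$i];
        return 0 unless defined $y->[$i] && $x->[$i] eq $y->[$i];
    }
    return 1;
}

# Precomputed index (assumption: the fixed array is the one holding undefs).
# Defined positions are collected once; the returned closure visits only those.
sub make_matcher {
    my ($pattern) = @_;
    my @defined_idx = grep { defined $pattern->[$_] } 0 .. $#$pattern;
    return sub {
        my ($y) = @_;
        for my $i (@defined_idx) {
            return 0 unless defined $y->[$i] && $pattern->[$i] eq $y->[$i];
        }
        return 1;
    };
}

my $matches = make_matcher(['a', 'b', undef, 'c']);
printf "%s\n", $matches->(['a', 'b', 'f', 'c']) ? 'true' : 'false';   # true
printf "%s\n", $matches->(['a', 'x', 'f', 'c']) ? 'true' : 'false';   # false
```

Building the matcher once per fixed array moves the defined() checks out of the hot path, which is why more undefs in the fixed set means less work per comparison.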
With no undefs this is a performance boost of 40%, and the gain grows the more undefs there are in your fixed set.
I’m still convinced this is better done in SQL with a self-join, but I haven’t worked that out.