I have a sorting method that relies on basic step-by-step logic rather than idiomatic Perl, and once in a while it doesn't behave the way I want (it misses some of the frequency counts). Is there a better way to sort this?
Objective: Sort the array by how often each match occurs.
Sample array of arrays
## ADDED 1 to END of EACH ROW, just because my sort forced me to!!!
my @all_matches = (
    ["chpt10_2", "sent. 2",  "alice", "nsubj", "animals", "protect"],
    ["chpt12_1", "sent. 54", "bob",   "nsubj", "cells",   "protect"],
    ["chpt25_4", "sent. 47", "carol", "nsubj", "plants",  "protect"],
    ["chpt34_1", "sent. 1",  "dave",  "nsubj", "cells",   "protect"],
    ["chpt35_1", "sent. 2",  "eli",   "nsubj", "cells",   "protect"],
    ["chpt38_1", "sent. 1",  "fred",  "nsubj", "animals", "protect"],
    ["chpt54_1", "sent. 1",  "greg",  "nsubj", "uticle",  "protect"]
);
Current sort
@all_matches = sort {lc($a->[4]) cmp lc($b->[4])} @all_matches;
my ($last_word, $current_word, $word_count);
for my $j (0 .. $#all_matches) {
    $current_word = $all_matches[$j][4];
    if (lc($last_word) eq lc($current_word)) {
        $word_count++;
    }
    else {
        if ($j != 0)
        {
            for (my $k = 1; $k <= $word_count; $k++)
            {
                $all_matches[($j-$k)][6] = $word_count;
            }
        }
        $last_word = $current_word;
        $word_count = 1;
    }
}
@all_matches = sort {$b->[6] <=> $a->[6] || lc($a->[4]) cmp lc($b->[4])} @all_matches;
Problem: The column at index 6 is set to 1 when @all_matches is passed in. I did this because the count ($match->[6]) was sometimes blank.
Bonus? Count how often the last two columns appear together (right now I'm pretty sure it only checks the second-to-last column). In this test case the final column is the same in every row; in the actual data it has different suffixes (i.e. protect, protects, protective, etc.).
Thanks a lot for your time. I've tried using a hash and thought it worked, but it missed some cases.
Here is my hash attempt. I can't yet say why it didn't work:
my %freq;
foreach ( map{$_->[4]}@results) #feeds in list of animals, cells, uticle, etc.
{
    $freq{lc $_}++;
}
@results = sort {$freq{lc $b->[4]} <=> $freq{lc $a->[4]} #freq order
                 or
                 $a->[0] cmp $b->[0] #text col 0
                } @results;
Why not create a hash of the keys with a count of the occurrences, and use that:
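A minimal sketch of that idea, using the sample rows from the question. The tie-break order (word first, then column 0) and the name @sorted are assumptions made for illustration, not something the question specifies.

```perl
use strict;
use warnings;

# Sample rows copied from the question; column index 4 holds the word to count.
my @all_matches = (
    ["chpt10_2", "sent. 2",  "alice", "nsubj", "animals", "protect"],
    ["chpt12_1", "sent. 54", "bob",   "nsubj", "cells",   "protect"],
    ["chpt25_4", "sent. 47", "carol", "nsubj", "plants",  "protect"],
    ["chpt34_1", "sent. 1",  "dave",  "nsubj", "cells",   "protect"],
    ["chpt35_1", "sent. 2",  "eli",   "nsubj", "cells",   "protect"],
    ["chpt38_1", "sent. 1",  "fred",  "nsubj", "animals", "protect"],
    ["chpt54_1", "sent. 1",  "greg",  "nsubj", "uticle",  "protect"],
);

# Pass 1: count how often each case-folded word occurs in column 4.
my %freq;
$freq{ lc($_->[4]) }++ for @all_matches;

# Pass 2: most frequent word first. Ties are broken by the word itself
# (assumed here, so rows with equally frequent words stay grouped),
# then by column 0.
my @sorted = sort {
       $freq{ lc($b->[4]) } <=> $freq{ lc($a->[4]) }
    or lc($a->[4]) cmp lc($b->[4])
    or $a->[0] cmp $b->[0]
} @all_matches;

print join(", ", @$_), "\n" for @sorted;
```

Because the counts live in the hash, no extra column has to be added to the rows. With this data the "cells" rows come first (3 occurrences), then "animals" (2), then "plants" and "uticle" (1 each).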
Tested…
Output:
As noted in a comment, given the data shown, the lc operations are not needed, and removing them would improve performance, as would adding a case-converted key to each array.
And with lc used once per row (notice the munged data values):
Output:
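One way the "lc once per row" variant might look, as a sketch under stated assumptions: the mixed-case values in column 4 are made up here to show the case folding, and the helper key is stored at index 6 and removed again after sorting.

```perl
use strict;
use warnings;

# Rows from the question, with the case of column 4 deliberately varied
# (hypothetical data, only to exercise the case-insensitive counting).
my @all_matches = (
    ["chpt10_2", "sent. 2",  "alice", "nsubj", "Animals", "protect"],
    ["chpt12_1", "sent. 54", "bob",   "nsubj", "cells",   "protect"],
    ["chpt25_4", "sent. 47", "carol", "nsubj", "plants",  "protect"],
    ["chpt34_1", "sent. 1",  "dave",  "nsubj", "CELLS",   "protect"],
    ["chpt35_1", "sent. 2",  "eli",   "nsubj", "Cells",   "protect"],
    ["chpt38_1", "sent. 1",  "fred",  "nsubj", "animals", "protect"],
    ["chpt54_1", "sent. 1",  "greg",  "nsubj", "uticle",  "protect"],
);

# Append a lower-cased copy of column 4 to each row (it lands at index 6)
# and count it, so lc runs exactly once per row.
my %freq;
for my $row (@all_matches) {
    push @$row, lc($row->[4]);
    $freq{ $row->[6] }++;
}

# The sort block now only does hash lookups and string comparisons.
my @sorted = sort {
       $freq{ $b->[6] } <=> $freq{ $a->[6] }
    or $a->[6] cmp $b->[6]
    or $a->[0] cmp $b->[0]
} @all_matches;

# Drop the helper key again; @sorted holds the same array references,
# so its rows lose the extra element too.
pop @$_ for @all_matches;

print join(", ", @$_), "\n" for @sorted;
```

The same pattern would probably extend to the "Bonus" case by building the helper key from both of the last two columns instead of column 4 alone, although how the differing suffixes (protect, protects, protective) should be grouped is not settled by the question.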