I have 5 files containing the same words. I want to read each word in all the files and decide the winning word by detecting the following characters in a word (*, #, $, &), separated by tabs. Then I want to generate an output file. It can have at most 2 winners. For example:
file1
we$
are*
...
file2
we$
are#
...
file3
we&
are*
...
file4
we$
are#
...
file5
we$
are&
...
output file:
we$
are*#
Here is how I started:
#!/usr/local/bin/perl -w
sub read_file_line {
    my $fh = shift;
    if ($fh and my $line = <$fh>) {
        chomp($line);
        return $line;
    }
    return;
}
open(my $f1, "words1.txt") or die "Can't";
open(my $f2, "words2.txt") or die "Can't";
open(my $f3, "words3.txt") or die "Can't";
open(my $f4, "words4.txt") or die "Can't";
open(my $f5, "words5.txt") or die "Can't";
my $r1 = read_file_line($f1);
my $r2 = read_file_line($f2);
my $r3 = read_file_line($f3);
my $r4 = read_file_line($f4);
my $r5 = read_file_line($f5);
while ($f5) {
    #What can I do here to decide and write the winning word in the output file?
    $r1 = read_file_line($f1);
    $r2 = read_file_line($f2);
    $r3 = read_file_line($f3);
    $r4 = read_file_line($f4);
    $r5 = read_file_line($f5);
}
Test Data Generator
Majority Voting Code
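A minimal sketch of the voting step for the original format, where the marker is attached directly to the word (as in `we$`). The helper name `vote` is hypothetical, and the fixed marker order `* # $ &` used to print ties is an assumption taken from the `are*#` line in the question's sample output. It is one way to do the tally, not necessarily how the answer's own code does it.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: takes the same line from each file (e.g. 'are*', 'are#')
# and returns the word followed by every marker that shares the highest count.
sub vote {
    my @entries = @_;
    my (%count, $word);
    for my $entry (@entries) {
        # Split 'are*' into the word and its single trailing marker.
        my ($w, $marker) = $entry =~ /^(.*?)([\$*#&])$/
            or die "Unrecognised entry '$entry'\n";
        $word //= $w;    # the word is taken from the first entry only
        $count{$marker}++;
    }
    my $max = 0;
    for my $n (values %count) {
        $max = $n if $n > $max;
    }
    # Assumed order for ties, so that output is deterministic.
    my @order   = ('*', '#', '$', '&');
    my $winners = join '', grep { ($count{$_} // 0) == $max } @order;
    return $word . $winners;
}

# The two rows from the question's example files.
print vote('we$',  'we$',  'we&',  'we$',  'we$'),  "\n";
print vote('are*', 'are#', 'are*', 'are#', 'are&'), "\n";
```

With the question's sample rows this prints `we$` and `are*#`. With 5 files and a two-way tie at 2 votes each, at most two markers can share first place, which is why the output never needs more than 2 winners.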
Example Data and Results
This seems to be correct for the test data in the files generated.
Revised requirements – example output
The ‘revised requirements’ replaced the ‘*#$&’ markers after the words with a tab and one of the letters ‘ABCD’. After some swift negotiation, the question was restored to its original form. This output is from a suitably adapted version of the answer above – 3 code lines changed: 2 in the data generator and 1 in the majority voter. Those changes are not shown – they are trivial.
Revised test generator – for configurable number of files
Now that the poster has worked out how to handle the revised scenario, this is the data generator code I used – with 5 tags (A-E). Clearly, it would not take a huge amount of work to configure the number of tags on the command line.
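A generator of this kind might look like the following sketch. The word list, the `words$i.txt` file naming (borrowed from the question), and the `make_lines` helper are all assumptions for illustration. The number of files comes from the command line and defaults to 5; the tag set is fixed at A to E.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumed sample data: any word list would do.
my @words = qw(We are the people);
my @tags  = qw(A B C D E);

# Build the lines of one data file: each word, a tab, and a random tag.
sub make_lines {
    my ($words, $tags) = @_;
    return map { "$_\t" . $tags->[ int rand @$tags ] } @$words;
}

# Number of files to generate: first command-line argument, else 5.
my $nfiles = @ARGV ? shift @ARGV : 5;

for my $i (1 .. $nfiles) {
    my $name = "words$i.txt";
    open my $fh, '>', $name or die "Can't write $name: $!";
    print {$fh} "$_\n" for make_lines(\@words, \@tags);
    close $fh or die "Can't close $name: $!";
}
```

Running it with an argument of 10 would produce `words1.txt` through `words10.txt`, each holding the same words in the same order with independently chosen tags.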
Revised Majority Voting Code – for arbitrary number of files
This code works with basically arbitrary numbers of files. As noted in one of the (many) comments, it does not check that the word is the same in each file as required by the question; you could get quirky results if the words are not the same.
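A minimal sketch along these lines, assuming the tab-separated `word<TAB>tag` format and file names given on the command line. The helper name `tally_row` and the output layout (word, winning tags, top score, then one tag per file) are assumptions. Like the code described above, it takes the word from the first file and does not cross-check the others.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Tally the tags from one row (one "word\ttag" line per file) and return
# the word, the winning tag(s), the top score, and the tags in file order.
sub tally_row {
    my @lines = @_;
    my (%count, @tags, $word);
    for my $line (@lines) {
        my ($w, $tag) = split /\t/, $line;
        $word //= $w;    # word from the first file; not cross-checked
        push @tags, $tag;
        $count{$tag}++;
    }
    my ($max) = sort { $b <=> $a } values %count;
    my $winners = join '', grep { $count{$_} == $max } sort keys %count;
    return ($word, $winners, $max, @tags);
}

# Open every file named on the command line.
my @fhs;
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Can't open $file: $!";
    push @fhs, $fh;
}

# Read the files in lockstep; stop as soon as any file runs out of lines.
ROW: while (@fhs) {
    my @lines;
    for my $fh (@fhs) {
        my $line = <$fh>;
        last ROW unless defined $line;
        chomp $line;
        push @lines, $line;
    }
    print join("\t", tally_row(@lines)), "\n";
}
```

Keeping the filehandles in an array is what removes the dependence on exactly 5 files: the same loop serves 2 files or 10.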
One Example Run
After considerable experimentation on the data presentation, one particular set of data I generated gave the result:
The first column is the word; the second is the winning tag or tags; the third (numeric) column is the maximum score; the remaining 10 columns are the tags from the 10 data files. As you can see, there are two each of ‘We A’, ‘We B’, … ‘We E’ in the first row. I’ve also generated (but not preserved) one result set where the maximum score was 7. Given enough repetition, these sorts of variations are findable.