I have 5 files containing the same words. I want to read each word

Asked: May 18, 2026

I have 5 files containing the same words. I want to read each word in all the files and decide the winning word by detecting the following characters in a word (*, #, $, &) separated by tabs. Then I want to generate an output file. There can be at most 2 winners. For example:

file1

    we$
    are*
    ...

file2

    we$
    are#
    ...

file3

    we&
    are*
    ...

file4

    we$
    are#
    ...

file5

    we$
    are&
    ...

output file

    we$
    are*#

Here is how I started:

#!/usr/local/bin/perl -w
use strict;

# Return the next line from $fh without its newline, or nothing at end of file.
sub read_file_line {
  my $fh = shift;
  if ($fh and defined(my $line = <$fh>)) {
    chomp($line);
    return $line;
  }
  return;
}

open(my $f1, '<', "words1.txt") or die "Can't open words1.txt: $!";
open(my $f2, '<', "words2.txt") or die "Can't open words2.txt: $!";
open(my $f3, '<', "words3.txt") or die "Can't open words3.txt: $!";
open(my $f4, '<', "words4.txt") or die "Can't open words4.txt: $!";
open(my $f5, '<', "words5.txt") or die "Can't open words5.txt: $!";

my $r1 = read_file_line($f1);
my $r2 = read_file_line($f2);
my $r3 = read_file_line($f3);
my $r4 = read_file_line($f4);
my $r5 = read_file_line($f5);

# Keep going until the files run out of lines.
while (defined $r5) {

    # What can I do here to decide and write the winning word in the output file?

    $r1 = read_file_line($f1);
    $r2 = read_file_line($f2);
    $r3 = read_file_line($f3);
    $r4 = read_file_line($f4);
    $r5 = read_file_line($f5);
}

1 Answer

Editorial Team · Answer 1 · 2026-05-18T22:35:37+00:00

Test Data Generator

#!/usr/bin/env perl

use strict;
use warnings;

foreach my $i (1..5)
{
    my $file = "words$i.txt";
    open my $fh, '>', $file or die "Failed to open $file for writing ($!)";
    foreach my $w (qw (we are the people in charge and what we say goes))
    {
        my $suffix = substr('*#$&', rand(4), 1);
        print $fh "$w$suffix\n";
    }
}

Majority Voting Code

#!/usr/bin/env perl

use strict;
use warnings;

my @files = ( "words1.txt", "words2.txt", "words3.txt",
              "words4.txt", "words5.txt"
            );

my @fh;
{
    my $n = 0;
    foreach my $file (@files)
    {
        open my $f, '<', $file or die "Can't open $file for reading ($!)";
        $fh[$n++] = $f;
    }
}

while (my $r = process_line(@fh))
{
    print "$r\n";
}

sub process_line
{
    my(@fhlist) = @_;
    my %words = ();
    foreach my $fh (@fhlist)
    {
        my $line = <$fh>;
        return unless defined $line;
        chomp $line;
        $words{$line}++;
    }

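    # A word with more than 2 of the 5 votes wins outright; otherwise
    # collect every word that received exactly 2 votes.
    my $combo = '';
    foreach my $word (keys %words)
    {
        return $word    if ($words{$word} >  2);
        $combo .= $word if ($words{$word} == 2);
    }
    # Two tied words such as 'are*are#' become 'are*#' (the second copy of the word is removed).
    $combo =~ s/(\W)\w+(\W)/$1$2/;
    return $combo;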
}

Example Data and Results

$ perl datagenerator.pl
$ perl majorityvoter.pl > results.txt
$ paste words?.txt results.txt
we*     we$     we&     we#     we#     we#
are*    are#    are#    are*    are$    are*#
the*    the&    the#    the#    the&    the&#
people& people& people$ people# people# people&#
in#     in*     in$     in*     in*     in*
charge* charge# charge& charge* charge# charge#*
and$    and*    and$    and&    and$    and$
what&   what&   what$   what&   what#   what&
we#     we*     we*     we&     we*     we*
say$    say&    say$    say$    say$    say$
goes$   goes&   goes#   goes#   goes#   goes#
$

This seems to be correct for the test data in the files generated.
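To try the voting rule without creating any files, the decision step can be pulled out of process_line into a function that takes the five entries for one line. This is only a sketch: `majority_vote` is a hypothetical name, not part of the answer's code. It applies the same rule (more than 2 votes wins outright, otherwise every entry with exactly 2 votes is reported), but sorts the entries first, because `keys %words` returns them in no particular order. That is why the table above shows `are*#` while this version always gives `are#*`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: decide the result for one line, given the entry
# read from each file (for example 'are*', 'are#', ...).  Same rule as
# process_line: more than 2 votes wins outright, otherwise every entry
# with exactly 2 votes is reported.
sub majority_vote
{
    my (@entries) = @_;
    my %count;
    $count{$_}++ foreach @entries;

    # Sorting makes the order of tied markers repeatable.
    my @keys = sort keys %count;

    foreach my $entry (@keys)
    {
        return $entry if $count{$entry} > 2;
    }

    my @tied = grep { $count{$_} == 2 } @keys;
    return '' unless @tied;

    # Each entry is the word followed by a one-character marker, so keep
    # the word once and append every tied marker.
    return substr($tied[0], 0, -1) . join('', map { substr($_, -1) } @tied);
}

# Three rows taken from the example data above.
my @rows = (
    [ 'we*',  'we$',  'we&',  'we#',  'we#'  ],
    [ 'are*', 'are#', 'are#', 'are*', 'are$' ],
    [ 'in#',  'in*',  'in$',  'in*',  'in*'  ],
);
foreach my $row (@rows)
{
    print majority_vote(@$row), "\n";
}
```

Run as a script, it prints `we#`, `are#*` and `in*`, one per line, which agrees with the first, second and fifth rows of the table apart from the order of tied markers.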

Revised requirements – example output

The ‘revised requirements’ replaced the ‘*#$&’ markers after the words with a tab and one of the letters ‘ABCD’. After some swift negotiation, the question was restored to its original form. This output is from a suitably adapted version of the answer above, with 3 code lines changed: 2 in the data generator and 1 in the majority voter. Those changes are not shown because they are trivial.

we      C       we      D       we      C       we      C       we      D       we      C
are     C       are     D       are     C       are     B       are     A       are     C
the     B       the     D       the     A       the     A       the     D       the     A|D
people  D       people  B       people  A       people  B       people  D       people  B|D
in      D       in      B       in      C       in      B       in      D       in      D|B
charge  C       charge  D       charge  D       charge  D       charge  A       charge  D
and     A       and     B       and     C       and     C       and     B       and     B|C
what    B       what    B       what    B       what    C       what    C       what    B
we      D       we      B       we      D       we      B       we      A       we      B|D
say     D       say     D       say     B       say     D       say     D       say     D
goes    A       goes    C       goes    A       goes    C       goes    A       goes    A
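The answer's actual three-line change for this format is not shown, so the following is only one possible way to produce the last column above, not the author's code. It assumes each line has already been chomped and looks like `word<TAB>TAG`; `vote_tagged` is a hypothetical name. Instead of the `== 2` test it reports every tag that shares the highest count, joined with `|`. With five files and the four tags A to D that gives the same results, because 3 or more votes can only belong to one tag, and otherwise the highest count is 2.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper for the revised "word<TAB>TAG" format: each argument
# is one chomped line, e.g. "the\tA".  Returns "word<TAB>TAGS", where TAGS
# is the tag with the most votes or, on a tie, the tied tags joined with
# '|' in alphabetical order.
sub vote_tagged
{
    my (@lines) = @_;
    my ($word, %count);
    foreach my $line (@lines)
    {
        my ($w, $tag) = split /\t/, $line;
        $word //= $w;          # assumes every file has the same word
        $count{$tag}++;
    }
    my $top     = max(values %count);
    my @winners = sort { $a cmp $b } grep { $count{$_} == $top } keys %count;
    return join("\t", $word, join('|', @winners));
}

# The "the" row from the output above: B, D, A, A, D.
print vote_tagged("the\tB", "the\tD", "the\tA", "the\tA", "the\tD"), "\n";
```

It prints `the`, a tab and `A|D`, matching the third row above. Ties are listed alphabetically, so the `in` row would come out as `B|D` rather than `D|B`.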

Revised test generator – for configurable number of files

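Now that the poster has worked out how to handle the revised scenario, this is the data generator code I used, with 5 tags (A-E). The number of files comes from the first command-line argument (default 5), and the files are named words<N>-<n>.txt with <n> zero-padded to the width of <N>, e.g. words10-01.txt to words10-10.txt. Clearly, it would not take much work to make the number of tags configurable on the command line too.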

#!/usr/bin/env perl

use strict;
use warnings;

my $fmax  = scalar(@ARGV) > 0 ? $ARGV[0] : 5;
my $tags  = 'ABCDE';
my $ntags = length($tags);
my $fmt   = sprintf "words$fmax-%%0%0dd.txt", length($fmax);

foreach my $fnum (1..$fmax)
{
    my $file = sprintf $fmt, $fnum;
    open my $fh, '>', $file or die "Failed to open $file for writing ($!)";
    foreach my $w (qw(We Are The People In Charge And What We Say Goes))
    {
        my $suffix = substr($tags, rand($ntags), 1);
        print $fh "$w\t$suffix\n";
    }
}
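As the paragraph above says, the number of tags could also be taken from the command line. Here is a sketch, assuming a hypothetical second argument (1 to 26) that selects the first N capital letters as tags. Everything else follows the generator above, and with no arguments it still writes five files with tags A to E.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Usage (hypothetical): perl datagenerator.pl [number-of-files [number-of-tags]]
# Both arguments are optional; the defaults (5 files, 5 tags A-E) match the
# generator above.
my $fmax  = @ARGV > 0 ? $ARGV[0] : 5;
my $ntags = @ARGV > 1 ? $ARGV[1] : 5;
die "Number of tags must be between 1 and 26\n"
    unless $ntags =~ /^\d+$/ && $ntags >= 1 && $ntags <= 26;

# The first $ntags capital letters: 'A', 'AB', 'ABC', ...
my $tags = join '', ('A' .. 'Z')[0 .. $ntags - 1];
my $fmt  = sprintf "words$fmax-%%0%dd.txt", length($fmax);

foreach my $fnum (1 .. $fmax)
{
    my $file = sprintf $fmt, $fnum;
    open my $fh, '>', $file or die "Failed to open $file for writing ($!)";
    foreach my $w (qw(We Are The People In Charge And What We Say Goes))
    {
        # Pick one tag at random for each word.
        my $suffix = substr($tags, rand($ntags), 1);
        print $fh "$w\t$suffix\n";
    }
    close $fh or die "Failed to close $file ($!)";
}
```

For example, `perl datagenerator.pl 10 4` would write words10-01.txt to words10-10.txt using only the tags A to D.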

Revised Majority Voting Code – for arbitrary number of files

This code works with an arbitrary number of files: pass the file names on the command line (without arguments it reads words1.txt to words5.txt). As noted in one of the (many) comments, it does not check that the word is the same in each file, as the question requires; you could get quirky results if the words differ.

#!/usr/bin/env perl

use strict;
use warnings;

my @files = scalar @ARGV > 0 ? @ARGV :
            ( "words1.txt", "words2.txt", "words3.txt",
              "words4.txt", "words5.txt"
            );
my $voters = scalar(@files);

my @fh;
{
    my $n = 0;
    foreach my $file (@files)
    {
        open my $f, '<', $file or die "Can't open $file for reading ($!)";
        $fh[$n++] = $f;
    }
}

while (my $r = process_line(@fh))
{
    print "$r\n";
}

sub process_line
{
    my(@fhlist) = @_;
    my %words = ();
    foreach my $fh (@fhlist)
    {
        my $line = <$fh>;
        return unless defined $line;
        chomp $line;
        $words{$line}++;
    }
    return winner(%words);
}

# Get tag X from entry "word\tX".
sub get_tag_from_word
{
    my($word) = @_;
    return (split /\s/, $word)[1];
}

sub winner
{
    my(%words)   = @_;
    my $maxscore = 0;
    my $winscore = int($voters / 2) + 1;    # strict majority of the files
    my $winner   = '';
    my $taglist  = '';
    foreach my $word (sort keys %words)
    {
        return "$word\t$words{$word}" if ($words{$word} >= $winscore);
        if ($words{$word} > $maxscore)
        {
            $winner = $word;
            $winner =~ s/\t.//;
            $taglist = get_tag_from_word($word);
            $maxscore = $words{$word};
        }
        elsif ($words{$word} == $maxscore)
        {
            my $newtag = get_tag_from_word($word);
            $taglist .= "|$newtag";
        }
    }
    return "$winner\t$taglist\t$maxscore";
}
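To address the missing check mentioned above, process_line could verify the word before counting it. This sketch of a replacement, under the hypothetical name `read_checked_line`, takes the word to be the text before the tab and dies with the file number when a file disagrees with the first one. It returns the same hash that winner() receives, so the main loop would become `while (my %words = read_checked_line(@fh)) { print winner(%words), "\n"; }`. The demonstration uses in-memory filehandles instead of real files.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical replacement for process_line in the revised voter: reads one
# line from every filehandle, dies if the word (the text before the tab)
# differs between files, and otherwise returns the "word\tTAG" => count
# hash that winner() expects.  Returns an empty list at end of input.
sub read_checked_line
{
    my (@fhlist) = @_;
    my (%words, $expected);
    foreach my $i (0 .. $#fhlist)
    {
        # readline() is needed here: <$fhlist[$i]> would be parsed as a glob.
        my $line = readline($fhlist[$i]);
        return unless defined $line;
        chomp $line;
        my ($word) = split /\t/, $line;
        $word //= '';
        $expected //= $word;
        die sprintf("File %d has '%s' where '%s' was expected\n",
                    $i + 1, $word, $expected)
            unless $word eq $expected;
        $words{$line}++;
    }
    return %words;
}

# Demonstration with three in-memory "files"; the third one has a typo
# on its second line.
my @data = ("We\tA\nAre\tB\n", "We\tC\nAre\tB\n", "We\tA\nArr\tD\n");
my @fh;
foreach my $i (0 .. $#data)
{
    open my $f, '<', \$data[$i] or die "Can't open in-memory file ($!)";
    push @fh, $f;
}
eval {
    while (my %words = read_checked_line(@fh))
    {
        print scalar(keys %words), " distinct entries\n";
    }
    1;
} or print "Stopped: $@";
```

The demonstration prints `2 distinct entries` for the first line and then stops with `File 3 has 'Arr' where 'Are' was expected`.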

One Example Run

After considerable experimentation on the data presentation, one particular set of data I generated gave the result:

We          A|B|C|D|E   2  B  C  C  E  D  A  D  A  E  B
Are         D           4  C  D  B  A  D  B  D  D  B  E
The         A           5  D  A  B  B  A  A  B  E  A  A
People      D           4  E  D  C  D  B  E  D  D  B  C
In          D           3  E  C  D  D  D  B  C  A  A  B
Charge      A|E         3  E  E  D  A  D  A  B  A  E  B
And         E           3  C  E  D  D  C  A  B  E  B  E
What        A           5  B  C  C  A  A  A  B  A  D  A
We          A           4  C  A  A  E  A  E  C  D  A  E
Say         A|D         4  A  C  A  A  D  E  D  A  D  D
Goes        A           3  D  B  A  C  C  A  A  E  E  B

The first column is the word; the second is the winning tag or tags; the third (numeric) column is the maximum score; the remaining 10 columns are the tags from the 10 data files. As you can see, there are two each of ‘We A’, ‘We B’, … ‘We E’ in the first row. I’ve also generated (but not preserved) one result set where the maximum score was 7. Given enough repetition, these sorts of variations are findable.
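A row like these can be rechecked by recounting its per-file tags. The sketch below assumes the whitespace-separated layout of the table above (word, winning tags, score, then one tag per file) and tied tags listed alphabetically, which is how the revised voter emits them since it walks `sort keys %words`; `check_row` is a hypothetical name.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical checker for one row of the example run, assuming the layout
# shown above: word, winning tag(s), maximum score, then one tag per file,
# all separated by whitespace.  Returns true if the winning tags and the
# score agree with a fresh count of the per-file tags.
sub check_row
{
    my ($row) = @_;
    my (undef, $taglist, $score, @tags) = split ' ', $row;
    my %count;
    $count{$_}++ foreach @tags;
    my $top     = max(values %count);
    my $winners = join '|', sort { $a cmp $b } grep { $count{$_} == $top } keys %count;
    return $winners eq $taglist && $top == $score;
}

foreach my $row ('We          A|B|C|D|E   2  B  C  C  E  D  A  D  A  E  B',
                 'Charge      A|E         3  E  E  D  A  D  A  B  A  E  B')
{
    my ($word) = split ' ', $row;
    printf "%-8s %s\n", $word, check_row($row) ? 'ok' : 'MISMATCH';
}
```

Both sample rows (the five-way tie in the first row and the A|E tie in the Charge row) print `ok`.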
