Some days ago I asked a question about tagging differences in 2 text files,

Asked: June 6, 2026

Some days ago I asked a question about tagging differences in 2 text files, and was answered quickly.

Now I have a similar but somewhat more complicated question.
I have two pairs of files with the following characteristics:
pair1: (File1.txt , File2.txt)
pair2: (File3.txt , File4.txt)

There is a line-by-line correspondence between the two files in each pair. Say that File1.txt and File3.txt contain English words, and File2.txt and File4.txt contain their Arabic and French translations respectively. In addition, File1.txt and File3.txt are very similar (and in some cases identical).


    File1.txt       File2.txt
    EnWord1         ArTrans1
    EnWord2         ArTrans2
    EnWord3         ArTrans3
    EnWord4         ArTrans4

    File3.txt       File4.txt
    EnWord1         FrTrans1
    EnWord3         FrTrans3
    EnWord4         FrTrans4
    EnWord5         FrTrans5

What I want to do is compare the English sides of these pairs, find the words that appear in both files (EnWord1, EnWord3, and EnWord4), and extract their corresponding translations.
In short, using two bilingual dictionaries (English-Arabic and English-French), I am trying to build a trilingual English-Arabic-French dictionary.
How can this be done?

I should add that there are many such pairs: the dictionaries are split across different files, each containing part of the words, and for various reasons the files cannot be merged before processing. The speed of the code is therefore very important, and I am looking for a fast way to do this.

Finally, please give me some pointers (or, if possible, the complete code) for doing this in Perl.

Best regards,
Hakim

1 Answer

Editorial Team · Answer 1 · 2026-06-06T03:30:19+00:00

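I assume that the order you would like to maintain follows File1.txt. (The script lists common words in the order they appear in File3.txt, which is the same as long as both English files keep their words in the same relative order, as in your example.) The following Perl should accomplish what you're looking for: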

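#!/usr/bin/perl

use strict;
use warnings;

# Join each word file with its translation file as "word:translation" lines.
# Assumes file names without spaces or shell metacharacters, and words without ":".
my @pair1 = `paste -d ":" $ARGV[0] $ARGV[1]`;
my @pair2 = `paste -d ":" $ARGV[2] $ARGV[3]`;

my @pairs = (@pair1, @pair2);
my (%seen, @dups);

# A word met a second time is treated as common to both pairs
# (this assumes no word is repeated within a single file).
foreach (@pairs)
{
  my $word = (split ":", $_, 2)[0];
  push @dups, $word if $seen{$word}++;
}

open (my $file0, ">", "NEW_File0.txt") or die "Cannot open NEW_File0.txt: $!";
open (my $file1, ">", "NEW_File1.txt") or die "Cannot open NEW_File1.txt: $!";
open (my $file2, ">", "NEW_File2.txt") or die "Cannot open NEW_File2.txt: $!";

foreach my $duplicate (@dups)
{
  print {$file0} "$duplicate\n";

  # \Q...\E stops regex metacharacters in the word from being interpreted.
  # The translation still carries its trailing newline from paste.
  foreach (@pair1) { print {$file1} ((split ":", $_, 2)[1]) if $_ =~ /^\Q$duplicate\E:/; }
  foreach (@pair2) { print {$file2} ((split ":", $_, 2)[1]) if $_ =~ /^\Q$duplicate\E:/; }
}

close $file0;
close $file1;
close $file2;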

Run like this:

./script.pl File1.txt File2.txt File3.txt File4.txt

grep "" NEW_File* results:

NEW_File0.txt:EnWord1
NEW_File0.txt:EnWord3
NEW_File0.txt:EnWord4
NEW_File1.txt:ArTrans1
NEW_File1.txt:ArTrans3
NEW_File1.txt:ArTrans4
NEW_File2.txt:FrTrans1
NEW_File2.txt:FrTrans3
NEW_File2.txt:FrTrans4

May not be the most efficient way to do things, but should give you somewhere to start at least. HTH.
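
Since speed matters to you, here is a minimal sketch of a hash-based variant that avoids rescanning both pairs for every common word. It reads each pair once, keeps the translations in a hash keyed by the English word, and then walks the first pair in file order. The subroutine names read_pair and common_entries are made up for illustration, and printing one tab-separated line per word (instead of three output files) is my own choice, not something from your question. It also assumes that a word repeated within one file should keep only its first translation, and it compares lines as raw bytes, so both English files need the same encoding.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Read a word file and its translation file in parallel.
# Returns a hash ref (word => translation) and an array ref of the
# words in file order. (Hypothetical helper, named for illustration.)
sub read_pair
{
  my ($word_file, $trans_file) = @_;

  open (my $wfh, "<", $word_file)  or die "Cannot open $word_file: $!";
  open (my $tfh, "<", $trans_file) or die "Cannot open $trans_file: $!";

  my (%trans_for, @order);
  while (defined (my $word = <$wfh>))
  {
    my $trans = <$tfh>;
    last unless defined $trans;         # translation file is shorter: stop
    chomp ($word, $trans);
    next if exists $trans_for{$word};   # assumption: keep the first entry only
    $trans_for{$word} = $trans;
    push @order, $word;
  }

  close $wfh;
  close $tfh;
  return (\%trans_for, \@order);
}

# Return [word, translation1, translation2] for every word that has an
# entry in both hashes, in the order given by the first pair.
sub common_entries
{
  my ($trans1, $order1, $trans2) = @_;

  return map  { [ $_, $trans1->{$_}, $trans2->{$_} ] }
         grep { exists $trans2->{$_} } @$order1;
}

# Only run when called with the four file names.
if (@ARGV == 4)
{
  my ($ar, $order) = read_pair (@ARGV[0, 1]);
  my ($fr)         = read_pair (@ARGV[2, 3]);   # order of pair 2 is not needed

  print join ("\t", @$_), "\n" for common_entries ($ar, $order, $fr);
}
```

It is run the same way as the script above (./script.pl File1.txt File2.txt File3.txt File4.txt) and writes to standard output. Each hash lookup takes roughly constant time, so the work grows with the total number of lines rather than with the number of lines times the number of common words.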
