I am an absolute newbie to Perl as well as programming in general(less than

Question

Editorial Team

Asked: June 6, 2026 (2026-06-06T02:48:41+00:00)

I am an absolute newbie to Perl as well as programming in general (less than a month’s experience).

I am stumped with a problem which needs to be resolved if I am to solve a bigger issue.

Basically, I have 2 arrays which look like this:

@array1 = ('NM_1234' , '1452' , 'NM_345' , '5008' , 'NR_6145' , '256');
@array2 = ('NM_5673' , '2' , 'NM_345' , '5' , 'NR_6145' , '10');

@array1 contains ID numbers, each followed by a length. Each ID identifies a nucleotide sequence, and the length is the length of that sequence.

@array2 contains ID numbers, each followed by the number of G-quadruplex structures in that sequence, so some sequences contain only 2 such structures while others contain 10 or more.

The basic problem is that I need to add to @array2 the “length numbers” from @array1 (e.g. 5008, 256) for every matching ID number.

So, for example, as NM_345 appears in both arrays, I need to add 5008 to it, so that the final result becomes NM_345,5,5008.

Similarly with NR_6145 and other such matches. (There are over 20,000 ID numbers in @array2.)

So far, I have been able to write code which can just search for the same id number in both the arrays. Here is the code:

#Enter file name
print "Enter file name: ";
$in =<>;
chomp $in;

open(FASTA,"$in") or die;

@data = <FASTA>; #Read in data        
$data = join ('',@data); #Convert to string
@data2 = split('\n',$data); #Explode along newlines

#Enter 2nd file name
print "\n\nEnter 2nd file name: ";
$in2=<>;
chomp $in2;

open(FASTA,"$in2") or die;
@entry =<FASTA>; #Read in data

$entry = join('',@entry); #Convert to string
@entry2 = split('\n',$entry); #Explode along newlines

my %seen;
for  $item (@data2) {
    if($item =~ /([0-9]+)/){
        push @{$seen{$key}}, $item;#WHAT IS THIS DOING? HOW?
    }
}

for my $item (@entry2) {
    if ($item =~ /([0-9]+)/){
        if (exists $seen{$key}) {
            print $item,"\n";
        };        
    }
}
exit;

I derived the code which finds the same element from 2 arrays from this solution here, so full credit goes to Chas.Owens: https://stackoverflow.com/a/1064929/1468737.
And of course, I do not quite yet understand this part:

push @{$seen{$key}}, $item;#WHAT IS THIS DOING? HOW?

It appears to be an array of a hash value or something?

So now, how do I add the length element from @array1 into @array2? I think I need to use the splice command, but how?

My desired output should look like this:

NM_345,5,5008
NR_6145,10,256
etc.

I also need to save this output into a file which will then later be analyzed to see if there is any correlation between length and G-quadruplex number.

Any help or input will be deeply appreciated.

Thank you for taking the time to go through my problem!

EDIT: This edit is to show what the data files look like. They are basically output files from other programs I wrote.

My first file, named Transcriptlength.fa, with over 40,000 ID numbers going into @array1, looks like this:

My second file, named Quadcount.AllGtranscripts.fa, with over 20,000 ID numbers going into @array2, looks like this:

1 Answer

Editorial Team · Answer 1 · 2026-06-06T02:48:44+00:00

It looks as though you are having trouble reading the data files as well as generating the output you want. We cannot help with that part of the problem unless you show us an example of the file data, but here is a solution for producing the output correctly.

It is best if your data is stored in hashes as that allows direct access to the length and structure count for a given sequence ID. Fortunately, arrays laid out as you have described them can easily be converted to hashes by a simple assignment, so this short program does what you want from the arrays you show.

The grep /\D/, @array2 list in the loop just selects all the sequence IDs from @array2 by picking only those elements that contain a non-digit character. I have done it this way in case the order in which the sequences are displayed matters. In your final program you should probably process the data directly from the file instead of reading it into an array, so this won’t be an issue.

use strict;
use warnings;

my @array1 = qw( NM_1234 1452   NM_345 5008   NR_6145 256 );
my @array2 = qw( NM_5673    2   NM_345    5   NR_6145  10 );

my %lengths = @array1;
my %counts = @array2;

for my $id (grep /\D/, @array2) {
  my $length = $lengths{$id};
  printf "%s,%s,%s\n", $id, $counts{$id}, $length if $length;
}

output

NM_345,5,5008
NR_6145,10,256
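
As for the line you asked about, push @{$seen{$key}}, $item treats the hash value $seen{$key} as a reference to an array and appends $item to that array; Perl creates the array automatically the first time a key is used. In the code as posted, $key appears never to be assigned, so the sketch below assumes it was meant to hold the digits captured by the regex ($1). The input lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input lines; the key is assumed to be the first run of digits.
my @lines = ('NM_345 5008', 'NR_345 12', 'NR_6145 256');

my %seen;
for my $item (@lines) {
    if ($item =~ /([0-9]+)/) {
        my $key = $1;                    # the digits captured by the regex
        push @{ $seen{$key} }, $item;    # array reference is created on first use
    }
}

# Each hash value is now a reference to an array of the lines sharing that key.
for my $key (sort keys %seen) {
    print "$key: ", join(' | ', @{ $seen{$key} }), "\n";
}
```

Each value in %seen ends up as a list of every line that shares the same key, which is why exists $seen{$key} can later be used to test whether a key was seen at all.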

Update

Your file data is ideal for paragraph mode, in which records are separated by blank lines in the data file. To enable it, set the input record separator variable $/ to an empty string "".
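
As a small illustration of paragraph mode on its own, here is a sketch that reads from an in-memory string rather than a real file. The sample records are hypothetical and only assume that each record holds an ID and a length separated by whitespace; local is used so that $/ is restored once the block ends.

```perl
use strict;
use warnings;

# Hypothetical sample data: two records separated by a blank line.
my $data = "NM_345\n5008\n\nNR_6145\n256\n";

# Open the string as if it were a file.
open my $fh, '<', \$data or die "Unable to open in-memory file: $!";

my @records;
{
    local $/ = "";    # paragraph mode, restored at the end of this block
    while (my $record = <$fh>) {
        # Split on any whitespace, including the newline inside the record.
        my ($id, $length) = split ' ', $record;
        push @records, "$id=$length";
    }
}
close $fh;

print "$_\n" for @records;
```

Each pass through the loop receives one whole blank-line-delimited record, so the ID and its length arrive together even when they sit on separate lines.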

This revised program reads records from the first file, splits them on whitespace (whitespace includes space, tab and newline, amongst others) and builds a hash %lengths which relates each sequence ID to its length.

The same is done to the second file, this time checking whether the sequence ID appears in the hash. If so the complete record is output.

use strict;
use warnings;

my $fh;
my %lengths;

$/ = "";

open $fh, '<', 'Transcriptlength.fa'
    or die qq(Unable to open "Transcriptlength.fa": $!);

while (<$fh>) {

  my ($id, $length) = split;
  next unless $id;

  $lengths{$id} = $length;
}

open $fh, '<', 'Quadcount.AllGtranscripts.fa'
    or die qq(Unable to open "Quadcount.AllGtranscripts.fa": $!);

while (<$fh>) {

  my ($id, $count) = split;
  next unless $id;

  my $length = $lengths{$id};
  next unless $length;

  print join(',', $id, $count, $length), "\n";
}

Unfortunately, the sample data that you have chosen doesn’t contain matching sequence IDs, so there is no output from this program when run against that data. Your actual files will be more productive.
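
You also mentioned wanting to save the output to a file for later analysis. One way to do that, sketched here under assumptions, is to open a second filehandle for writing and print to it instead of to the screen. The file name quad_vs_length.csv and the @rows list are hypothetical stand-ins for your own output name and the records produced by the loop.

```perl
use strict;
use warnings;

# Hypothetical merged rows (id, count, length) and output file name.
my @rows     = ( [ 'NM_345', 5, 5008 ], [ 'NR_6145', 10, 256 ] );
my $out_file = 'quad_vs_length.csv';

open my $out, '>', $out_file
    or die qq(Unable to open "$out_file" for writing: $!);

# Write one comma-separated line per row.
print {$out} join(',', @$_), "\n" for @rows;

close $out or die qq(Unable to close "$out_file": $!);
```

In the program that reads the two data files, the same idea amounts to opening the output handle before the second loop and adding it to the print statement there.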
