Editorial Team
Asked: June 6, 2026

I am an absolute newbie to Perl as well as programming in general (less than a month’s experience).

I am stumped with a problem which needs to be resolved if I am to solve a bigger issue.

Basically, I have 2 arrays which look like this:

@array1 = ('NM_1234' , '1452' , 'NM_345' , '5008' , 'NR_6145' , '256');
@array2 = ('NM_5673' , '2' , 'NM_345' , '5' , 'NR_6145' , '10');

@array1 contains id numbers followed by length. The id number is of nucleotide sequences and length is the length of the sequence.

@array2 contains id numbers, each followed by the number of G-Quadruplex structures in that sequence; some sequences contain only 2 such structures while others contain 10 or more.

The basic problem is that I need to add to @array2 the “length numbers” from @array1 (e.g. 5008, 256) for every matching id number.

For example, as NM_345 appears in both arrays, I need to add 5008 to it so that the final result looks like NM_345,5,5008.

The same goes for NR_6145 and other such matches (there are over 20,000 id numbers in @array2).

So far, I have been able to write code which can just search for the same id number in both the arrays. Here is the code:

#Enter file name
print "Enter file name: ";
$in =<>;
chomp $in;

open(FASTA,"$in") or die;

@data = <FASTA>; #Read in data        
$data = join ('',@data); #Convert to string
@data2 = split('\n',$data); #Explode along newlines

#Enter 2nd file name
print "\n\nEnter 2nd file name: ";
$in2=<>;
chomp $in2;

open(FASTA,"$in2") or die;
@entry =<FASTA>; #Read in data

$entry = join('',@entry); #Convert to string
@entry2 = split('\n',$entry); #Explode along newlines

my %seen;
for  $item (@data2) {
    if($item =~ /([0-9]+)/){
        push @{$seen{$key}}, $item;#WHAT IS THIS DOING? HOW?
    }
}

for my $item (@entry2) {
    if ($item =~ /([0-9]+)/){
        if (exists $seen{$key}) {
            print $item,"\n";
        };        
    }
}
exit;

I derived the code which finds the same element from 2 arrays from this solution here, so full credit goes to Chas.Owens: https://stackoverflow.com/a/1064929/1468737.
And of course, I do not quite yet understand this part:

push @{$seen{$key}}, $item;#WHAT IS THIS DOING? HOW?

It appears to be an array of a hash value or something?

So now, how do I add the length element from @array1 into @array2? I think I need to use the splice command, but how?

My desired output should look like this:

NM_345,5,5008
NR_6145,10,256
etc

I also need to save this output into a file which will then later be analyzed to see if there is any correlation between length and G-quadruplex number.

Any help or input will be deeply appreciated.

Thank you for taking the time to go through my problem!


EDIT: This edit is to show what the data files look like. They are basically output files from other programs I wrote.

My first file, named Transcriptlength.fa, with over 40,000 id numbers going into @array1, looks like this:

NR_037701
3353

NM_198399
2414

NR_026816
601

NR_027917
658

NR_002777
1278

My second file, named Quadcount.AllGtranscripts.fa, with over 20,000 id numbers going into @array2, looks like this:

NM_000014   
1

NM_000016   
3

NM_000017   
19

NM_000018   
2

NM_000019   
3

NM_000020   
30

NM_000021   
1

NM_000022   
2

NM_000023   
5

NM_000024   
1

NM_000025   
15

NM_000029   
5
1 Answer

  1. Editorial Team
    Added an answer on June 6, 2026 at 2:48 am

    It looks as though you are having trouble reading the data files as well as generating the output you want. We cannot help with that part of the problem unless you show us an example of the file data, but here is a solution for producing the output correctly.

    It is best if your data is stored in hashes as that allows direct access to the length and structure count for a given sequence ID. Fortunately, arrays laid out as you have described them can easily be converted to hashes by a simple assignment, so this short program does what you want from the arrays you show.

    The grep /\D/, @array2 list in the loop just selects all the sequence IDs from @array2 by picking only those elements that contain a non-digit character. I have done it this way in case the order in which the sequences are displayed matters. In your final program you should probably process the data directly from the file instead of reading it into an array so this won’t be an issue.

    use strict;
    use warnings;
    
    my @array1 = qw( NM_1234 1452   NM_345 5008   NR_6145 256 );
    my @array2 = qw( NM_5673    2   NM_345    5   NR_6145  10 );
    
    my %lengths = @array1;
    my %counts = @array2;
    
    for my $id (grep /\D/, @array2) {
      my $length = $lengths{$id};
      printf "%s,%s,%s\n", $id, $counts{$id}, $length if $length;
    }
    

    output

    NM_345,5,5008
    NR_6145,10,256
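
The question also asks what push @{$seen{$key}}, $item does. Here is a minimal sketch, assuming hypothetical one-line-per-record sample data rather than your real files. %seen is a hash whose values are array references, and @{ ... } dereferences one of them so that push can append to it. If the key is new, Perl creates the array automatically (autovivification). Note that in the code in the question $key is never assigned, so every item ends up under the same empty key. The sketch sets it from the regex capture $1, which is an assumption about what was intended.

```perl
use strict;
use warnings;

# Hypothetical sample lines, one "ID length" pair per line
# (not the layout of the real data files).
my @lines = ('NM_345 5008', 'NR_6145 256', 'NM_345 77');

my %seen;
for my $item (@lines) {
    # Capture the leading run of non-whitespace as the ID; $1 holds it.
    if ($item =~ /^(\S+)/) {
        my $key = $1;
        # $seen{$key} is used as an array reference. If the key does not
        # exist yet, Perl creates an empty array for it, and push then
        # appends $item to that array.
        push @{ $seen{$key} }, $item;
    }
}

# Each hash value is now a reference to an array of the lines seen for that ID.
for my $key (sort keys %seen) {
    printf "%s => [%s]\n", $key, join('; ', @{ $seen{$key} });
}
```

Running this prints one line per ID, with both NM_345 lines collected under the same key.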
    

    Update

    Your file data is ideal for setting paragraph mode where records are separated by blank lines in the data file. To achieve this you set the input record separator variable $/ to an empty string "".
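
To see what paragraph mode does before applying it to the real files, here is a small sketch. It assumes a few records copied from the sample in the question, held in a string and read through an in-memory filehandle. The names $text and @records are illustrative only.

```perl
use strict;
use warnings;

# Sample text in the same layout as the files in the question:
# an ID line, a value line, then a blank line.
my $text = "NR_037701\n3353\n\nNM_198399\n2414\n\nNR_026816\n601\n";

# Open an in-memory filehandle on the string so no real file is needed.
open my $fh, '<', \$text or die "Unable to open in-memory file: $!";

my @records;
{
    # Paragraph mode, limited to this block so later reads are unaffected.
    local $/ = "";
    while (my $record = <$fh>) {
        # Each read returns one whole blank-line-separated record.
        # Splitting on whitespace yields the ID and the value.
        my ($id, $value) = split ' ', $record;
        push @records, [$id, $value];
    }
}
close $fh;

print "$_->[0] => $_->[1]\n" for @records;
```

Using local keeps the change to $/ confined to the enclosing block, which matters once a program also reads other input line by line.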

    This revised program reads records from the first file, splits them on whitespace (whitespace includes space, tab and newline, amongst others) and builds a hash %lengths which relates each sequence ID to its length.

    The same is done to the second file, this time checking whether the sequence ID appears in the hash. If so, the ID, count and length are printed as one comma-separated line.

    use strict;
    use warnings;
    
    my $fh;
    my %lengths;
    
    $/ = "";
    
    open $fh, '<', 'Transcriptlength.fa'
        or die qq(Unable to open "Transcriptlength.fa": $!);
    
    while (<$fh>) {
    
      my ($id, $length) = split;
      next unless $id;
    
      $lengths{$id} = $length;
    }
    
    open $fh, '<', 'Quadcount.AllGtranscripts.fa'
        or die qq(Unable to open "Quadcount.AllGtranscripts.fa": $!);
    
    while (<$fh>) {
    
      my ($id, $count) = split;
      next unless $id;
    
      my $length = $lengths{$id};
      next unless $length;
    
      print join(',', $id, $count, $length), "\n";
    }
    

    Unfortunately the sample data that you have chosen doesn’t contain matching sequence IDs, so there is no output from this program when run against that data. Your actual files will be more productive.
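
The question also asks for the result to be saved to a file. One way to do that is sketched below, under stated assumptions: the hashes %lengths and %counts hold made-up sample values in place of the data read from your files, and the output name length_vs_quadcount.csv is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the hashes built from the two files.
my %lengths = (NM_345 => 5008, NR_6145 => 256, NM_1234 => 1452);
my %counts  = (NM_345 => 5,    NR_6145 => 10,  NM_5673 => 2);

# Hypothetical output file name; change it to suit.
my $out_file = 'length_vs_quadcount.csv';

open my $out, '>', $out_file
    or die qq(Unable to open "$out_file" for writing: $!);

for my $id (sort keys %counts) {
    # Skip IDs that have no length recorded.
    next unless exists $lengths{$id};
    # Write ID, structure count and length as one comma-separated line.
    print {$out} join(',', $id, $counts{$id}, $lengths{$id}), "\n";
}

close $out or die qq(Unable to close "$out_file": $!);
```

In the file-reading program, the same idea amounts to opening an output handle before the second loop and printing to it instead of to the screen. Alternatively, leave the program as it is and redirect its output on the command line.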
