The Archive Base Latest Questions

Editorial Team
Asked: June 3, 2026

I’m having some trouble manipulating an array of DNA sequence data that is in


I’m having some trouble manipulating an array of DNA sequence data in .fasta format. Specifically, I would like to take a file that has a few thousand sequences and join the sequence data for each sequence onto a single line. [FASTA format works like this: a sequence ID line starts with >, and everything after it on that line is a description. The sequence corresponding to this ID follows on the next line(s), and it continues until the next line that begins with >, which is the ID of the next sequence in the file.] In my particular file most of the sequences span multiple lines, so I would essentially like to remove the newlines, but only the newlines between lines of sequence data, not those between sequence data and sequence ID lines (which start with >).

I’m doing this because I want to get the length of each sequence (using length, which I believe is the easiest way), and then compute the average sequence length across the whole file.

Here’s my script so far, which doesn’t seem to work:

#!/usr/bin/perl -w


##Subroutine
sub get_file_data1 { 
    my($filename) = $_[0];
    my @filedata = ();
    unless( open(GET_FILE_DATA, $filename)) {
    print STDERR "Cannot open file \"$filename\"\n\n";
    exit;
    }
    @filedata = <GET_FILE_DATA>;
    close GET_FILE_DATA;
    return @filedata;
}



##Opening files
my $fsafile = $ARGV[0];
my @filedata = &get_file_data1($fsafile);


##Procedure
my @count;
my @ids;
my $seq;

foreach $seq (@filedata){
        if ($seq =~ /^>/) {push @ids, $seq;
                                 push @count, "\n";
    }
        else {push @count, $seq;
    }
}


foreach my $line (@count) {
    if ($line =~ /^[AGTCagtc]/){
         $line =~ s/^([AGTCagtc]*)\n/$1/;
    }
}

##Make a text file to have a look
open FILE3, "> unbrokenseq.txt" or die "Cannot open output.txt: $!";

foreach (@count)
{
    print FILE3 "$_\n"; # Print each entry in our array to the file
}
close FILE3;


__END__
##Creating array of lengths
my $number;
my @numberarray;
foreach $number (@count) {
                push @numberarray, length($number);
                }
print @numberarray;


__END__
use List::Util qw(sum);

sub mean {
    return sum(@numberarray)/@numberarray;
}

There’s something wrong with the second foreach loop of the Procedure section, and I can’t figure out what it is. Note that I haven’t even tried the code after the __END__ lines yet, because I can’t get the code in the Procedure step to do what I want. Any idea how I can get a nice array whose elements are unbroken sequences (I’ve chosen to just leave the sequence ID lines out of the new array)? Once I have that, I can build an array of lengths and then average them.

Finally, I should unfortunately admit that I cannot get Bio::Perl working on my computer. I have tried for hours, but the errors are beyond my skill to fix. I’ll be talking to someone who can hopefully help me with my Bio::Perl issues, but for now I’m just going to have to press on without it.

Thanks! Sorry for the length of this post, I appreciate the help.

Andrew

1 Answer

  1. Editorial Team
    Added an answer on June 3, 2026 at 5:11 pm

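    Your second loop does modify @count: in a Perl foreach loop, $line is an alias for each array element, not a copy. The loop still does not give you what you want, for two reasons. First, each line of a sequence remains a separate element of @count, so the lines are never joined. Second, your output loop prints "$_\n", which puts a newline back after every element. The substitution also only matches lines made up entirely of A, G, T and C, so a line containing any other character (such as N) keeps its newline.

    If all you want to do in the second loop is remove the trailing newline, use the chomp function instead; you then don’t need the second loop at all. (It is also faster than the regex.)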

    # remove newlines for all array elements before doing anything else with it
    chomp @filedata;
    
    # .. or you can do it in your first loop
    foreach $seq (@filedata){
        chomp $seq;
        if ($seq =~ /^>/) {
            ...
        }
    }
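To make the joining step concrete, here is a minimal sketch of one way to combine chomp with the first loop so that every record ends up as a single unbroken string. It is not the original poster's script: the hard-coded sample lines stand in for the contents of a real file, and the array name @seqs is an assumption made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines read from a FASTA file.
my @filedata = (
    ">seq1 first record\n", "ACGT\n", "GGCC\n",
    ">seq2 second record\n", "TTAA\n",
);

# chomp on an array strips the trailing newline from every element.
chomp @filedata;

my (@ids, @seqs);
foreach my $line (@filedata) {
    if ($line =~ /^>/) {
        push @ids,  $line;    # header line: remember the ID
        push @seqs, '';       # start an empty sequence for this record
    }
    elsif (@seqs) {
        $seqs[-1] .= $line;   # sequence line: append to the current record
    }
}

# One line of output per record: ID, a tab, then the unbroken sequence.
print "$ids[$_]\t$seqs[$_]\n" for 0 .. $#ids;
```

Because each element of @seqs is now one whole sequence, calling length on it gives the sequence length directly.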
    

    An additional tip: reading the entire file into an array with get_file_data1 can be slow and memory-hungry if your files are large. In that case it is better to process the file line by line as you read it:

    # three-argument open; parentheses make "or die" apply to open itself
    open(my $FILE_DATA, '<', $filename) or die "Cannot open file \"$filename\": $!\n";
    while (my $line = <$FILE_DATA>) {
        chomp $line;
        # process the record as in your Procedure section
        ...
    }
    close $FILE_DATA;
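Putting the line-by-line approach together with the goal stated in the question (an average sequence length), a sketch might look like the following. The helper names fasta_lengths and mean are hypothetical, and the code assumes a plain FASTA file with Unix line endings. It keeps only a running length per record, so it never holds the sequences themselves in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# Returns the length of each sequence in a FASTA file, reading line by line.
# fasta_lengths is a hypothetical helper name, not part of the original script.
sub fasta_lengths {
    my ($filename) = @_;
    open(my $fh, '<', $filename)
        or die "Cannot open file \"$filename\": $!\n";
    my @lengths;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>/) {
            push @lengths, 0;               # new record starts at length 0
        }
        elsif (@lengths) {
            $lengths[-1] += length $line;   # add this line to the current record
        }
    }
    close $fh;
    return @lengths;
}

# Arithmetic mean; returns undef for an empty list to avoid dividing by zero.
sub mean {
    my @values = @_;
    return @values ? sum(@values) / @values : undef;
}

# Only run when a file name is given on the command line.
if (@ARGV) {
    my @lengths = fasta_lengths($ARGV[0]);
    my $mean    = mean(@lengths);
    print "Sequences: ", scalar(@lengths), "\n";
    print "Mean length: ", (defined $mean ? $mean : 'n/a'), "\n";
}
```

Run it as `perl script.pl yourfile.fasta`, where both names are placeholders.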
    
© 2021 The Archive Base. All Rights Reserved