I found that if my fasta file ends with a single line sequence then that sequence returned by Bioperl will have one nucleotide missing. If fasta file ends with the new line then it returns complete sequence. Don’t understand why? Is this a requirement for fasta files to end with an empty new line?
This the code I am using
my $obj = $db->get_Seq_by_id($id);
my $seq = $obj->seq; # returns 36 or 35 nucleotides depending if last new line exists
my $length = $obj->length; # returns 36 or 35
And the fasta sequence:
gi|37423|emb|X04588.1| Human 2.5 kb mRNA for cytoskeletal tropomyosin TM30(nm)
CCCTTTAAATTTCCCTTTAAATTTCCCTTTAAATTTT
You should check that your fasta file has an even number of lines:
wc -l file.fasta.It is a requirement that for each line in your fasta file, there must be an end of current line character:
$. If you use thevieditor, type:set listto reveal these hidden characters. Alternatively, try:cat -A file.fastato see the line endings.Also, to be a true fasta file, your header line should begin with the
>character.Perhaps it’s not so much the evenness of lines, but rather if the last line in the file contains a newline ending. If this:
does not return a dollar sign (
$), then you may have problems using your fasta file.To replicate the problem, you can remove the last newline character from a ‘good’ (even lined) fasta file with this:
And you can add a newline to the end of your file with this: