I am trying to parse a large fasta file and I am encountering out

Question

0

Asked: May 26, 20262026-05-26T04:05:51+00:00 2026-05-26T04:05:51+00:00

I am trying to parse a large fasta file and I am encountering out

0

I am trying to parse a large fasta file and I am encountering out of memory errors. Some suggestions to improve the data handling would be appreciated. Currently the program correctly prints out the names however partially through the file I get a MemoryError

Here is the generator

def readFastaEntry( fp ):
    name = ""
    seq = ""
    for line in fp:
        if line.startswith( ">" ):
            tmp = []
            tmp.append( name )
            tmp.append( seq )
            name = line
            seq = ""
            yield tmp
        else:
            seq = seq.join( line )

and here is the caller stub more will be added after this part works

fp = open( sys.argv[1], 'r' )

for seq in readFastaEntry( fp ) :
    print seq[0]

For those not fimilar with the fasta format here is an example

>1 (PB2)
AATATATTCAATATGGAGAGAATAAAAGAACTAAGAGATCTAATGTCACAGTCTCGCACTCGCGAGATAC
TCACCAAAACCACTGTGGACCACATGGCCATAATCAAAAAGTACACATCAGGAAGGCAAGAGAAGAACCC
TGCACTCAGGATGAAGTGGATGATG
>2 (PB1)
AACCATTTGAATGGATGTCAATCCGACTTTACTTTTCTTGAAAGTTCCAGCGCAAAATGCCATAAGCACC
ACATTTCCCTATACTGGAGACCCTCC

each entry starts with a “>” stating the name etc then the next N lines are data. There is no defined ending of the data other than the next line having a “>” at the beginning.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-26T04:05:52+00:00

Have you considered using BioPython. They have a sequence reader that can read fasta files. And if you are interested in coding one yourself, you can take a look at BioPython’s code.

Edit: Code added

def read_fasta(fp):
    name, seq = None, []
    for line in fp:
        line = line.rstrip()
        if line.startswith(">"):
            if name: yield (name, ''.join(seq))
            name, seq = line, []
        else:
            seq.append(line)
    if name: yield (name, ''.join(seq))

with open('f.fasta') as fp:
    for name, seq in read_fasta(fp):
        print(name, seq)

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I am trying to parse a large fasta file and I am encountering out

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply