I found code that parses a FASTA-formatted file. I need to count

Question

Asked: June 17, 2026

I found code that parses a FASTA-formatted file. I need to count how many A, T, G and so on there are in each sequence, for example:

>gi|7290019|gb|AAF45486.1| (AE003417) EG:BACR37P7.1 gene product [Drosophila melanogaster]
MRMRGRRLLPIIL

In this sequence there are:

M - 2
R - 4
G - 1
L - 3
I - 2
P - 1

The code is very simple:

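def FASTA(filename):
  try:
    f = open(filename)
  except IOError:
    print("The file, %s, does not exist" % filename)
    # return empty results so the caller can still unpack two values
    return [], {}

  order = []       # sequence names in file order
  sequences = {}   # sequence name -> sequence string

  for line in f:
    if line.startswith('>'):
      # header line: start a new sequence
      name = line[1:].rstrip('\n')
      name = name.replace('_', ' ')
      order.append(name)
      sequences[name] = ''
    else:
      # sequence line: append to the current sequence
      sequences[name] += line.rstrip('\n').rstrip('*')
  f.close()

  print("%d sequences found" % len(order))
  return order, sequences

x, y = FASTA("drosoph_b.fasta")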

But how can I count those amino acids? I don't want to use BioPython; I would like to know how to do this with, for example, count…

1 Answer

Editorial Team · Answer 1 · 2026-06-17T16:29:56+00:00

An alternative to what katrielalex mentioned in the comments would be to use another dictionary. The code is below:

def FASTA(filename):
  try:
    f = open(filename)
  except IOError:
    print("The file, %s, does not exist" % filename)
    # return empty results so the caller can still unpack two values
    return [], {}

  order = []
  sequences = {}
  counts = {}  # residue -> total occurrences across all sequences

  for line in f:
    if line.startswith('>'):
      name = line[1:].rstrip('\n')
      name = name.replace('_', ' ')
      order.append(name)
      sequences[name] = ''
    else:
      chunk = line.rstrip('\n').rstrip('*')
      sequences[name] += chunk
      # count only the residues on this line, so that sequences
      # spanning several lines are not counted more than once
      for aa in chunk:
        if aa in counts:
          counts[aa] = counts[aa] + 1
        else:
          counts[aa] = 1
  f.close()

  print("%d sequences found" % len(order))
  print(counts)
  return order, sequences

x, y = FASTA("drosoph_b.fasta")

This outputs:

1 sequences found
{'M': 2, 'R': 4, 'G': 1, 'L': 3, 'P': 1, 'I': 2}
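
Since the question asks for the counts in each sequence rather than one combined total, here is a minimal sketch of one way to get that with `collections.Counter` from the standard library. The helper name `count_residues` and the sample dictionary are made up for illustration; the sketch assumes Python 3 and a `sequences` dictionary shaped like the one `FASTA` returns (name mapped to sequence string).

```python
from collections import Counter


def count_residues(sequences):
    """Return {sequence name: Counter of residues} for a dict of name -> sequence."""
    return {name: Counter(seq) for name, seq in sequences.items()}


# Hypothetical input mirroring the example sequence from the question.
example = {"example sequence": "MRMRGRRLLPIIL"}
per_sequence = count_residues(example)

for name, tally in per_sequence.items():
    print(name)
    # most_common() yields (residue, count) pairs, highest count first
    for residue, n in tally.most_common():
        print("%s - %d" % (residue, n))
```

With the parser above this would be called as `count_residues(y)`. If only one residue is of interest, the built-in string method works too, for example `y[name].count('R')`, at the cost of one pass over the string per residue.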
