I found code that parses a FASTA-formatted file. I need to count how many A, T, G, and so on are in each sequence, for example:
>gi|7290019|gb|AAF45486.1| (AE003417) EG:BACR37P7.1 gene product [Drosophila melanogaster]
MRMRGRRLLPIIL
In this sequence there are:
M - 2
R - 4
G - 1
L - 3
I - 2
P - 1
The code is very simple:
    def FASTA(filename):
        try:
            f = file(filename)
        except IOError:
            print "The file, %s, does not exist" % filename
            return
        order = []
        sequences = {}
        for line in f:
            if line.startswith('>'):
                name = line[1:].rstrip('\n')
                name = name.replace('_', ' ')
                order.append(name)
                sequences[name] = ''
            else:
                sequences[name] += line.rstrip('\n').rstrip('*')
        print "%d sequences found" % len(order)
        return order, sequences

    x, y = FASTA("drosoph_b.fasta")
But how can I count those amino acids? I don't want to use BioPython; I would like to know how to do this with, for example, count…
An alternative to what katrielalex mentioned in the comments would be to use another dictionary; the code is below.
This outputs:
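A minimal sketch of the dictionary approach, not the answerer's original code. It is written to run under Python 3, and it uses a hand-built `sequences` dict in place of the one returned by `FASTA` so that it runs on its own. The helper name `count_residues` and the key `"example"` are made up for illustration.

```python
def count_residues(sequence):
    """Return a dict mapping each character in `sequence` to its count."""
    counts = {}
    for residue in sequence:
        # dict.get supplies 0 the first time a residue is seen
        counts[residue] = counts.get(residue, 0) + 1
    return counts


# Stand-in for the `sequences` dict returned by FASTA(); the name
# "example" is hypothetical, the sequence is the one from the question.
sequences = {"example": "MRMRGRRLLPIIL"}

for name, seq in sequences.items():
    counts = count_residues(seq)
    # Print one "residue - count" line per residue, in alphabetical order
    for residue in sorted(counts):
        print("%s - %d" % (residue, counts[residue]))
```

If only one residue is of interest, the built-in `str.count` does the job directly: `"MRMRGRRLLPIIL".count("R")` returns 4.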