So I am trying to do this problem where I have to find the most frequent 6-letter string within some lines in python, so I realize one could do something like this:
>>> from collections import Counter
>>> x = Counter("ACGTGCA")
>>> x
Counter({'A': 2, 'C': 2, 'G': 2, 'T': 1})
Now then, the data I’m using is DNA files and the format of a file goes something like this:
> name of the protein
ACGTGCA ... < more sequences>
ACGTGCA ... < more sequences>
ACGTGCA ... < more sequences>
ACGTGCA ... < more sequences>
> another protein
AGTTTCAGGAC ... <more sequences>
AGTTTCAGGAC ... <more sequences>
AGTTTCAGGAC ... <more sequences>
AGTTTCAGGAC ... <more sequences>
We can start by going one protein at a time, but then how does one modify the block of code above to search for the most frequent 6-character string pattern? Thanks.
I think the simplest way is just do this: