I have a FASTA file with an alignment of multiple gene samples. I am trying to develop a program that can count the number of mutations for each sample. What’s the best way to do this? Store each gene sample in a dictionary and compare them somehow?
If the sequences are already aligned, the hard part is done: each column holds positions that correspond to one another, so identities and mismatches can be read off column by column. So you have something like this:
Aln1: ACTGGTTGTCCAACCGTAATCGAAG
Aln2: ---GGTTGTCCAATTC---TCGAAG
Read each aligned sequence into a string, then walk the two strings in parallel, comparing them one position at a time.
Something simple like this works:
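A minimal sketch of that loop, assuming the two rows are equal-length strings and that `-` marks a gap. The function name `count_differences` and its `count_gaps` flag are made up for illustration:

```python
def count_differences(ref, sample, count_gaps=False, gap="-"):
    """Count alignment columns where two aligned sequences differ.

    Hypothetical helper: by default a column containing a gap is skipped,
    so only substitutions are counted.
    """
    if len(ref) != len(sample):
        raise ValueError("aligned sequences must have the same length")

    differences = 0
    # zip pairs up the characters that share an alignment column
    for a, b in zip(ref.upper(), sample.upper()):
        if a == b:
            continue  # identical column (this includes gap vs. gap)
        if not count_gaps and gap in (a, b):
            continue  # one side is a gap and gaps are not being counted
        differences += 1
    return differences


aln1 = "ACTGGTTGTCCAACCGTAATCGAAG"
aln2 = "---GGTTGTCCAATTC---TCGAAG"

print(count_differences(aln1, aln2))                   # 3 (substitutions only)
print(count_differences(aln1, aln2, count_gaps=True))  # 9 (3 substitutions + 6 gap columns)
```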
The count depends on your own criteria, though: for example, whether a gap counts as a mutation, and which sequence you treat as the reference.
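For the multi-sample case in the question, the dictionary idea works. The sketch below is one way to do it, not the only one: it reads FASTA lines into a dict and counts differences for every sample against the first record. Treating the first record as the reference is an assumption (the question does not say what to compare against), and `read_fasta` and `mutations_per_sample` are hypothetical names.

```python
def read_fasta(lines):
    """Parse an iterable of FASTA lines into {header: sequence}.

    Minimal parser for illustration: the whole header line (minus '>') is
    used as the key, and a repeated header overwrites the earlier record.
    """
    parts = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            parts[name] = []
        elif name is not None:
            parts[name].append(line)  # sequences may span several lines
    return {key: "".join(chunks) for key, chunks in parts.items()}


def mutations_per_sample(records, count_gaps=False, gap="-"):
    """Count differing columns for each sample versus the first record."""
    names = list(records)  # dicts keep insertion order (Python 3.7+)
    ref = records[names[0]]  # assumption: first record is the reference
    counts = {}
    for sample_name in names[1:]:
        seq = records[sample_name]
        if len(seq) != len(ref):
            raise ValueError(f"{sample_name} is not aligned to the reference")
        counts[sample_name] = sum(
            1
            for a, b in zip(ref, seq)
            if a != b and (count_gaps or gap not in (a, b))
        )
    return counts


fasta_text = """>ref
ACTGGTTGTCCAACCGTAATCGAAG
>sample1
---GGTTGTCCAATTC---TCGAAG
>sample2
ACTGGTTGTCCAACCGTAATCGAAC
"""

records = read_fasta(fasta_text.splitlines())
print(mutations_per_sample(records))                   # {'sample1': 3, 'sample2': 1}
print(mutations_per_sample(records, count_gaps=True))  # {'sample1': 9, 'sample2': 1}
```

To run it on a real file, pass the open file handle to `read_fasta` instead of the split string, since a file object also yields lines.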