I have some problem with sequence generator. I have a file where each line contain one fragment (8 letters). I load it from file in to list, where each element is one fragment. It is DNA so it should go that way:
1. Takes first 8-letter element
2. Check for element in which first 7 letters is the same as last 7 letters in first.
3. Add 8th letter from second element in to sequence.
It should look like this:
ATTGCCAT
TTGCCATA
TGCAATAC
So sequence: ATTGCCATAC
Unfortunately it only add one element. 🙁 First element is given (we knew it). I do it that way its first in file (first line).
Here is the code:
from os import sys
import random
def frag_get(seqfile):
frags = []
f_in = open(seqfile, "r")
for i in f_in.readlines():
frags.append(i.strip())
f_in.close()
return frags
def frag_list_shuffle(frags):
random.shuffle(frags)
return frags
def seq_build(first, frags):
seq = first
for f in frags:
if seq[-7:] == f[:7]:
seq += f[-1:]
return seq
def errors():
pass
if __name__ == "__main__":
frags = frag_get(sys.argv[1])
first = frags[0]
frags.remove(first)
frags = frag_list_shuffle(frags)
seq = seq_build(first, frags)
check(sys.argv[2], seq)
spectrum(sys.argv[2], sys.argv[3])
I have deleted check and spectrum functions because it’s simple calculations e.g. length comparison, so it is not what cause a problem as I think.
I will be very thankfully for help!
Regards,
Mateusz
Because your fragments are shuffled, your algorithm needs to take that into account; currently, you’re just looping through the fragments once, which is unlikely to include more than a few fragments if they’re not in the right order. For example, say you have 5 fragments, which I’m going to refer to by their order in your sequence. Now the fragments are slightly out of order:
1 – 3 – 2 – 4 – 5
Your algorithm will start with 1, skip 3, then match on 2, adding a base at the end. Then it’ll check against 4 and 5, and then finish, never reaching fragment 3.
You could easily fix this by starting your loop again each time you add a base, however, this will scale very badly for a large number of bases. Instead, I’d recommend loading your fragments into a trie, and then searching the trie for the next fragment each time you add a base, until you’ve added one base for each fragment or you can no longer find a matching fragment.