This is a follow-up to a previous question I asked: "Processing a sub-list of variable size within a larger list."
I managed to use itertools to get groups of DNA fragments out, but now I’m faced with a different problem.
I need to design primers based on these groups of DNA fragments. Each primer combines an overlap taken from a neighboring fragment with a region of its own fragment. Let’s say I have three DNA fragments in a list: A, B, and C. I need to extract:
- the last 20 nucleotides (n.t.) of C to concatenate (in order) with the first 40 n.t. of A,
- the reverse complement (RC) of the first 20 n.t. of B to concatenate (in order) with the RC of the last 40 n.t. of A,
- the last 20 n.t. of A to concatenate with the first 40 n.t. of B,
- the RC of the first 20 n.t. of C to concatenate with the RC of the last 40 n.t. of B,
- the last 20 n.t. of B to concatenate with the first 40 n.t. of C,
- the RC of the first 20 n.t. of A to concatenate with the RC of the last 40 n.t. of C.
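One way to cover all six cases without writing each one out is to treat the fragment list as circular and look up each fragment's neighbors with modulo indexing. This is a hypothetical, pure-Python sketch: the names `reverse_complement` and `design_primers` are my own, not Biopython's, and it assumes plain A/C/G/T strings of at least 40 n.t. in a closed assembly where the last fragment joins back to the first.

```python
# Sketch: Gibson-style primer pairs for a circular list of DNA fragments.
# Assumes plain A/C/G/T strings (either case), each at least `anneal` nt long,
# and that the last fragment joins back to the first.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(fragments, overhang=20, anneal=40):
    """Return one (forward, reverse) primer pair per fragment, in list order.

    Forward: last `overhang` nt of the previous fragment + first `anneal` nt
    of this fragment.
    Reverse: RC of the first `overhang` nt of the next fragment + RC of the
    last `anneal` nt of this fragment.
    """
    n = len(fragments)
    primers = []
    for i, frag in enumerate(fragments):
        prev_frag = fragments[i - 1]        # i == 0 wraps to the last fragment
        next_frag = fragments[(i + 1) % n]  # last fragment wraps to the first
        forward = prev_frag[-overhang:] + frag[:anneal]
        reverse = (reverse_complement(next_frag[:overhang])
                   + reverse_complement(frag[-anneal:]))
        primers.append((forward, reverse))
    return primers
```

For `[A, B, C]`, `design_primers` returns the pairs for A, B and C in that order. The reverse primer for A equals the reverse complement of `A[-40:] + B[:20]`, which is why the two RC pieces appear in swapped order. With Biopython, `str(Seq(s).reverse_complement())` could stand in for the helper.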
I can’t seem to solve this problem, and I’m not sure where best to start.
The code I’ve written so far prints only “group 1” (on purpose, to keep the amount of output manageable). Here it is:
```python
# import Biopython tools
# (Bio.Alphabet/IUPAC was removed in Biopython 1.78; Seq() now takes the sequence alone)
from Bio.Seq import Seq
# import csv tools
import csv
import itertools


def get_construct_number(row):
    return row["Construct"]


def get_strategy(row):
    return row["Strategy"]


primer_list = []
groups = []

# 'rU' mode was removed in Python 3.11; the csv docs recommend newline=''
with open('constructs-to-make.csv', newline='') as constructs:
    construct_list = csv.DictReader(constructs)
    # groupby only merges consecutive rows, so the CSV must be ordered by Construct
    for key, items in itertools.groupby(construct_list, key=get_construct_number):
        for subitems in items:
            # here, I am trying to get the annealing portion of the Gibson sequence out
            if subitems['Strategy'] == 'Gibson' and subitems['Construct'] == '1':
                print(subitems['Construct'])
                # first 40 nt: annealing part of this fragment's forward primer
                fw_anneal = Seq(subitems['Sequence'][0:40])
                print(fw_anneal)
                # RC of the last 40 nt: annealing part of this fragment's reverse primer
                re_anneal = Seq(subitems['Sequence'][-40:]).reverse_complement()
                print(re_anneal)
                # RC of the first 20 nt: overhang for the previous fragment's reverse primer
                fw_overhang = Seq(subitems['Sequence'][0:20]).reverse_complement()
                print(fw_overhang)
                # last 20 nt: overhang for the next fragment's forward primer
                re_overhang = Seq(subitems['Sequence'][-20:])
                print(re_overhang)
```
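For the grouping step, one possible approach is to collect each construct's Gibson fragments into an ordered list first, and only then build primers from that list, rather than slicing inside the row loop. This is a sketch: `gibson_groups` and the sample CSV text are hypothetical, and only the column names (`Construct`, `Strategy`, `Sequence`) come from the script above.

```python
import csv
import io
import itertools

# Hypothetical stand-in for constructs-to-make.csv; only the column names
# come from the original script.
SAMPLE_CSV = """Construct,Strategy,Sequence
1,Gibson,AAAACCCCGGGGTTTT
1,Gibson,ACGTACGTACGTACGT
2,Gibson,TTTTGGGGCCCCAAAA
2,Restriction,GGATCCGGATCCGGAT
"""


def gibson_groups(rows):
    """Map each construct number to the ordered list of its Gibson sequences.

    itertools.groupby only merges *consecutive* rows with the same key, so
    setdefault/extend is used: a construct split across non-adjacent rows is
    appended to rather than overwritten.
    """
    groups = {}
    for construct, rows_in_construct in itertools.groupby(
            rows, key=lambda row: row["Construct"]):
        fragments = [row["Sequence"].strip().upper()
                     for row in rows_in_construct
                     if row["Strategy"] == "Gibson"]
        if fragments:  # skip constructs with no Gibson fragments
            groups.setdefault(construct, []).extend(fragments)
    return groups


if __name__ == "__main__":
    # With the real file: open('constructs-to-make.csv', newline='')
    reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
    for construct, fragments in gibson_groups(reader).items():
        print(construct, fragments)
```

Each list in the returned dict holds one construct's fragments in file order, which is the shape a circular primer-design step needs as input.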
Any help would be greatly appreciated!
I ended up using a bunch of conditionals to solve this problem.
The code is inelegant, and involves a lot of repetition, but for a quick-and-dirty script that I’ll use over and over, I think it suffices.
Thanks everybody for the help!