I have code that lets me perform a fairly complicated task (for me at least):
import csv
import os.path
#open files + readlines
with open("C:/Users/Ivan Wong/Desktop/Placement/Lists of targets/Mouse/UCSC to Ensembl.csv", "r") as f:
    reader = csv.reader(f, delimiter = ',')
    #find files with the name in 1st row
    for row in reader:
        graph_filename = os.path.join("C:/Python27/Scripts/My scripts/Top targets",row[0]+"_nt_counts.txt.png")
        if os.path.exists(graph_filename):
            y = row[0]+'_nt_counts.txt'
            r = open('C:/Users/Ivan Wong/Desktop/Placement/fp_mesc_nochx/'+y, 'r')
            k = r.readlines()
            r.close()
            del k[:1]
            k = map(lambda s: s.strip(), k)
            interger = map(int, k)
            import itertools
            #adding the numbers for every 3 rows
            def grouper(n, iterable, fillvalue=None):
                "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
                args = [iter(iterable)] * n
                return itertools.izip_longest(*args, fillvalue=fillvalue)
            result = map(sum, grouper(3, interger, 0))
            e = row[0]
            print e
            cDNA = open('C:/Users/Ivan Wong/Desktop/Placement/Downloaded seq/Mouse/MOUSE_mRNAs.txt', 'r')
            seq = cDNA.readlines()
            # get all lines that have a gene name
            lineNum = 0;
            lineGenes = []
            for line in seq:
                lineNum = lineNum +1
                if '>' in line:
                    lineGenes.append(str(lineNum))
                if '>'+e in line:
                    lineBegin = lineNum
            cDNA.close()
            # which gene is this
            index1 = lineGenes.index(str(lineBegin))
            lineEnd = lineGenes[index1+1]
            # lineBegin and lineEnd now give you, where to look for your sequence, all that
            # you have to do is to read the lines between lineBegin and lineEnd in the file
            # and make it into a single string.
            lineEnd = lineGenes[index1+1]
            Lastline = int(lineEnd) -1
            # in your code you have already made a list with all the lines (q), first delete
            # \n and other symbols, then combine all lines into a big string of nucleotides (like this)
            qq = seq[lineBegin:Lastline]
            qq = map(lambda s: s.strip(), qq)
            string = ''
            for i in range(len(qq)):
                string = string + qq[i]
            # now you want to get a list of triplets, again you can use the for loop:
            # first get the length of the string
            lenString = len(string);
            # this is your list codons
            listCodon = []
            for i in range(0,lenString/3):
                listCodon.append(string[0+i*3:3+i*3])
            proper_result = '\n'.join('%s, %s' % (nr, codon) for nr, codon in zip(result, listCodon))
            with open(e+'.csv','wb') as outfile:
                outfile.writelines(proper_result)
This code reads names from a .csv file and checks a folder for files with matching names. If a matching file exists, it processes some data and writes the result to a new .csv file.
With it, each line of my output files now holds a number and a codon triplet, separated by a comma.
The output looks completely fine except for one problem: I know from my data (I checked it in different ways) that the second column should be longer than what I get. I think this is because the code only writes a line when BOTH result (the numbers) and listCodon (the letters) have an entry, so I am missing some rows. How can I fix it?
I tried printing listCodon just before the file is written and found that all the triplets are still there, so I am guessing the problem is in this line:
proper_result = '\n'.join('%s, %s' % (nr, codon) for nr, codon in zip(result, listCodon))
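One way to confirm this guess is to compare the lengths of the two lists right before the join. The sketch below uses small made-up lists in place of the real result and listCodon (the values are hypothetical, not taken from the actual data) and shows that zip yields only as many pairs as the shorter list has.

```python
# Hypothetical stand-ins for the lists built by the script above.
result = [5, 0, 12]                               # summed counts
listCodon = ['ATG', 'GCC', 'TTA', 'CGA', 'TAA']   # triplets from the sequence

# list() makes this behave the same on Python 2 and Python 3.
pairs = list(zip(result, listCodon))

# zip stops at the shorter input, so the extra codons are silently dropped.
print('%d counts, %d codons, %d pairs' % (len(result), len(listCodon), len(pairs)))
```

If the first two numbers differ in the real script, the rows missing from the output are the ones past the end of the shorter list.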
zip will stop as soon as any of its iterables is exhausted (because otherwise it wouldn't know what to fill in the blanks with!). If you want to pad the shorter iterables to the length of the longest, use itertools.izip_longest (which takes an optional fillvalue parameter to use as the fill value).
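A minimal sketch of that fix, applied to the line from the question. The sample lists are hypothetical, and padding the missing counts with an empty string is an assumption; use whatever fill value suits your data. The import falls back to zip_longest, which is the Python 3 name for the same function.

```python
try:
    from itertools import izip_longest as zip_longest  # Python 2
except ImportError:
    from itertools import zip_longest  # Python 3

# Hypothetical stand-ins for the lists built in the question.
result = [5, 0, 12]
listCodon = ['ATG', 'GCC', 'TTA', 'CGA', 'TAA']

# Pad the shorter list with '' so that every codon still gets a row.
rows = zip_longest(result, listCodon, fillvalue='')
proper_result = '\n'.join('%s, %s' % (nr, codon) for nr, codon in rows)
print(proper_result)
```

With this change the output has one row per element of the longer list, and the rows that have no count simply start with an empty first column.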