I have codes that allow me to perform a fairly complicated task (for me

Question

Asked: June 9, 2026 (2026-06-09T01:21:55+00:00)

I have code that lets me perform a fairly complicated task (for me at least):

import csv
import os.path
#open files + readlines
with open("C:/Users/Ivan Wong/Desktop/Placement/Lists of targets/Mouse/UCSC to Ensembl.csv", "r") as f:
    reader = csv.reader(f, delimiter = ',')
    #find files with the name in 1st row
    for row in reader:
        graph_filename = os.path.join("C:/Python27/Scripts/My scripts/Top targets",row[0]+"_nt_counts.txt.png")
        if os.path.exists(graph_filename):
            y = row[0]+'_nt_counts.txt'  
            r = open('C:/Users/Ivan Wong/Desktop/Placement/fp_mesc_nochx/'+y, 'r')
            k = r.readlines()
            r.close()
            del k[:1]
            k = map(lambda s: s.strip(), k)
            interger = map(int, k)   
            import itertools
            #adding the numbers for every 3 rows
            def grouper(n, iterable, fillvalue=None):
                "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
                args = [iter(iterable)] * n
                return itertools.izip_longest(*args, fillvalue=fillvalue)
            result = map(sum, grouper(3, interger, 0))       
            e = row[0]
            print e
            cDNA = open('C:/Users/Ivan Wong/Desktop/Placement/Downloaded seq/Mouse/MOUSE_mRNAs.txt', 'r')
            seq = cDNA.readlines()
            # get all lines that have a gene name
            lineNum = 0;
            lineGenes = []
            for line in seq:
                lineNum = lineNum +1
                if '>' in line:
                    lineGenes.append(str(lineNum))
                if '>'+e in line:
                    lineBegin = lineNum

            cDNA.close()

            # which gene is this
            index1 = lineGenes.index(str(lineBegin))
            lineEnd = lineGenes[index1+1]           
# linebegin and lineEnd now give you, where to look for your sequence, all that 
# you have to do is to read the lines between lineBegin and lineEnd in the file
# and make it into a single string.            
            lineEnd = lineGenes[index1+1]
            Lastline = int(lineEnd) -1

# in your code you have already made a list with all the lines (q), first delete
# \n and other symbols, then combine all lines into a big string of nucleotides (like this)     
            qq = seq[lineBegin:Lastline]
            qq = map(lambda s: s.strip(), qq)
            string  = ''
            for i in range(len(qq)):
                string = string + qq[i]
# now you want to get a list of triplets, again you can use the for loop:
# first get the length of the string
            lenString = len(string);
# this is your list codons
            listCodon = []
            for i in range(0,lenString/3): 
                listCodon.append(string[0+i*3:3+i*3])
            proper_result = '\n'.join('%s, %s' % (nr, codon) for nr, codon in zip(result, listCodon))
            with open(e+'.csv','wb') as outfile:
                outfile.writelines(proper_result)

This code reads names from a .csv file and looks in a folder for files with matching names; if a file exists, it processes some data and writes the result to a new .csv file.
With it, my output files now look like this: outfile

The output looks fine except for one problem: I know from my data (I checked it in several ways) that the 2nd column should be longer than what I get. I think this is because a row is only written when BOTH result (the numbers) and listCodon (the letters) have a value, so I am losing the extra entries. How can I fix it?

I printed listCodon just before the file is written and found that all the triplets are still there, so I am guessing the problem is in this line:

proper_result = '\n'.join('%s, %s' % (nr, codon) for nr, codon in zip(result, listCodon))

1 Answer

Editorial Team · Answer 1 · 2026-06-09T01:21:57+00:00

zip stops as soon as its shortest iterable is exhausted (otherwise it wouldn't know what to fill in the blanks with). As the documentation for zip puts it:

The returned list is truncated in length to the length of the shortest argument sequence.
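A minimal sketch of that truncation, using made-up stand-ins for the question's result and listCodon (the values below are hypothetical, not taken from the real data). Comparing the lengths before and after zipping shows which entries are silently dropped:

```python
# Hypothetical stand-ins for the question's `result` and `listCodon`.
counts = [5, 0, 12]
codons = ["ATG", "GCC", "TTA", "CGA", "TAA"]

# zip pairs items only while BOTH inputs still have items left.
pairs = list(zip(counts, codons))

# Everything past the end of the shorter list never reaches the output.
dropped = codons[len(pairs):]

print(len(counts), len(codons), len(pairs))
print(dropped)
```

Printing len(result) and len(listCodon) in the original script in the same way should confirm whether the count list is the shorter of the two.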

If you want to pad the shorter iterables to the length of the longest, use itertools.izip_longest, which takes an optional fillvalue parameter for the padding (None by default).
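As a rough sketch of how the proper_result line from the question might be adapted: the helper name pair_counts_with_codons and the empty-string fill value are assumptions made for illustration, and the sample lists are invented. The try/except is only there because Python 3 renamed izip_longest to zip_longest; in Python 2 code like the question's, itertools.izip_longest can be called directly.

```python
import itertools

# izip_longest is the Python 2 name; Python 3 calls it zip_longest.
try:
    zip_longest = itertools.izip_longest
except AttributeError:
    zip_longest = itertools.zip_longest


def pair_counts_with_codons(result, list_codon, fill=''):
    """Hypothetical helper: one 'count, codon' line per pair,
    padding whichever list is shorter with `fill`."""
    return '\n'.join(
        '%s, %s' % (nr, codon)
        for nr, codon in zip_longest(result, list_codon, fillvalue=fill)
    )


# Invented sample data standing in for `result` and `listCodon`.
proper_result = pair_counts_with_codons(
    [5, 0, 12], ["ATG", "GCC", "TTA", "CGA", "TAA"]
)
print(proper_result)
```

With an empty-string fill, codons that have no matching count are still written, with a blank first column. Passing fill=0 instead would write a zero count, which may or may not be the right reading of the data.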
