The following program has been running for about ~22 hours on two files (txt,

Question


Asked: June 3, 2026 (2026-06-03T07:13:26+00:00)


The following program has been running for about 22 hours on two text files (about 10 MB each). Each file has roughly 100K rows. Can someone give me an idea of how inefficient my code is, and perhaps suggest a faster method? The input dicts are ordered, and preserving that order is necessary:

import collections

def uniq(input):
  output = []
  for x in input:
    if x not in output:
      output.append(x)
  return output

Su = {}
with open ('Sucrose_rivacombined.txt') as f:
    for line in f:
        (key, val) = line.split('\t')
        Su[(key)] = val
    Su_OD = collections.OrderedDict(Su)

Su_keys = Su_OD.keys()
Et = {}

with open ('Ethanol_rivacombined.txt') as g:
    for line in g:
        (key, val) = line.split('\t')
        Et[(key)] = val
    Et_OD = collections.OrderedDict(Et)

Et_keys = Et_OD.keys()

merged_keys = Su_keys + Et_keys
merged_keys =  uniq(merged_keys)

d3=collections.OrderedDict()

output_doc = open("compare.txt","w+")

for chr_local in merged_keys:
    line_output = chr_local
    if (Et.has_key(chr_local)):
        line_output = line_output + "\t" + Et[chr_local]
    else:
        line_output = line_output + "\t" + "ND"
    if (Su.has_key(chr_local)):
        line_output = line_output + "\t" + Su[chr_local]
    else:
        line_output = line_output + "\t" + "ND"

    output_doc.write(line_output + "\n")

The input files look like this (not every key is present in both files):

Su:
chr1:3266359    80.64516129
chr1:3409983    100
chr1:3837894    75.70093458
chr1:3967565    100
chr1:3977957    100


Et:
chr1:3266359    95
chr1:3456683    78
chr1:3837894    54.93395855
chr1:3967565    100
chr1:3976722    23

I would like the output to look as follows:

chr1:3266359    80.645    95
chr1:3456683    ND        78


1 Answer

Editorial Team · Answer 1 · 2026-06-03T07:13:27+00:00

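You don't need your uniq function. It is also the main reason the script is so slow: "x not in output" scans the entire output list for every key, so with around 200K keys the total work grows quadratically.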
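If you ever do need an order-preserving dedup step, a minimal sketch is to remember the keys you have already seen in a set, which makes each membership test constant time on average. On Python 3.7 and later, dict.fromkeys gives the same result because dicts keep insertion order. The names uniq_fast and uniq_dict are hypothetical, chosen for illustration only.

```python
def uniq_fast(items):
    """Return items with duplicates removed, keeping first-seen order."""
    seen = set()
    output = []
    for x in items:
        # Set membership is O(1) on average, unlike scanning a list.
        if x not in seen:
            seen.add(x)
            output.append(x)
    return output


def uniq_dict(items):
    """Same result via dict.fromkeys (dicts keep insertion order in 3.7+)."""
    return list(dict.fromkeys(items))


print(uniq_fast(["chr1:3266359", "chr1:3409983", "chr1:3266359"]))
# ['chr1:3266359', 'chr1:3409983']
```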

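In pseudocode:

read file 2 (Ethanol) into an OrderedDict
stream file 1 (Sucrose), writing out each of its items (file 1 is already in the right order)
for the last field of each output line, pop the key from the file 2 dict, with a default of ND
after file 1 is consumed, write out whatever is left in the file 2 OrderedDict (the keys that only appear in file 2)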
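The steps above can be sketched as a small function that works on any iterable of tab-separated lines, so it can be tried on in-memory lists before pointing it at the real files. This is a hypothetical helper (merge_ordered is not from the original answer); it returns (key, file 1 value, file 2 value) tuples instead of writing to disk.

```python
from collections import OrderedDict


def merge_ordered(lines1, lines2, missing="ND"):
    """Merge two iterables of tab-separated key/value lines.

    Returns (key, value_from_1, value_from_2) tuples: keys from lines1 come
    first in their original order, then keys found only in lines2.
    """
    # Load file 2 into an OrderedDict, keeping its line order.
    second = OrderedDict(line.strip().split("\t") for line in lines2)
    rows = []
    # Stream file 1; it is already in the desired order.
    for line in lines1:
        key, val = line.strip().split("\t")
        # pop() with a default: matched keys are removed from the dict.
        rows.append((key, val, second.pop(key, missing)))
    # Whatever is left appears only in file 2.
    for key, val in second.items():
        rows.append((key, missing, val))
    return rows


su_lines = ["chr1:3266359\t80.64516129\n", "chr1:3409983\t100\n"]
et_lines = ["chr1:3266359\t95\n", "chr1:3456683\t78\n"]
for row in merge_ordered(su_lines, et_lines):
    print("\t".join(row))
```

Because pop() removes each matched key, the final loop only sees Ethanol-only keys, so no separate dedup pass is needed.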

Also, generator expressions are handy here; you can read a whole file into an OrderedDict with:

OrderedDict(line.strip().split('\t') for line in open('Ethanol_rivacombined.txt'))

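Only one OrderedDict is built, and ‘Sucrose_rivacombined.txt’ is read one line at a time, so it never has to be loaded into memory. Each lookup and pop is constant time on average, so this should be very fast.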
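To see what the one-liner produces, here is a small sketch that uses io.StringIO as a stand-in for Ethanol_rivacombined.txt, filled with sample rows from the question (the stand-in is an assumption for illustration). Each stripped line splits into a [key, value] pair, and OrderedDict consumes those pairs in order.

```python
import io
from collections import OrderedDict

# Stand-in for Ethanol_rivacombined.txt, using sample rows from the question.
sample = io.StringIO(
    "chr1:3266359\t95\n"
    "chr1:3456683\t78\n"
    "chr1:3837894\t54.93395855\n"
)

# strip() drops the trailing newline; split('\t') yields [key, value].
et = OrderedDict(line.strip().split("\t") for line in sample)
print(list(et.items()))
# [('chr1:3266359', '95'), ('chr1:3456683', '78'), ('chr1:3837894', '54.93395855')]
```

A blank line, or a line without exactly one tab, will make the construction fail, so trailing empty lines in the input files are worth checking for. On Python 3.7 and later a plain dict(...) also preserves insertion order.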

EDIT: complete code (I'm not sure about your exact output line format):

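from collections import OrderedDict

# Load the Ethanol file into memory, keeping its original line order.
with open('Ethanol_rivacombined.txt') as et_file:
    Et_OD = OrderedDict(line.strip().split('\t') for line in et_file)

with open('compare.txt', 'w') as output_doc, open('Sucrose_rivacombined.txt') as su_file:
    # Stream the Sucrose file; pop() removes each matched key,
    # so only Ethanol-only keys remain afterwards.
    for line in su_file:
        key, val = line.strip().split('\t')
        line_out = '\t'.join((key, val, Et_OD.pop(key, 'ND')))
        output_doc.write(line_out + '\n')

    # Keys that appear only in the Ethanol file, in their original order.
    for key, val in Et_OD.items():
        line_out = '\t'.join((key, 'ND', val))
        output_doc.write(line_out + '\n')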
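If the three-decimal values in the desired output (80.645 rather than 80.64516129, but 95 rather than 95.000) are intentional, one possible sketch is a small formatter applied to each value before joining. The helper name format_value and the three-decimal default are assumptions inferred from the example output, not something the question states.

```python
def format_value(value, missing="ND", places=3):
    """Round a numeric string to `places` decimals and drop trailing zeros.

    The missing-value marker is passed through unchanged.
    """
    if value == missing:
        return value
    text = f"{float(value):.{places}f}"
    # Only trim when there is a decimal point: "95.000" -> "95".
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return text


print(format_value("80.64516129"), format_value("95"), format_value("ND"))
# 80.645 95 ND
```

In the complete code above, it would wrap both value fields, for example '\t'.join((key, format_value(val), format_value(Et_OD.pop(key, 'ND')))).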
