I have a large file (5Gb) called my_file . I have a list called

Question


Asked: May 25, 2026


I have a large file (5 GB) called my_file. I have a list called my_list. What is the most efficient way to read each line in the file and, if an item from my_list matches an item from a line in my_file, create a new list called matches that contains items from the lines in my_file and items from my_list where a match occurred? Here is what I am trying to do:

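```python
def calc(my_file, my_list):
    matches = []
    my_file.seek(0, 0)  # rewind to the start of the file
    for line in my_file:
        # each line is tab-separated: name, number, colour, code
        fields = line.rstrip('\n').split('\t')
        # linear scan of my_list for every line: O(lines * len(my_list))
        for v in my_list:
            if v[1] == fields[2]:
                # int() so the number matches the desired output (9, not '9')
                matches.append([v[0], int(fields[1]), fields[3]])
    return matches
```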

Here are some lines from my_file (the fields are tab-separated):

```
lion    4    blue    ch3
sheep   1    red     pq2
frog    9    green   xd7
donkey  2    aqua    zr8
```

Here are some items in my_list (each is a name/colour pair):

```
intel    yellow
amd      green
msi      aqua
```

The desired output, a list of lists, in the above example would be:

```
[['amd', 9, 'xd7'], ['msi', 2, 'zr8']]
```

My code currently works, albeit really slowly. Would using a generator or serialization help? Thanks.
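To try the approaches on this page without the real 5 GB file, the sample lines above can be simulated with io.StringIO, which supports seek() and line-by-line iteration like an open text file. This is a minimal test harness, assuming my_file is tab-separated and my_list holds (name, colour) pairs; SAMPLE and reference_calc are hypothetical names used only for illustration. The brute-force function gives a known-good result to compare faster versions against.

```python
import io

# Sample rows from my_file, assuming tab-separated fields: name, number, colour, code
SAMPLE = (
    "lion\t4\tblue\tch3\n"
    "sheep\t1\tred\tpq2\n"
    "frog\t9\tgreen\txd7\n"
    "donkey\t2\taqua\tzr8\n"
)

# my_list assumed to be a list of (name, colour) pairs
my_list = [("intel", "yellow"), ("amd", "green"), ("msi", "aqua")]


def reference_calc(my_file, my_list):
    """Brute-force reference: compare every line against every item of my_list."""
    matches = []
    my_file.seek(0)
    for line in my_file:
        fields = line.rstrip("\n").split("\t")
        for name, colour in my_list:
            if colour == fields[2]:
                # int() turns '9' into 9 to match the desired output
                matches.append([name, int(fields[1]), fields[3]])
    return matches


print(reference_calc(io.StringIO(SAMPLE), my_list))
# [['amd', 9, 'xd7'], ['msi', 2, 'zr8']]
```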


1 Answer

Editorial Team · Answer 1 · 2026-05-25T11:11:30+00:00

You could build a dictionary for looking up v, keyed on the colour (v[1]). I also added a few small optimizations:

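```python
def calc(my_file, my_list):
    # map colour -> name once, so each lookup is O(1) instead of a scan of my_list
    vd = {v[1]: v[0] for v in my_list}

    my_file.seek(0, 0)
    for line in my_file:
        # rstrip('\n') rather than line[:-1], in case the last line has no newline
        f0, f1, f2, f3 = line.rstrip('\n').split('\t')
        v0 = vd.get(f2)
        if v0 is not None:
            # calc is a generator: wrap it in list() to collect every match
            yield (v0, f1, f3)
```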

This should be much faster for a large my_list: each line now costs a single dictionary lookup instead of a pass over the whole list.
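One caveat: a plain dict keeps only the last name for each colour, so if two items in my_list share a colour, the dictionary version reports one match where the nested loop reports two. Below is a sketch that keeps every name, assuming the same tab-separated layout and (name, colour) pairs as in the question; calc_all and SAMPLE are hypothetical names. It also converts the number with int() and yields lists, to match the desired output in the question.

```python
import io
from collections import defaultdict


def calc_all(my_file, my_list):
    """Yield [name, number, code] for every (line, my_list item) pair whose colours match."""
    # colour -> every name with that colour; built once, O(len(my_list))
    by_colour = defaultdict(list)
    for name, colour in my_list:
        by_colour[colour].append(name)

    my_file.seek(0)
    for line in my_file:
        _, number, colour, code = line.rstrip("\n").split("\t")
        # .get() does not insert empty lists into the defaultdict for unmatched colours
        for name in by_colour.get(colour, ()):
            yield [name, int(number), code]


SAMPLE = "lion\t4\tblue\tch3\nsheep\t1\tred\tpq2\nfrog\t9\tgreen\txd7\ndonkey\t2\taqua\tzr8\n"
my_list = [("intel", "yellow"), ("amd", "green"), ("msi", "aqua")]

matches = list(calc_all(io.StringIO(SAMPLE), my_list))
print(matches)  # [['amd', 9, 'xd7'], ['msi', 2, 'zr8']]
```

For the real 5 GB file, iterating over the generator directly (for example, writing each match to an output file as it is produced) avoids holding all matches in memory; list() is only needed if you want them all at once.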

Using vd.get(f2) is faster than checking whether f2 is in vd and then accessing vd[f2], since it does one lookup instead of two.
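Both lookup styles return the same result; on a hit, the membership-test version probes the dictionary twice. The following is a small sketch for comparing them with timeit on your own data. vd and keys here are hypothetical stand-ins, and timings depend on the Python version and on how often keys actually match, so no figures are claimed.

```python
import timeit

# hypothetical colour -> name lookup table and a batch of keys to look up
vd = {"green": "amd", "aqua": "msi", "yellow": "intel"}
keys = ["blue", "red", "green", "aqua"] * 1000


def with_get():
    # one lookup per key; returns None on a miss
    return [vd.get(k) for k in keys]


def with_in_then_index():
    # membership test, then a second lookup on a hit
    return [vd[k] if k in vd else None for k in keys]


assert with_get() == with_in_then_index()
print("get:        ", timeit.timeit(with_get, number=200))
print("in + index: ", timeit.timeit(with_in_then_index, number=200))
```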

For more speed beyond these optimizations, I recommend Cython: http://www.cython.org
