What is the most efficient (fastest) way to simultaneously read in two large files and do some processing?
I have two files, a.txt and b.txt, each containing about a hundred thousand corresponding lines. My goal is to read in the two files and then do some processing on each line pair:
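    def kernel():
        a_file = open('a.txt', 'r')
        b_file = open('b.txt', 'r')
        a_line = a_file.readline()
        b_line = b_file.readline()
        while a_line:
            # a_spl/b_spl were never defined; assumed here to be the split lines
            a_spl, b_spl = a_line.split(), b_line.split()
            process(a_spl, b_spl)  # process requiring both corresponding file lines
            a_line = a_file.readline()  # advance both files, or the loop never ends
            b_line = b_file.readline()
        a_file.close()
        b_file.close()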
I looked into xreadlines and readlines, but I'm wondering if I can do better. Speed is of paramount importance for this task.
Thank you.
The code below does not accumulate data from the input files in memory, unless the process function does that by itself. If the process function is efficient, this code should run quickly enough for most purposes. The for loop will terminate when the end of one of the files is reached. If either file contains an extraordinarily long line (e.g. XML, JSON), or if the files are not text, this code may not work well.
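The approach described above can be sketched roughly as follows. This is a minimal sketch, not necessarily the exact code the answer had in mind: the wrapper name process_pairs and its parameters a_path, b_path and process are hypothetical, and it assumes Python 3, where zip is lazy (in Python 2, itertools.izip plays the same role).

```python
def process_pairs(a_path, b_path, process):
    """Call process(a_line, b_line) for each pair of corresponding lines.

    process_pairs and its parameters are illustrative names, not from the question.
    Returns the number of line pairs handled.
    """
    count = 0
    with open(a_path) as a_file, open(b_path) as b_file:
        # Iterating a file object yields one line at a time, and zip() in
        # Python 3 is lazy, so only the current pair of lines is held here.
        # zip() stops as soon as either file is exhausted.
        for a_line, b_line in zip(a_file, b_file):
            process(a_line, b_line)
            count += 1
    return count
```

Called as process_pairs('a.txt', 'b.txt', process), this hands each pair of raw lines (trailing newline included) to process. If the two files might differ in length and the extra lines matter, itertools.zip_longest could be swapped in for zip.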