I have a problem when I try to compare two large files. What I am trying to do is take a line from one file, search all the lines of another file for a match and if there isn’t one, write that line to another file. I was able to recreate the problem with the simple example below:
file1.txt (contents)
apple
banana
pear
peach
lime
file_old.txt (contents)
lime
apple
pear
peach
Since I am looking for lines in file1 that are not in file_old, I would expect ‘banana’ to be the only value in the output file. But the output file, file_diff.txt, contains:
apple
banana
banana
What is wrong with my code to try and produce the differences in a file?
def main():
    file_old = open(r'C:\Users\test\Desktop\file_old.txt', 'r+')
    file_new = open(r'C:\Users\test\Desktop\file1.txt', 'r+')
    file_diff = open(r'C:\Users\test\Desktop\file_diff.txt', 'w')

    for each_line in file_new:
        for every_line in file_old:
            if each_line == every_line:
                break
            file_diff.write(each_line)

    file_old.close()
    file_new.close()
    file_diff.close()

main()
Thanks!
srgerg’s answer will work.
However, reading through a file multiple times has a very large runtime complexity (roughly the product of the two files' line counts). Therefore, if the files (though large) are small enough to fit into memory, you might consider putting all the lines in file_old into a data structure, such as a set, for comparison.

Hope this helps
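A minimal sketch of that idea, assuming a set is an acceptable data structure and that file_old fits in memory. The function name `write_missing_lines` and its parameters are made up for illustration. Trailing newlines are stripped before comparing, on the assumption that a last line without a newline should still count as a match:

```python
def write_missing_lines(old_path, new_path, diff_path):
    """Write each line of new_path that does not occur in old_path to diff_path."""
    # Read the old file exactly once and keep its lines in a set,
    # so each membership test below is O(1) on average.
    with open(old_path, "r") as file_old:
        old_lines = {line.rstrip("\n") for line in file_old}

    missing = []
    # The new file is also read only once, line by line.
    with open(new_path, "r") as file_new, open(diff_path, "w") as file_diff:
        for line in file_new:
            text = line.rstrip("\n")
            if text not in old_lines:
                file_diff.write(text + "\n")
                missing.append(text)

    # Returning the lines as well makes the result easy to inspect.
    return missing
```

Called with the paths to file_old.txt, file1.txt and file_diff.txt from the question, this should leave only banana in the diff file. Each file is read a single time, at the cost of holding every line of file_old in memory.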