I have two huge files (file1 and file2). Both files are organized into lines. I need to generate a third file, file3, containing the lines that are in file1 but not in file2. The lines are not ordered.
What is the easiest (smartest) way to get it in Windows?
The best strategy might depend on exactly how huge the files are. If the first file can fit into memory, you can build a set of its lines and then remove every line of file2 from that set. This requires memory roughly proportional to the size of file1. Note that this solution will also eliminate duplicate lines from file1.
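
A minimal sketch of the set-based approach in Python, assuming file1 fits in memory and both files are UTF-8 text. The function name lines_only_in_first and its parameters are made up for illustration. Lines are compared exactly once their line endings have been stripped.

```python
def lines_only_in_first(path1, path2, out_path):
    """Write the lines found in path1 but not in path2 to out_path.

    Returns the number of lines written. Assumes UTF-8 text files.
    """
    # Load every distinct line of the first file into memory.
    # Building a set is what collapses duplicate lines.
    with open(path1, encoding="utf-8") as f1:
        remaining = {line.rstrip("\r\n") for line in f1}

    # Stream the second file one line at a time, so it never has to
    # fit in memory, and drop each of its lines from the set.
    with open(path2, encoding="utf-8") as f2:
        for line in f2:
            remaining.discard(line.rstrip("\r\n"))

    # A set is unordered, so the output order is arbitrary.
    with open(out_path, "w", encoding="utf-8") as out:
        for line in remaining:
            out.write(line + "\n")
    return len(remaining)
```

Calling lines_only_in_first("file1", "file2", "file3") would produce the third file. Only file1 is held in memory; file2 is read sequentially, so its size matters for running time but not for memory use.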