I need to read in two large files (over 125 MB each). The files contain records with similar data. I need to find the records that appear in both files, and where the fields of a matching record don't agree, overwrite the fields in file two with the values from file one.
For example the first file has the following fields:
ID, ACCT, Bal, Int, Rate
The second file has the following fields:
TYPE, ID, ACCT, Bal, Int, Rate.
So if a record in file 1 has the same ACCT number as a record in file 2, then the Bal, Int, and Rate in file 2 need to be overwritten with the values of Bal, Int, and Rate from file 1.
Some records won’t appear in both files. The output file should contain every record from file two: a record that is not also in file one is written as is, and a record that is in both is written with the updated values.
I have tried many different options but most are not efficient enough to deal with the large files. What is the proper direction to take with this problem? Thanks in advance for any help.
Load all records from file 1 into a hash table with ACCT as key.
Loop over all records in file 2 and update if needed.
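Complexity: O(n) in the total number of records, since each hash lookup takes constant time on average. Only file 1 has to fit in memory.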
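The two steps above could be sketched in Python roughly as follows. The question doesn't say what format the files are in, so this assumes comma-separated text with a header row naming the fields exactly as listed (ID, ACCT, Bal, Int, Rate). The function name `merge_files` and the file paths are made up for illustration.

```python
import csv

def merge_files(path1, path2, out_path):
    """Copy file 2 to out_path, taking Bal/Int/Rate from file 1 when ACCT matches."""
    # Assumption: both files are CSV with a header row; adjust the reader
    # (delimiter, fixed-width parsing, etc.) to match the real format.
    fields = ("Bal", "Int", "Rate")

    # Step 1: load file 1 into a dict (hash table) keyed by ACCT.
    # Only the three fields that can overwrite file 2 are kept, to save memory.
    updates = {}
    with open(path1, newline="") as f1:
        for row in csv.DictReader(f1):
            updates[row["ACCT"]] = tuple(row[name] for name in fields)

    # Step 2: stream file 2 one record at a time and write every record out,
    # replacing the fields whenever the ACCT is also present in file 1.
    with open(path2, newline="") as f2, open(out_path, "w", newline="") as out:
        reader = csv.DictReader(f2)
        writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            new_values = updates.get(row["ACCT"])
            if new_values is not None:
                row.update(zip(fields, new_values))
            writer.writerow(row)
```

Something like `merge_files("file1.csv", "file2.csv", "merged.csv")` would then produce the output file. File 2 is never held in memory as a whole, so only the size of file 1 matters for RAM.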
HTH