Two files with the same structure (first field = unique field/index)
File X
1,'a1','b1'
2,'a2','b20'
3,'a3','b3'
4,'a4','b4'
File Y
1,'a1','b1'
2,'a2','b2'
3,'a30','b3'
5,'a5','b5'
Goal: identify differences between these files. There are a lot of fields to compare in each file.
Requested output (maybe there is a better way to present it):
Index  X:a   X:b   Y:a   Y:b   Result
=====  ===   ===   ===   ===   ======
1      a1    b1    a1    b1    No diff
2      a2    b20   a2    b2    Diff in field b (Xb=b20, Yb=b2)
3      a3    b3    a30   b3    Diff in field a (Xa=a3, Ya=a30)
4      a4    b4    null  null  Missing entries in file Y
5      null  null  a5    b5    Missing entries in file X
Ruby code – what I have so far:
x = [[1,'a1','b1'], [2,'a2','b20'], [3, 'a3', 'b3'], [4, 'a4', 'b4']]
y = [[1,'a1','b1'], [2,'a2','b2'], [3, 'a30', 'b3'], [5, 'a5', 'b5']]
h = Hash.new(0)
x.each {|e|
  h[e[0]] = 1
}
y.each {|e|
  h[e[0]] = 1
}
x.each {|e|
  p e[0]
}
I already have all the keys (indexes) from both arrays in the hash h.
It seems to be some kind of SQL join using index as a common key.
Can you give me some direction on how to iterate over both arrays to find the differences?
The problem of comparing two files is old. In the days of punched cards, forty years ago, we already had to solve it to print bills for items sold every day. One file was the customer file (primary file); the second was the deck of cards punched from delivery forms (secondary file). Each record (card) in this secondary file contained both the customer number and the item number. Both files were sorted on customer number, and the algorithm was called matching. It consists of reading one record from each file, comparing the common key, and selecting one of three possible cases:
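1. Primary key < secondary key: this customer has no sales today (there are more customers in the customer file than in today's sales).
   Read the next primary record.
2. Primary key = secondary key: this customer has sales to bill.
   Read the next customer record.
   Read and print items from the secondary file until the customer number changes.
3. Primary key > secondary key: the secondary record refers to a customer not yet added to the customer file.
   Print an error message (not a valid customer).
   Read the next secondary record.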
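The three cases above can be sketched in Ruby for the x and y arrays from the question. This is only a minimal illustration, not the Matching module mentioned in this answer. It assumes both arrays are sorted on their first element and that each key appears at most once per array. The method name match and the status symbols are hypothetical names chosen for the example.

```ruby
# Sorted-merge ("matching") sketch. Assumes both arrays are sorted on
# element 0 and that keys are unique within each array.
# `match` and the status symbols are hypothetical names for this example.
def match(primary, secondary)
  result = []
  i = 0
  j = 0
  # Continue until both inputs are exhausted (both "files" at EOF).
  while i < primary.size || j < secondary.size
    p_rec = primary[i]    # nil once the primary input is exhausted
    s_rec = secondary[j]  # nil once the secondary input is exhausted
    if s_rec.nil? || (p_rec && p_rec[0] < s_rec[0])
      # Case 1: key present only in the primary input.
      result << [p_rec[0], :only_in_primary, p_rec, nil]
      i += 1
    elsif p_rec.nil? || p_rec[0] > s_rec[0]
      # Case 3: key present only in the secondary input.
      result << [s_rec[0], :only_in_secondary, nil, s_rec]
      j += 1
    else
      # Case 2: same key in both inputs; compare the whole records.
      status = p_rec == s_rec ? :same : :different
      result << [p_rec[0], status, p_rec, s_rec]
      i += 1
      j += 1
    end
  end
  result
end

x = [[1, 'a1', 'b1'], [2, 'a2', 'b20'], [3, 'a3', 'b3'], [4, 'a4', 'b4']]
y = [[1, 'a1', 'b1'], [2, 'a2', 'b2'], [3, 'a30', 'b3'], [5, 'a5', 'b5']]

match(x, y).each { |key, status, _x_rec, _y_rec| puts "#{key}: #{status}" }
```

Each result row keeps both records, so a field-by-field comparison can be done afterwards on the rows marked :different.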
The read loop continues as long as there are records to read, that is, until both files have reached EOF (end of file). The core part of a bigger Matching module I have written in Ruby is:
Here is a simplified version adapted to your array problem:
Execution:
I leave the presentation of the differences to you.
HTH