Two files with the same structure (first file = unique field/index) File X 1,'a1','b1'


Asked: June 18, 2026


Two files with the same structure (the first field is a unique key/index):

File X 
1,'a1','b1'
2,'a2','b20'
3,'a3','b3'
4,'a4','b4'

File Y
1,'a1','b1'
2,'a2','b2'
3,'a30','b3'
5,'a5','b5'

Goal: identify the differences between these files. Each record has many fields to compare.

Requested output (maybe there is a better way to present it):

Index   X:a   X:b      Y:a   Y:b    Result
=====   ===   ===      ===   ===    ======
1       a1    b1       a1    b1     No diff
2       a2    b20      a2    b2     Diff in field b (Xb=b20, Yb=b2)
3       a3    b3       a30   b3     Diff in field a (Xa=a3, Ya=a30)
4       a4    b4       null  null   missing entry in file Y
5       null  null     a5    b5     missing entry in file X

Ruby code – what I have so far:

x = [[1, 'a1', 'b1'], [2, 'a2', 'b20'], [3, 'a3', 'b3'], [4, 'a4', 'b4']]
y = [[1, 'a1', 'b1'], [2, 'a2', 'b2'], [3, 'a30', 'b3'], [5, 'a5', 'b5']]

# Collect every key (index) that appears in either array
h = Hash.new(0)
x.each { |e| h[e[0]] = 1 }
y.each { |e| h[e[0]] = 1 }

# Print the keys of x
x.each { |e| p e[0] }

I already have all the keys (indexes) from both arrays in the hash h.
It looks like some kind of SQL full outer join, with the index as the common key.
Can you give me some direction on how to iterate over both arrays to find the differences?
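The join idea can be followed directly, as a sketch that assumes both files fit in memory and that keys are unique within each file: index each array by its key in a Hash, then walk the sorted union of both key sets. That is a full outer join on the index; a side that lacks a key yields `nil`, much like `NULL` in SQL. The method name `outer_join` is only illustrative, and `Array#to_h` with a block needs Ruby 2.6 or later.

```ruby
# Sketch: full outer join of two record arrays on their first field.
# Assumes both arrays fit in memory and keys are unique within each array.
def outer_join(x, y)
  # Index each side by key: { 1 => ["a1", "b1"], ... }
  xh = x.to_h { |key, *values| [key, values] }
  yh = y.to_h { |key, *values| [key, values] }

  # Union of the keys from both sides, in ascending order.
  (xh.keys | yh.keys).sort.map do |key|
    # A side that lacks the key contributes nil, like NULL in SQL.
    [key, xh[key], yh[key]]
  end
end

x = [[1, 'a1', 'b1'], [2, 'a2', 'b20'], [3, 'a3', 'b3'], [4, 'a4', 'b4']]
y = [[1, 'a1', 'b1'], [2, 'a2', 'b2'], [3, 'a30', 'b3'], [5, 'a5', 'b5']]

outer_join(x, y).each { |row| p row }
```

Each row is `[key, x values, y values]`; for this data, key 4 gives `[4, ["a4", "b4"], nil]` and key 5 gives `[5, nil, ["a5", "b5"]]`, which is all that is needed to print the requested table.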


1 Answer

Editorial Team · Answer 1 · 2026-06-18T06:36:02+00:00

The problem of comparing two files is an old one. Forty years ago, in the days of punched cards, we already had to solve it to print bills for the items sold each day. One file was the customer file (primary file); the second was the deck of cards punched from delivery forms (secondary file). Each record (card) in the secondary file contained both the customer number and the item number. Both files were sorted on customer number, and the algorithm was called matching. It consists of reading one record from each file, comparing the common key, and selecting one of three possible cases:

primary key < secondary key: skip this customer (normal: there are more customers in the customer file than in today's sales)
    Read next primary record
primary key = secondary key: print a bill
    Read next customer record
    Read and print items from the secondary file until the customer number changes
primary key > secondary key: typo in the secondary file, or a new customer not yet added to the customer file
    Print error message (not a valid customer)
    Read next secondary record
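The three cases can be sketched with arrays standing in for the two card decks. This is an illustration, not the original program: the name `bill_run`, the customer numbers, names and items are made up, both arrays are assumed to be sorted on customer number, and `Float::INFINITY` plays the role of the high value returned at EOF.

```ruby
# Sketch of the billing run: customers (primary) matched against sales
# (secondary), both sorted on customer number. All data is made up.
def bill_run(customers, sales)
  high = Float::INFINITY # key used at EOF: greater than any real key
  out = []
  i = j = 0 # read positions in the primary and secondary "files"
  loop do
    p_key = customers[i] ? customers[i][0] : high
    s_key = sales[j] ? sales[j][0] : high
    break if p_key == high && s_key == high # both files at EOF

    if p_key < s_key
      # Customer with no sales today: skip, read next primary record
      i += 1
    elsif p_key == s_key
      # Print a bill: collect all consecutive sales for this customer number
      items = []
      while sales[j] && sales[j][0] == p_key
        items << sales[j][1]
        j += 1
      end
      out << "bill #{customers[i][1]}: #{items.join(', ')}"
      i += 1
    else
      # Sale for an unknown customer: report it, read next secondary record
      out << "error: no customer #{s_key}"
      j += 1
    end
  end
  out
end

customers = [[10, 'Ann'], [20, 'Bob'], [30, 'Cid']]
sales     = [[10, 'pen'], [10, 'ink'], [25, 'cap'], [30, 'pad']]
puts bill_run(customers, sales)
```

With this data it returns a bill for Ann (pen, ink), an error for customer 25 and a bill for Cid (pad); Bob has no sales and is skipped.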

The read loop continues as long as there are records to read, that is, until both files have reached EOF (end of file). The core of a larger Matching module I have written in Ruby is:

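# Core loop of the Matching module. read_primary / read_secondary (not shown)
# are assumed to set @eof_* at end of file and replace the key by a high
# value, so the file that still has records is drained via < or >.
def matching(p_actionSmaller, p_actionEqual, p_actionGreater)
    read_primary
    read_secondary

    while ! @eof_primary || ! @eof_secondary
        case
        when @primary_key < @secondary_key   # key only in primary file
            p_actionSmaller.call(self)
            read_primary
        when @primary_key == @secondary_key  # same key in both files
            p_actionEqual.call(self)
            read_primary
            read_secondary
        when @primary_key > @secondary_key   # key only in secondary file
            p_actionGreater.call(self)
            read_secondary
        end
    end
end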
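The three actions are passed in as callables, which keeps the loop generic. Since the rest of the module is not shown, the class below is a hypothetical host: `ArrayMatcher` and its readers are assumptions, using external enumerators as the "files" and setting the key to a high value at EOF so that the remaining file is drained through the `<` or `>` branch.

```ruby
# Hypothetical host class: the original module is not shown, so the
# readers below are assumptions that follow the high-value-at-EOF idea.
class ArrayMatcher
  HIGH_VALUE = Float::INFINITY # key returned at EOF, above every real key

  attr_reader :primary_key, :secondary_key

  def initialize(primary_keys, secondary_keys)
    @primary = primary_keys.sort.each     # external enumerators act as "files"
    @secondary = secondary_keys.sort.each
    @eof_primary = @eof_secondary = false
  end

  def read_primary
    @primary_key = @primary.next
  rescue StopIteration
    @eof_primary = true
    @primary_key = HIGH_VALUE
  end

  def read_secondary
    @secondary_key = @secondary.next
  rescue StopIteration
    @eof_secondary = true
    @secondary_key = HIGH_VALUE
  end

  # The loop from the answer, unchanged apart from layout.
  def matching(p_actionSmaller, p_actionEqual, p_actionGreater)
    read_primary
    read_secondary
    while !@eof_primary || !@eof_secondary
      case
      when @primary_key < @secondary_key
        p_actionSmaller.call(self)
        read_primary
      when @primary_key == @secondary_key
        p_actionEqual.call(self)
        read_primary
        read_secondary
      when @primary_key > @secondary_key
        p_actionGreater.call(self)
        read_secondary
      end
    end
  end
end

log = []
m = ArrayMatcher.new([2, 3, 4], [1, 2, 3, 5])
m.matching(
  ->(mm) { log << "only in primary: #{mm.primary_key}" },
  ->(mm) { log << "in both: #{mm.primary_key}" },
  ->(mm) { log << "only in secondary: #{mm.secondary_key}" }
)
puts log
```

For these keys the log reads: only in secondary 1, in both 2 and 3, only in primary 4, only in secondary 5, the same sequence as the execution trace further down.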

Here is a simplified version adapted to your array problem:

# input "files" :
x = [               [2,'a2','b20'], [3,  'a3', 'b3'], [4,'a4','b4']                 ]
y = [[1,'a1','b1'], [2,'a2','b2' ], [3, 'a30', 'b3'],                [5, 'a5', 'b5']]
puts '--- input --- :'
print 'x='; p x
print 'y='; p y

xh = Hash.new
yh = Hash.new

# converted to hash for easy extraction of data :
x.each do |a|
    key, *value = a
    xh[key] = value
end

y.each do |a|
    key, *value = a
    yh[key] = value
end

puts '--- as hash --- :'
print 'xh='; p xh
print 'yh='; p yh

# sort keys for matching both "files" on the same key :
@xkeys = xh.keys.sort
@ykeys = yh.keys.sort

print '@xkeys='; p @xkeys
print '@ykeys='; p @ykeys

# Simplified algorithm, where EOF is replaced by HIGH_VALUE.
# HIGH_VALUE must be greater than every real key (255 is enough for this data).
@x_index = -1
@y_index = -1
HIGH_VALUE = 255

def read_primary
    @x_index += 1 # read next record
    # The primary key is extracted from the record.
    # At EOF it is replaced by HIGH_VALUE, usually x'FFFFFF'.
    # @xkeys[@x_index] returns nil past the end, and nil || HIGH_VALUE
    # returns HIGH_VALUE.
    @primary_key = @xkeys[@x_index] || HIGH_VALUE
end

def read_secondary
    @y_index += 1
    @secondary_key = @ykeys[@y_index] || HIGH_VALUE
end

puts '--- matching --- :'
read_primary
read_secondary

while @x_index < @xkeys.length || @y_index < @ykeys.length
    case
    when @primary_key < @secondary_key
        puts "case < : #{@primary_key} < #{@secondary_key}"
        puts "x #{xh[@primary_key].inspect} has no equivalent in y"
        read_primary
    when @primary_key == @secondary_key
        puts "case = : #{@primary_key} = #{@secondary_key}"
        puts "compare #{xh[@primary_key].inspect} with #{yh[@primary_key].inspect}"
        read_primary
        read_secondary
    when @primary_key > @secondary_key
        puts "case > : #{@primary_key} > #{@secondary_key}"
        puts "y #{yh[@secondary_key].inspect} has no equivalent in x"
        read_secondary
    end
end

Execution:

$ ruby -w t.rb
--- input --- :
x=[[2, "a2", "b20"], [3, "a3", "b3"], [4, "a4", "b4"]]
y=[[1, "a1", "b1"], [2, "a2", "b2"], [3, "a30", "b3"], [5, "a5", "b5"]]
--- as hash --- :
xh={2=>["a2", "b20"], 3=>["a3", "b3"], 4=>["a4", "b4"]}
yh={5=>["a5", "b5"], 1=>["a1", "b1"], 2=>["a2", "b2"], 3=>["a30", "b3"]}
@xkeys=[2, 3, 4]
@ykeys=[1, 2, 3, 5]
--- matching --- :
case > : 2 > 1
y ["a1", "b1"] has no equivalent in x
case = : 2 = 2
compare ["a2", "b20"] with ["a2", "b2"]
case = : 3 = 3
compare ["a3", "b3"] with ["a30", "b3"]
case < : 4 < 5
x ["a4", "b4"] has no equivalent in y
case > : 255 > 5
y ["a5", "b5"] has no equivalent in x
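One caveat about `HIGH_VALUE = 255`: it only works while every real key is below 255. If a key equals the sentinel, a file that has reached EOF appears to hold that key, and the `=` case fires for a record that exists on one side only. The sketch below (`classify` is a made-up name) runs the same three-way comparison with the sentinel as a parameter. For integer keys, `Float::INFINITY` compares greater than any Integer and cannot collide; string keys would need another sentinel or explicit EOF flags such as the `@eof_primary` / `@eof_secondary` of the matching module.

```ruby
# Sketch: the three-way merge from the script above, with the sentinel as a
# parameter so the two choices can be compared side by side.
def classify(xkeys, ykeys, high)
  xi = yi = 0
  result = []
  while xi < xkeys.size || yi < ykeys.size
    pk = xkeys[xi] || high # nil past the end becomes the sentinel
    sk = ykeys[yi] || high
    if pk < sk
      result << [:only_x, pk]
      xi += 1
    elsif pk == sk
      result << [:both, pk]
      xi += 1
      yi += 1
    else
      result << [:only_y, sk]
      yi += 1
    end
  end
  result
end

p classify([1], [255], 255)
# => [[:only_x, 1], [:both, 255]]   key 255 wrongly reported as a match
p classify([1], [255], Float::INFINITY)
# => [[:only_x, 1], [:only_y, 255]]
```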

I leave the presentation of the differences to you.
HTH
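As for the presentation, here is one possible way to produce the table requested in the question, assuming both files fit in memory. It looks records up in hashes over the union of keys instead of running the matching loop, though either approach could feed it. `FIELDS` and `diff_result` are illustrative names, and `a`, `b` stand for the real field names, which would be listed in `FIELDS`; with many fields, only that array changes.

```ruby
# Sketch: print the table requested in the question.
# FIELDS names the fields after the key; "a" and "b" come from the question.
FIELDS = %w[a b]

# Text for the Result column, given the values from X and Y (nil = missing).
def diff_result(xv, yv)
  return 'missing entry in file Y' if yv.nil?
  return 'missing entry in file X' if xv.nil?

  diffs = FIELDS.each_index.reject { |i| xv[i] == yv[i] }
  return 'No diff' if diffs.empty?

  diffs.map { |i| "Diff in field #{FIELDS[i]} (X#{FIELDS[i]}=#{xv[i]}, Y#{FIELDS[i]}=#{yv[i]})" }
       .join('; ')
end

x = [[1, 'a1', 'b1'], [2, 'a2', 'b20'], [3, 'a3', 'b3'], [4, 'a4', 'b4']]
y = [[1, 'a1', 'b1'], [2, 'a2', 'b2'], [3, 'a30', 'b3'], [5, 'a5', 'b5']]

xh = x.to_h { |key, *values| [key, values] }
yh = y.to_h { |key, *values| [key, values] }
missing = Array.new(FIELDS.size, 'null') # placeholder values for a missing side

header = ['Index', *FIELDS.map { |f| "X:#{f}" }, *FIELDS.map { |f| "Y:#{f}" }, 'Result']
rows = (xh.keys | yh.keys).sort.map do |key|
  [key.to_s, *(xh[key] || missing), *(yh[key] || missing), diff_result(xh[key], yh[key])]
end

[header, *rows].each { |cols| puts cols.map { |c| c.ljust(8) }.join.rstrip }
```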
