I have a CSV file formatted like this: id @ word @ information @

Asked: May 28, 2026

I have a CSV file formatted like this:

id @ word @ information @ other information

Sometimes the second column (word) has repeated values:

001 @ cat @ makes a great pet @ mice
002 @ rat @ makes a great friend @ cheese
003 @ dog @ can guard the house @ chicken
004 @ cat @ can jump very high @ fish

As you can see, the first and last lines have the same value in column 2. When column 2 matches exactly, I want to delete the duplicate row and merge its column 3 and column 4 information into the first row. The result should look like this:

001 @ cat @ ① makes a great pet ② can jump very high @ ① mice ② fish
002 @ rat @ makes a great friend @ cheese
003 @ dog @ can guard the house @ chicken

I am using these symbols to number the data: “①”, “②”, “③”, etc., but “(1)”, “(2)”, “(3)”, etc. will be okay too.

How can I merge the cells so that all of the data from the third column is assembled into one cell and all of the data from the fourth column is assembled into another?

Editorial Team · Answer 1 · 2026-05-28T01:56:58+00:00

I worked in Ruby (doing this in bash would be kinda painful).

First I wrote a spec to describe the problem:

require 'rspec'
require_relative 'chew'

RSpec.describe 'indentation' do
  it "calculates appropriate padding (minimum 3)" do
    expect(indentation(1)).to eq(3)
    expect(indentation(99)).to eq(3)
    expect(indentation(999)).to eq(3)
    expect(indentation(1000)).to eq(4)
    expect(indentation(1500)).to eq(4)
    expect(indentation(10000)).to eq(5)
  end
end

RSpec.describe 'chew' do
  it "merges duplicate entries in a csv file" do

    input = <<-TEXT
001 @ cat @ makes a great pet @ mice
002 @ rat @ makes a great friend @ cheese
003 @ dog @ can guard the house @ chicken
004 @ cat @ can jump very high @ fish
    TEXT

    output = <<-TEXT
001 @ cat @ (1) makes a great pet (2) can jump very high @ (1) mice (2) fish
002 @ rat @ makes a great friend @ cheese
003 @ dog @ can guard the house @ chicken
    TEXT

    expect(chew(input)).to eq(output)

  end
end

Here’s a solution:

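#!/usr/bin/env ruby

# Join the values collected for one key. A single value is returned as-is;
# several values are prefixed "(1) ... (2) ..." and joined with spaces.
def merged_values(values)
  return values[0] if values.size == 1
  values.each_with_index.map { |value, i| "(#{i + 1}) #{value}" }.join(" ")
end

# Width of the zero-padded row number: digits in count, minimum 3.
def indentation(count)
  [3, Math.log10(count) + 1].max.to_i
end

def chew(input)
  # key => [[column 3 values], [column 4 values]]; Ruby hashes keep
  # insertion order, so keys come out in the order they were first seen.
  records = Hash.new { |hash, key| hash[key] = [[], []] }
  input.each_line do |row|
    row = row.strip    # also drops "\r" from Windows line endings
    next if row.empty? # ignore blank lines

    _row_number, key, first_value, second_value = row.split(/\s*@\s*/)
    records[key][0] << first_value
    records[key][1] << second_value
  end

  row_number_format = "%0#{indentation(records.size)}d"

  # Rows are renumbered sequentially in first-seen order.
  records.each_with_index.map do |(key, values), i|
    [
      row_number_format % (i + 1),
      key,
      merged_values(values[0]),
      merged_values(values[1])
    ].join(" @ ") + "\n"
  end.join
end

if $0 == __FILE__
  abort "usage: ruby chew.rb input_file" unless ARGV.size == 1
  puts chew(File.read(ARGV[0]))
end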
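
The solution above renumbers the output rows sequentially. If you would rather keep the id of each word's first occurrence (as in the question's expected output), a minimal sketch could accumulate rows in an ordinary Hash keyed on column 2. It assumes every row has four " @ "-separated fields; `merge_rows` and `number_values` are hypothetical names, not part of the answer above.

```ruby
# Hypothetical helper: a single value is left as-is; several values get
# "(1) ... (2) ..." prefixes and are joined with spaces.
def number_values(values)
  return values.first if values.size == 1
  values.each_with_index.map { |value, i| "(#{i + 1}) #{value}" }.join(" ")
end

# Hypothetical sketch: merge rows that share column 2, keeping the id of
# the first row seen for each word. Assumes four " @ "-separated fields.
def merge_rows(input)
  groups = {} # insertion-ordered: word => { id:, info: [...], other: [...] }
  input.each_line do |line|
    line = line.strip
    next if line.empty? # skip blank lines

    id, word, info, other = line.split(/\s*@\s*/)
    group = (groups[word] ||= { id: id, info: [], other: [] })
    group[:info] << info
    group[:other] << other
  end

  groups.map do |word, group|
    [group[:id], word, number_values(group[:info]),
     number_values(group[:other])].join(" @ ") + "\n"
  end.join
end
```

Called on the four sample rows from the question, `merge_rows` returns the same three lines as the expected output (with "(1)"/"(2)" numbering); with ids such as 007 and 009 for two "cat" rows, the merged row keeps 007.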

I opted for the simpler “(1)”, “(2)” numbering scheme: Unicode only has circled numbers up to ㊿ (50), so what would happen with more than 50 values to merge? See http://en.wikipedia.org/wiki/Enclosed_alphanumerics
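
If you do want the circled numbers from the question, one possible sketch maps 1 to 20 onto U+2460 to U+2473, 21 to 35 onto U+3251 to U+325F, and 36 to 50 onto U+32B1 to U+32BF (the circled-number characters in Unicode's Enclosed Alphanumerics and Enclosed CJK Letters and Months blocks), falling back to "(n)" for anything else. `label` is a hypothetical helper.

```ruby
# Hypothetical helper: circled number character for 1..50, "(n)" otherwise.
def label(n)
  code =
    case n
    when 1..20  then 0x2460 + (n - 1)  # circled 1 to 20 (Enclosed Alphanumerics)
    when 21..35 then 0x3251 + (n - 21) # circled 21 to 35 (Enclosed CJK)
    when 36..50 then 0x32B1 + (n - 36) # circled 36 to 50 (Enclosed CJK)
    end
  code ? code.chr(Encoding::UTF_8) : "(#{n})"
end
```

Swapping the "(n) " prefix built in `merged_values` for `"#{label(i + 1)} "` would give “① makes a great pet ② can jump very high” for up to 50 values, and degrade to “(51)”, “(52)”, ... past that.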

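I also took the liberty of widening the zero-padded row number when there are lots of records: its width is the number of digits in the record count, with a minimum of 3 (so 1000 or more records get 4 digits).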
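
The `indentation` helper derives that width from `Math.log10`. An integer-only variant, sketched here under the same "minimum 3 digits" rule, counts the decimal digits of the record count directly, which sidesteps any floating-point rounding question at exact powers of ten. `padding_width` and `format_row_number` are hypothetical names.

```ruby
# Width of the zero-padded row number: decimal digits in count, minimum 3.
def padding_width(count)
  [3, count.to_s.length].max
end

# Zero-pad n to the width implied by the total record count.
def format_row_number(n, count)
  n.to_s.rjust(padding_width(count), "0")
end
```

For example, `format_row_number(7, 1500)` gives "0007" and `format_row_number(1, 3)` gives "001", the same widths the spec expects from `indentation`.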
