I have a CSV file formatted like this:
id @ word @ information @ other information
Sometimes the second column has repeated values:
001 @ cat @ makes a great pet @ mice
002 @ rat @ makes a great friend @ cheese
003 @ dog @ can guard the house @ chicken
004 @ cat @ can jump very high @ fish
As you can see, the first and last lines have the same value in column 2. I want to delete these duplicates (when column 2 is exactly the same) and merge the information in column 3, as well as the information in column 4. The result should look like this:
001 @ cat @ ① makes a great pet ② can jump very high @ ① mice ② fish
002 @ rat @ makes a great friend @ cheese
003 @ dog @ can guard the house @ chicken
- I am using these symbols to number the data: “①”, “②”, “③”, etc., but “(1)”, “(2)”, “(3)”, etc. will be okay too.
How can I merge the cells so that all of the data from the third column ends up in one cell and all of the data from the fourth column ends up in another?
I worked in Ruby (doing this in bash would be kinda painful).
First I wrote a spec to describe the problem:
Here’s a solution:
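A minimal sketch of one way to do the merge, assuming the ` @ ` separator from the question, the "(1)" numbering, and that the id of the first occurrence is kept. The names `merge_duplicates` and `number_values` are made up for illustration.

```ruby
SEPARATOR = " @ "

# Hypothetical helper: returns a single value unchanged, and numbers the
# values "(1) ... (2) ..." when there is more than one.
def number_values(values)
  return values.first if values.size == 1
  values.each_with_index.map { |value, index| "(#{index + 1}) #{value}" }.join(" ")
end

# Hypothetical helper: merges rows that share the same word (column 2).
# Assumes every row has exactly four "@"-separated columns.
# Keeps the id of the first occurrence and preserves first-seen order.
def merge_duplicates(text)
  rows = text.lines.map(&:strip).reject(&:empty?).map { |line| line.split(/\s*@\s*/, 4) }
  rows.group_by { |row| row[1] }.map do |word, group|
    id     = group.first[0]
    infos  = group.map { |row| row[2] }
    others = group.map { |row| row[3] }
    [id, word, number_values(infos), number_values(others)].join(SEPARATOR)
  end.join("\n")
end

sample = <<~DATA
  001 @ cat @ makes a great pet @ mice
  002 @ rat @ makes a great friend @ cheese
  003 @ dog @ can guard the house @ chicken
  004 @ cat @ can jump very high @ fish
DATA

puts merge_duplicates(sample)
# 001 @ cat @ (1) makes a great pet (2) can jump very high @ (1) mice (2) fish
# 002 @ rat @ makes a great friend @ cheese
# 003 @ dog @ can guard the house @ chicken
```

Grouping with `group_by` relies on Ruby hashes keeping insertion order, so the merged rows come out in the order each word first appears.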
I opted for the simpler "(1)", "(2)" numbering scheme, because the circled numbers in Unicode stop at ㊿: what happens if there are more than 50 values to merge? http://en.wikipedia.org/wiki/Enclosed_alphanumerics
I took the liberty of increasing the left padding when there are lots of records.
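The padding idea can be sketched like this, assuming ids are simply renumbered from 1 and the width grows with the number of records. The helper name `padded_ids` and the `min_width` parameter are hypothetical.

```ruby
# Hypothetical helper: builds zero-padded ids "001", "002", ...
# The width is at least min_width and widens when count has more digits.
def padded_ids(count, min_width: 3)
  width = [min_width, count.to_s.length].max
  (1..count).map { |n| n.to_s.rjust(width, "0") }
end

p padded_ids(3)           # ["001", "002", "003"]
p padded_ids(1000).first  # "0001"
```

With 999 records or fewer the ids keep the three-digit form from the question; only larger files get wider ids.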