I’m stuck on something that looks like it should be simple to do with sed.
I have some (sort of) CSV files produced by another application, so I cannot control their format. Some preprocessing is already done with sed, but I am stuck on the last step. I would like to do that with sed too, if possible, to avoid bringing in a third tool.
The problem is that the file’s header line (the first line) is repeated throughout the file, with the following complications:
- The header of each CSV file is not known in advance. Each file has its own header, which may differ from file to file;
- The repetition does not always occur every N lines (for some fixed, known N);
- Other (non-header) data lines may also be repeated, and those must be kept.
For example, suppose I have the following two files:
Cash.csv
Name; Amount
John; 3.55
Erick; 4.76
John; 8.99
Name; Amount
Erick; 4.76
Mark; 1.00
Name; Amount
John; 3.55
Check.csv
Name; Account; Amount
Erick; 345344; 123.00
Mark; 88849; 323.50
Name; Account; Amount
John; 474473; 99.00
Mark; 88849; 323.50
Mark; 88849; 323.50
John; 474473; 99.00
What I want is a single sed script that, applied to each file, turns it into:
Cash.processed.csv
Name; Amount
John; 3.55
Erick; 4.76
John; 8.99
Erick; 4.76
Mark; 1.00
John; 3.55
Check.processed.csv
Name; Account; Amount
Erick; 345344; 123.00
Mark; 88849; 323.50
John; 474473; 99.00
Mark; 88849; 323.50
Mark; 88849; 323.50
John; 474473; 99.00
I was wondering whether it’s possible to use sed’s “hold buffer” as the pattern for the delete command, something like:
1h #Hold the first line (headings)
/\h/d #Use hold buffer as a pattern to delete
The idea is that “\h” would expand to the contents of the hold buffer inside the delete command’s pattern.
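sed has no “\h” escape, but a similar effect can be had without leaving sed: append the hold space to each line with G and use a back-reference to test whether the two halves are identical. The following is a minimal sketch of that alternative technique, assuming a POSIX shell and sed; the function name dedup_header is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a file with repeated copies of its first line removed.
# sed has no "\h" escape, so the hold space is compared via a back-reference:
#   1h                copy line 1 (the header) into the hold space
#   1b                branch to the end of the script, so line 1 is printed unchanged
#   G                 append "\n" plus the hold space (the header) to every later line
#   /^\(.*\)\n\1$/d   delete the line if the text before "\n" equals the header
#   s/\n.*//          otherwise strip the appended header before the line is printed
# Separate -e options are used so that "1b" ends cleanly on fussier seds.
dedup_header() {
    sed -e '1h' -e '1b' -e 'G' -e '/^\(.*\)\n\1$/d' -e 's/\n.*//' "$1"
}
```

Usage would be along the lines of dedup_header Cash.csv > Cash.processed.csv. Because the comparison is done with \1 rather than by pasting the header into a regex, slashes or other regex metacharacters in the header need no escaping; the match is exact, so a repeated header with, say, extra trailing spaces would be kept.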
Thanks for any replies.
PS: Please don’t answer with the following over-specific command:
1p;/Name; Amount\|Name; Account; Amount/d
I think you’ll need to capture the first line with one sed command and then use that in the main operational command:
Because sed 1q quits after reading the first line, it is quick regardless of how big the data file is. If there’s a chance that the first line might contain a slash (heading “Name/Number”, perhaps) or other regex metacharacters, then consider using something like this, which replaces all slashes with a dot (.):
I did some futzing with the Mac OS X (10.8.1) version of sed, which is fussier than GNU sed. In the second (main) sed command, the match had to be inside {...}, the dollar had to be separate (or the shell gets antsy about invalid parameter substitution), and the semicolon was needed. Some of those restrictions probably aren’t needed with GNU sed, but the code shown is likely to work anywhere.
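A minimal sketch of the two-command approach described above, assuming a POSIX shell; the function name strip_repeated_header is hypothetical, and this is an illustration of the technique rather than the exact code the answer refers to. The first sed reads just the header (turning slashes into dots), and the second deletes every later line equal to it.

```shell
#!/bin/sh
# Hypothetical helper: remove repeated header lines using two sed commands.
strip_repeated_header() {
    file=$1
    # Read only the first line; "1q" quits right after it, so this is fast on big files.
    # The s command must come before 1q, otherwise q prints and exits before it runs.
    # Each "/" becomes "." (which matches any character, "/" included), so a slash
    # in the header cannot end the /.../ address early.
    header=$(sed -e 's%/%.%g' -e 1q "$file")
    # On every line except the first (1!), delete lines that match the header exactly.
    # The "$" is single-quoted separately so the shell does not try to expand it,
    # and the "d;" before "}" keeps BSD/macOS sed happy.
    sed -e "1!{/^${header}"'$/d;}' "$file"
}
```

Usage would be along the lines of strip_repeated_header Check.csv > Check.processed.csv. Only slashes are neutralised here; other metacharacters such as *, [ or \ in a header would still be treated as regex syntax.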