I have a CSV file which uses a highly customized format. Here, each number represents the data in one of the 4 columns:
1 2 [3] 4
I need to restrict sed to only search and modify data appearing in the fourth column. Essentially, it must ignore all data on the line appearing before the first occurrence of a closing square bracket followed by a space ("] "), and only modify data appearing after it. For example, file1.txt might contain this:
penguin bird [lives in Antarctica] The penguin lives in cold places.
wolf dog [lives in Antarctica with penguins] The wolf likes to eat penguins.
The replacement might be sed 's/penguin/animal/g' file1.txt. After running the script, the output would look like this:
penguin bird [lives in Antarctica] The animal lives in cold places.
wolf dog [lives in Antarctica with penguins] The wolf likes to eat animals.
In this case, all appearances of penguin before the first ] were ignored, and only those after it were changed.
- Additional closing brackets might appear later in the line, but only the first should be regarded as the division.
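For comparison, a quick check (assuming a POSIX shell with printf) shows why the plain command from the question is not enough on its own: it rewrites every match on the line, including the one in the first column.

```shell
#!/bin/sh
# The plain substitution from the question runs over the whole line,
# so the "penguin" in column 1 is rewritten as well.
printf '%s\n' 'penguin bird [lives in Antarctica] The penguin lives in cold places.' |
  sed 's/penguin/animal/g'
# prints: animal bird [lives in Antarctica] The animal lives in cold places.
```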
How can I have sed ignore the first three columns of this custom CSV format while it finds and replaces text?
I have GNU sed version 4.2.1.
Normally I’d do it the way shelter described (if I was just typing in a quick sed command line), but that has the disadvantage that once you start matching part of the input to retain it (with \1 etc.), you have to match and replace everything and can no longer use simple replacements like s/penguin/animal/. If you are willing to add some boilerplate around the replacement, you can stash away the beginning of the line in the hold space and then get it back afterwards.
The h command saves the original line in the hold space. Then we remove the prefix and do any substitution (picking your example here) or series of substitutions on the end of the line. Then x swaps the end and the saved copy. We remove the original end from the saved copy and use G to put them back together. G adds a newline we don’t want, so we remove that.
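The script itself is not preserved here, so the following is a sketch reconstructed from the steps just described, not the answer's verbatim command. It assumes GNU sed (as in the question) and that the first "]" on a line is the one followed by a space. The helper name col4_replace is hypothetical, and the /^[^]]*] /!b guard is an addition not in the description: without it, a line lacking the divider would come out as the original line immediately followed by its edited copy.

```shell
#!/bin/sh
# Sketch (reconstructed from the description, not the answer's verbatim script):
# apply s/penguin/animal/g only to the text after the first "] " on each line.
# col4_replace is a hypothetical helper name. It reads the files given as
# arguments, or stdin when there are none, and writes to stdout.
col4_replace() {
  sed '
    # Lines with no "]", or whose first "]" is not followed by a space, pass through unchanged.
    /^[^]]*] /!b
    # Copy the whole line into the hold space.
    h
    # Delete the prefix (columns 1-3 and the "] " divider).
    s/^[^]]*] //
    # Ordinary substitutions on column 4, add more s commands here as needed.
    s/penguin/animal/g
    # Swap: pattern space = original line, hold space = edited column 4.
    x
    # Keep only the prefix of the original line.
    s/^\([^]]*] \).*/\1/
    # Append a newline and the edited column 4 from the hold space.
    G
    # Remove the newline that G inserted.
    s/\n//
  ' "$@"
}

# Sample input from the question.
printf '%s\n' \
  'penguin bird [lives in Antarctica] The penguin lives in cold places.' \
  'wolf dog [lives in Antarctica with penguins] The wolf likes to eat penguins.' |
  col4_replace
```

On the sample input this prints "penguin bird [lives in Antarctica] The animal lives in cold places." and "wolf dog [lives in Antarctica with penguins] The wolf likes to eat animals." (the g flag also rewrites the "penguin" inside "penguins"). For the real file, call col4_replace file1.txt. The prefix regex uses [^]]* rather than .* so that it stops at the first "]", which keeps later brackets in column 4 out of the prefix.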