I have a file which has contents on every line following this format (A, B, C, and D represent text):
A B [C] D
E.g.:
cat Cat [noun] This animal likes to eat mice.
- The first separator is the first occurrence of a space (" ") on a line.
- The second separator is the first occurrence of a space followed by an opening square bracket (" [").
- The final separator is the first occurrence of a closing square bracket followed by a space ("] ").
I want to convert all of the content in this file to a CSV file, where @ is used in place of commas:
A@B@C@D
- The original file contains many foreign characters in UTF-8.
- There are no spaces or brackets within the contents of A and B.
- C sometimes contains spaces, but no brackets inside the two given.
- D may contain anything, including spaces and square brackets, and its contents should remain unchanged by the conversion.
How can I convert this file to that format?
You need to perform character substitution. I suggest using sed with a regular expression. This is a piece of code corresponding to your example:
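A minimal sketch of such a substitution, assuming a sed that accepts several -e expressions (GNU and BSD sed both do). The function name to_at_csv and the file names in the comment are hypothetical. Without the g flag, each expression replaces only the first match on the line, which mirrors the three separators described in the question.

```shell
# Hypothetical helper: replace the three separators, in order, with "@".
# Extra arguments are passed to sed as file names; with none, it reads stdin.
to_at_csv() {
  sed -e 's/ /@/' \
      -e 's/ \[/@/' \
      -e 's/\] /@/' "$@"
}

# Try it on the example line from the question:
printf '%s\n' 'cat Cat [noun] This animal likes to eat mice.' | to_at_csv
# prints: cat@Cat@noun@This animal likes to eat mice.

# For a whole file (names are placeholders): to_at_csv input.txt > output.csv
```

Because A and B contain no spaces or brackets, the first space, the first " [" and the first "] " are always the real separators, so any later brackets inside D are left alone.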
To substitute each column in its own way, use the following form:
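One way this could look, as a sketch rather than a definitive answer: capture each column in its own group and rebuild the line in the replacement. It assumes a sed that supports -E for extended regular expressions (GNU and BSD sed both do); the function name to_at_csv_groups is hypothetical.

```shell
# Hypothetical helper: capture A, B, C and D as groups \1..\4 and rejoin with "@".
#   ([^ ]*)      A: everything up to the first space
#   ([^ ]*)      B: everything up to the next space
#   \[([^]]*)\]  C: the text between the first pair of square brackets
#   (.*)         D: the rest of the line, kept unchanged
to_at_csv_groups() {
  sed -E 's/^([^ ]*) ([^ ]*) \[([^]]*)\] (.*)$/\1@\2@\3@\4/' "$@"
}

# D may contain further brackets; only the first pair is treated as C:
printf '%s\n' 'a b [c d] x [y] z' | to_at_csv_groups
# prints: a@b@c d@x [y] z
```

Since each column has its own group, the replacement can reorder or decorate them individually, for example '\2@\1@\3@\4' to swap A and B. Lines that do not match the pattern are passed through unchanged, so malformed lines are easy to spot in the output.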