I have a dictionary file formatted like this:
A B [C] D
Where A is a word (with no spaces), B is another word (with no spaces inside it), C is the pronunciation (there are spaces here), and D is the definition expressed in words (there are spaces, and a variety of symbols).
I wish to separate it into 4 parts, like this:
A@@@@B@@@@C@@@@D
In this way, the first space is converted to @@@@, the first [ is converted to @@@@, and the first ] is converted to @@@@. This will allow easy import into a spreadsheet as a CSV (the @@@@ sequences serve as the commas).
Can this be achieved with awk or another tool in BASH?
Update:
Here are some samples:
一千零一夜 一千零一夜 [Yi1 qian1 ling2 yi1 ye4] /The Book of One Thousand and One Nights/
灰姑娘 灰姑娘 [Hui1 gu1 niang5] /Cinderella/a sudden rags-to-riches celebrity/
雪白 雪白 [xue3 bai2] /snow white/
Would be converted to:
一千零一夜@@@@一千零一夜 @@@@Yi1 qian1 ling2 yi1 ye4@@@@ /The Book of One Thousand and One Nights/
灰姑娘@@@@灰姑娘 @@@@Hui1 gu1 niang5@@@@ /Cinderella/a sudden rags-to-riches celebrity/
雪白@@@@雪白 @@@@xue3 bai2@@@@ /snow white/
Consider that anything might appear after the third @@@@, including more spaces, [, etc. Before the third @@@@, however, everything is consistent in format.
I think sed will be easier:
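A minimal sketch of such a command, assuming the dictionary is stored in a file called dict.txt and the result goes to dict.csv (both names are hypothetical; the sample file is created here from one of the lines in the question so the example is self-contained):

```shell
# dict.txt and dict.csv are assumed file names, not taken from the original post.
printf '%s\n' '雪白 雪白 [xue3 bai2] /snow white/' > dict.txt

# Three substitutions, none with the g flag, so each one touches only
# the first match on a line: first space, first "[", first "]".
sed -e 's/ /@@@@/' -e 's/\[/@@@@/' -e 's/]/@@@@/' dict.txt > dict.csv

cat dict.csv
# prints: 雪白@@@@雪白 @@@@xue3 bai2@@@@ /snow white/
```

The spaces next to the brackets are kept, which matches the expected output shown in the question. To drop them as well, the patterns could be changed to ` \[` and `] `.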
By default (i.e. if you don’t specify the g modifier at the end) substitutions only work once per line.
Or, if you want to do it in-place:
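A sketch of the in-place variant, assuming a sed that supports the non-POSIX -i option (GNU and BSD sed both accept the -i.bak form used here). The file name dict.txt and the .bak suffix are choices made for this example: the suffix keeps a copy of the original as dict.txt.bak, whereas GNU sed's bare -i overwrites the file without a backup.

```shell
# dict.txt is an assumed file name; it is created here from a sample line in the question.
printf '%s\n' '灰姑娘 灰姑娘 [Hui1 gu1 niang5] /Cinderella/a sudden rags-to-riches celebrity/' > dict.txt

# -i.bak edits dict.txt in place and saves the untouched original as dict.txt.bak.
# -i is an extension, not part of POSIX sed.
sed -i.bak -e 's/ /@@@@/' -e 's/\[/@@@@/' -e 's/]/@@@@/' dict.txt

cat dict.txt
# prints: 灰姑娘@@@@灰姑娘 @@@@Hui1 gu1 niang5@@@@ /Cinderella/a sudden rags-to-riches celebrity/
```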
(but not all versions of sed support that, and you’ll lose your input file)