I am very new to sed, and even after looking at examples I am totally at a loss as to how to write the correct code for my need (this one is close, but it does not seem to handle multi-line replacement).
Here is my input.txt:
This is a test of splitting…
|firstword|secondwordthirdword
fourthwordfifthwordsixthwordThis is a test of splitting…
firstword|secondword|thirdword
fourthwordfifthwordsixthwordThis is a test of splitting…
firstwordsecondword|thirdword|
fourthwordfifthwordsixthwordThis is a test of splitting…
firstwordsecondwordthirdword
|fourthword|fifthwordsixthwordThis is a test of splitting…
firstwordsecondwordthirdword
fourthword|fifthword|sixthwordThis is a test of splitting…
firstwordsecondwordthirdword
fourthwordfifthword|sixthword|
What I need to do is remove all text outside of the two “|” characters and keep the text between them,
and then insert a Unicode zero-width space (U+200B) between each of the words.
Resulting in:
firstwordU+200BsecondwordU+200BthirdwordU+200BfourthwordU+200BfifthwordU+200Bsixthword
I tried
sed '\|/d;/|/,$d' input.txt
UPDATE: Which doesn’t do much
And
sed -e 's/.*|\([^]]*\)|.*/\1/g' input.txt
This comes close, but it doesn’t remove lines that do not contain a “|” (I need to remove everything not contained between two “|”), and I don’t know how to go about adding the zero-width space between words. But like I said, I don’t really know what I am doing.
Any help would be much appreciated.
-Nathan
If you are happy with the results of
sed -e 's/.*|\([^]]*\)|.*/\1/g' input.txt
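other than its failure to remove lines that do not contain the delimiters, then just add the -n option and the p flag to the substitution:
sed -n -e 's/.*|\([^]]*\)|.*/\1/gp' input.txt
to only print lines in which the replace happens. Or, you can explicitly delete the unwanted lines:
sed -e '/|.*|/!d' -e 's/.*|\([^]]*\)|.*/\1/g' input.txt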
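The commands above extract the words but leave them on separate lines, so the zero-width-space part of the question is still open. Below is a minimal sketch of one way to finish the job. It makes a few assumptions: the function name extract_words is made up for illustration, each line holds at most one |word| pair (as in the sample input), the bracket expression is tightened to [^|]* so the capture cannot run across a delimiter, and awk is swapped in for the joining step because a multi-byte separator is awkward to handle portably in sed alone.

```shell
#!/bin/sh
# Hypothetical helper (not from the original posts): print the |delimited|
# word from each line of a file, joined on one line by U+200B.
extract_words() {
    # U+200B encoded as UTF-8 is the bytes E2 80 8B (octal 342 200 213).
    zwsp=$(printf '\342\200\213')

    # -n together with the p flag prints only the lines where the
    # substitution succeeded, so lines without a |word| pair are dropped.
    sed -n 's/.*|\([^|]*\)|.*/\1/p' "$1" |
        # Emit the separator before every word except the first, then end
        # the output with a single newline.
        awk -v sep="$zwsp" '
            NR > 1 { printf "%s", sep }
                   { printf "%s", $0 }
            END    { print "" }'
}
```

Running extract_words input.txt on the sample above should print the six words on a single line. The separator is invisible in most terminals, so piping the output through od -c is one way to confirm that the 342 200 213 bytes sit between the words.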