None of my sed calls seem to have any effect on the document. I’ve checked and double-checked the regexes, and they work in every text editor I have available (Geany, Gedit, Notepad++). Does anyone have any thoughts on what I’m doing wrong?
#!/bin/sh
clear
antiword q.doc > q.txt
sed -i -e's/\[.*\]//g' q.txt # replace [...] with nothing
sed -i -e's/^[ \t]+[o][ \t]//g' q.txt # replace old word UL with nothing
sed -i -e's/^[ \t]+[•][ \t]//g' q.txt # replace old word UL with nothing
Bonus marks for showing me how to remove extra blank lines from the file, so that there is only ever one blank line between elements that previously had two or more.
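It looks like you wrote your regexes with Perl or extended regular expression (ERE) syntax in mind, but sed uses basic regular expressions (BRE) by default. In a BRE, + is an ordinary character, so ^[ \t]+ looks for a single space or tab followed by a literal plus sign; the BRE spelling of “one or more” is \{1,\}. Depending on which implementation of sed you’re using, the simplest fix is to tell sed to use extended regular expressions with the -E (Mac OS X) or -r (GNU sed) flag. You may also need to replace the \t sequences with literal tabs, since not every sed recognizes \t inside a bracket expression.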
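Here is a minimal sketch of the cleanup rewritten along those lines. It is written as a filter instead of using sed -i, because -i takes different arguments on GNU and Mac OS X sed. It assumes a sed that accepts -E (Mac OS X sed does, and so do recent GNU sed releases). The function name clean_text is just for illustration. Three deliberate changes from the original are worth noting. \[.*\] became \[[^]]*], because .* is greedy and would delete everything from the first [ to the last ] on a line. The bullet is matched outside a bracket expression, since • is a multibyte character. For the bonus, cat -s squeezes runs of blank lines down to one. It only treats truly empty lines as blank, so lines that contain only whitespace are left alone.

```sh
#!/bin/sh
# Sketch only: assumes sed supports -E (ERE) and cat supports -s (GNU and BSD both do).

# A literal tab, so the patterns don't depend on sed understanding \t.
tab=$(printf '\t')

# clean_text is a hypothetical helper name: reads stdin, writes the cleaned text to stdout.
clean_text() {
    # -E: ERE syntax, so + means "one or more" rather than a literal plus sign.
    # [^]]* stops at the first ], so "[a] keep [b]" becomes " keep " instead of "".
    # The bullet is matched outside brackets because it is a multibyte character.
    # After sed, cat -s squeezes each run of blank lines into a single blank line.
    sed -E \
        -e 's/\[[^]]*]//g' \
        -e "s/^[ $tab]+o[ $tab]//" \
        -e "s/^[ $tab]+•[ $tab]//" |
        cat -s
}

# Example usage:
#   antiword q.doc | clean_text > q.txt
```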