I am trying to parse some text files from the command line. Part of this involves reattaching words that were broken across lines in some badly formatted emails. An example:
9,650 330,765.0 16.38% NYSE (000) 1,707,915 272,099.0 18.95% Commodit=
ies Close Change % Change Crude Oil (Feb) 19.62 0.32 1.66% Heating Oil (Ja=
I want to grab ‘Commodities.’ I’m using this sed workaround to get the job done.
I’m using Mac OS X 10.7 and GNU sed version 4.2.1. If at the command line I enter
sed ':a;N;$!ba;s/=\r\n//g' ./filename
sed works correctly. However, if I run this bash script:
#!/bin/bash
sed ':a;N;$!ba;s/=\r\n//g' filename
sed doesn’t work. However, the same script works under Ubuntu’s command line:
9,650 330,765.0 16.38% NYSE (000) 1,707,915 272,099.0 18.95% Commodities Close Change % Change Crude Oil (Feb) 19.62 0.32 1.66% Heating Oil (Jan)
On my Mac, the simpler script
#!/bin/bash
sed 's/=//g' filename
successfully removes all the equal signs. I’ve tried backslash-escaping different combinations of characters, without much success. Any hints as to what the Mac terminal doesn’t like?
It’s most likely a PATH setting.
/bin/bash uses the default $PATH; not sure why, but perhaps that depends on your normal working shell (is that bash?), or on which dot-files your PATH settings are in.
OS X comes with its own (BSD) sed, which is not the same as the GNU one and does not accept the same syntax, so this command doesn’t work there.
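To confirm this, a quick check along these lines could go at the top of the script. It is only a sketch: describe_sed is a made-up helper name, and it assumes that GNU sed answers --version with a banner containing "GNU" while BSD sed rejects the option.

```shell
#!/bin/bash
# Report which binary a command name resolves to and whether it looks like GNU sed.
# Assumption: GNU sed prints a banner containing "GNU" for --version;
# BSD sed does not understand --version, so nothing matches.
describe_sed() {
  local name=${1:-sed} path
  path=$(command -v "$name") || { echo "$name: not found on PATH"; return 1; }
  if "$path" --version 2>/dev/null | grep -q 'GNU'; then
    echo "$path: identifies as GNU"
  else
    echo "$path: does not identify as GNU (probably BSD)"
  fi
}

# Show the PATH the script actually sees, then the sed it resolves to.
echo "PATH=$PATH"
describe_sed sed
```

Running the same lines in your interactive shell and comparing the two outputs should show whether the script and the terminal see different PATHs.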
Running the sed command in the script will pick up the BSD sed, not your self-installed GNU sed. Use the full path to sed in the script, or set $PATH in your script before calling sed. Obviously, you don’t have the problem on Ubuntu, since the default sed there is GNU.
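A minimal sketch of the full-path approach follows. The candidate paths are guesses at common install locations (MacPorts, Homebrew, Linux defaults), not something taken from your setup, and find_gnu_sed and join_soft_breaks are hypothetical helper names.

```shell
#!/bin/bash
# Print the first candidate path that is executable and identifies as GNU sed.
# Assumption: GNU sed mentions "GNU" in its --version output; BSD sed rejects the option.
find_gnu_sed() {
  local candidate
  for candidate in "$@"; do
    if [ -x "$candidate" ] && "$candidate" --version 2>/dev/null | grep -q 'GNU'; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Hypothetical install locations; replace them with wherever your GNU sed lives.
SED=$(find_gnu_sed /opt/local/bin/gsed /usr/local/bin/gsed \
                   /usr/local/bin/sed /usr/bin/sed /bin/sed) || {
  echo "no GNU sed found in the candidate list" >&2
  exit 1
}

# Same expression as in the question, run with an explicit binary instead of
# whatever "sed" happens to resolve to on the script's PATH.
join_soft_breaks() {
  "$SED" ':a;N;$!ba;s/=\r\n//g' "$@"
}
```

With that in place, join_soft_breaks filename takes the place of the bare sed call. The other option from above is a single line such as PATH=/opt/local/bin:$PATH near the top of the script, again assuming that is the directory where your GNU sed is installed.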