I have this test file.
[root@localhost ~]# cat f.txt
"a aa" MM "bbb  b"
MM MM
MM"b b "
[root@localhost ~]#
I want to replace every space character inside the quotes with an underscore, and only inside the quotes. Characters outside the quotes should not be touched. That is to say, what I want is something like:
"a_aa" MM "bbb__b"
MM MM
MM"b_b_"
Can this be implemented using sed?
Thanks,
This is an entirely non-trivial question.
This works, replacing the first space inside quotes with an underscore:
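As a minimal sketch of this kind of substitution (the exact command is an assumption, and the function name `first_space` is hypothetical): capture an opening quote plus any non-space, non-quote characters, match one space, capture the rest up to the closing quote, and put an underscore between the two captures.

```shell
# Replace the first space found between a pair of double quotes.
# \1 = opening quote + leading non-space text
# \2 = remainder of the string through the closing quote
first_space() {
    sed 's/\("[^ "]*\) \([^"]*"\)/\1_\2/'
}

printf '%s\n' '"a aa" MM "bbb  b"' | first_space
# prints: "a_aa" MM "bbb  b"
```

Only the first quoted space on the line changes, because the command has no `g` flag.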
For this example, where there are no more than two spaces inside any of the quotes, it is tempting to simply repeat the command, but it gives an incorrect result:
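To see why repetition goes wrong, here is a sketch that assumes the unanchored pattern shown in the code and a hypothetical function name `twice`. On the second pass the closing quote of the first string and the opening quote of the next one look like a quoted pair, so a space outside the quotes gets replaced.

```shell
# Naive attempt: run the same unanchored substitution twice.
# The second pass cannot tell a closing quote from an opening one.
twice() {
    sed -e 's/\("[^ "]*\) \([^"]*"\)/\1_\2/' \
        -e 's/\("[^ "]*\) \([^"]*"\)/\1_\2/'
}

printf '%s\n' '"a aa" MM "bbb  b"' | twice
# prints: "a_aa"_MM "bbb  b"
```

The space after `"a_aa"` was outside the quotes, yet it is the one that was changed, and the spaces in `"bbb  b"` are still there.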
If your version of sed supports ‘extended regular expressions’, then this works for the sample data:

You have to repeat that ghastly regex for every space within double quotes, hence three times for the first line of data.
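A sketch of what such an anchored ERE could look like. The pattern is hypothetical and not necessarily the regex used in this answer, and it assumes a `sed` that accepts `-E` (older GNU versions may need `-r` instead). Each application fixes exactly one space, so the script is given once per space.

```shell
# Anchored ERE: from the start of the line, skip text outside quotes and
# complete quoted strings that contain no spaces, then replace the first
# space after the next opening quote. \1 is everything before that space.
one_space='s/^(([^"]*("[^ "]*")?)*"[^ "]*) /\1_/'

# Three quoted spaces on this line, so three applications.
printf '%s\n' '"a aa" MM "bbb  b"' |
    sed -E -e "$one_space" -e "$one_space" -e "$one_space"
# prints: "a_aa" MM "bbb__b"
```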
The regex can be explained as:
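One way to read a pattern of this shape is to build it from named pieces. The variable names and the pattern itself are assumptions made for illustration, and `-E` is assumed to be available.

```shell
# Hypothetical pieces of an anchored pattern, named for readability.
outside='[^"]*'       # text outside quotes: any run of non-quote characters
done_str='"[^ "]*"'   # a complete quoted string with no spaces in it
open_str='"[^ "]*'    # an opening quote plus non-space, non-quote characters

# ^          start of line, so quotes are always consumed in pairs
# ( ... )*   any mix of outside text and space-free quoted strings
# open_str   the next quoted string, up to ...
# ' '        ... its first space, which the replacement turns into '_'
regex="^((${outside}(${done_str})?)*${open_str}) "

printf '%s\n' 'MM"b b "' | sed -E "s/${regex}/\1_/"
# prints: MM"b_b "
```

Because the match must start at the beginning of the line, it can only ever reach the first remaining space inside quotes.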
Because of the start anchor, this has to be repeated once per blank… but sed has a looping construct, so we can do it with:

The :redo defines a label; the s/// command is as before; the t redo command jumps to the label if there was any substitution done since the last read of a line or jump to a label.

Given the discussion in the comments, there are a couple of points worth mentioning:
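Before those points, here is a minimal sketch of the loop described above. It assumes a `sed` that accepts `-E`, uses a hypothetical anchored pattern, and the function name `underscore_quoted` is made up for illustration.

```shell
# Loop the anchored substitution until no space is left inside quotes.
#   :redo    defines the label
#   s/.../   replaces the first remaining quoted space
#   t redo   jumps back to the label if the s command changed anything
underscore_quoted() {
    sed -E -e ':redo' \
           -e 's/^(([^"]*("[^ "]*")?)*"[^ "]*) /\1_/' \
           -e 't redo'
}

printf '%s\n' '"a aa" MM "bbb  b"' 'MM MM' 'MM"b b "' | underscore_quoted
# prints:
# "a_aa" MM "bbb__b"
# MM MM
# MM"b_b_"
```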
The -E option applies to sed on Mac OS X (tested 10.7.2). The corresponding option for the GNU version of sed is -r (or --regexp-extended). The -E option is consistent with grep -E (which also uses extended regular expressions). The ‘classic Unix systems’ do not support EREs with sed (Solaris 10, AIX 6, HP-UX 11).

You can replace the ? I used (which is the only character that forces the use of an ERE instead of a BRE) with *, and then deal with the parentheses (which require backslashes in front of them in a BRE to make them into capturing parentheses), leaving the script:

This produces the same output on the same input. I tried some slightly more complex patterns in the input:
This gives the output:
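As a sketch with made-up input lines (hypothetical, not the data used in this answer), the BRE form with * in place of ? can be exercised like this. The pattern and the function name `quoted_spaces_bre` are assumptions.

```shell
# BRE spelling of the loop: * replaces ?, and grouping parentheses are
# written \( \). No -E or -r option is needed.
quoted_spaces_bre() {
    sed -e ':redo' \
        -e 's/^\(\([^"]*\("[^ "]*"\)*\)*"[^ "]*\) /\1_/' \
        -e 't redo'
}

# Several quoted strings per line, an empty string, and a line with no quotes.
printf '%s\n' 'x "a b c" y "d" "e  f" z' '"" "a b"' 'no quotes here' |
    quoted_spaces_bre
# prints:
# x "a_b_c" y "d" "e__f" z
# "" "a_b"
# no quotes here
```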
Even with BRE notation, sed supported the \{0,1\} notation to specify 0 or 1 occurrences of the previous RE term, so the ? version could be translated to a BRE using:

This produces the same output as the other alternatives.
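A sketch of that translation, again with a hypothetical pattern and function name: the optional group is written with the interval \{0,1\}, which a BRE accepts where an ERE would use ?.

```shell
# Same loop with the optional group written as a BRE interval, \{0,1\}.
quoted_spaces_interval() {
    sed -e ':redo' \
        -e 's/^\(\([^"]*\("[^ "]*"\)\{0,1\}\)*"[^ "]*\) /\1_/' \
        -e 't redo'
}

printf '%s\n' '"a aa" MM "bbb  b"' | quoted_spaces_interval
# prints: "a_aa" MM "bbb__b"
```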