I figured out that in order to turn [some name] into [some_name] I need to use the following expression:
s/\(\[[^ ]*\) /\1_/
i.e. create a backreference capture for anything that starts with a literal ‘[‘ and contains any number of non-space characters, followed by a space, to be replaced with the captured text followed by an underscore. What I don’t know yet, though, is how to alter this expression so it works for ALL spaces within the brackets, e.g. turning [a few words] into [a_few_words].
I sense that I’m close, but am just missing a chunk of knowledge that will unlock the key to making this thing work an infinite number of times within the constraints of the first set of []s contained in a line (of SQL Server DDL in this case).
Any suggestions gratefully received….
There are two parts to the trickery needed:
Stop replacing when you reach a close square bracket (but do it repeatedly on the line):
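The substitution itself is not preserved at this point in the text. A sketch consistent with the description that follows, assuming the only changes to the original expression are adding `]` to the bracket expression and appending the `g` flag, might be (the function name `fix_once` is just a placeholder):

```shell
# Reconstructed sketch, not the original snippet.
# [^] ] excludes both ']' and a blank, so a match can never run past
# the closing square bracket; 'g' applies it to every bracketed field.
fix_once() {
    sed -e 's/\(\[[^] ]*\) /\1_/g'
}

printf '%s\n' '[some name]' '[single-word] and context' | fix_once
```

A single pass like this only replaces the first blank inside each bracketed field, so `[a few words]` would come out as `[a_few words]`; dealing with the remaining blanks is the second part.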
This matches an open square bracket, followed by zero or more characters that are neither a blank nor a close square bracket, followed by a blank. The global suffix means that the pattern is applied to every sequence on the line that starts with an open square bracket and is followed eventually by a blank or close square bracket. Note, too, that this regex does not alter ‘[single-word] and context‘, whereas the original would translate that to ‘[single-word]_and context‘, which is not the object of the exercise.
Get sed to repeat the search from where this one started. Unfortunately, there isn’t a truly good way to do that. Sed always resumes searching after the text that was substituted, and this is one occasion when we don’t want that. Sometimes, you can get away with simply repeating the substitute operation. In this case, you have to repeat it every time the substitution succeeds, stopping when there are no more substitutions.
Two of the less well known operations in sed are the ‘:label‘ and the ‘t‘ commands. They were present in the 7th Edition of Unix (circa 1978), though, so they are not new features. The first simply identifies a position in the script which can be jumped to with ‘b‘ (not wanted here) or ‘t‘. The ‘t‘ command branches to the label only if a substitution has been made since the most recent input line was read or the last ‘t‘ was executed.
Marvellous: we need:
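Written on a single line, a sketch of that loop might look like the following. The label name `redo` and the wrapper function `fix_brackets_oneline` are illustrative choices, and the semicolon-separated form is assumed to be run with GNU sed, which ends a label at a semicolon:

```shell
# Hypothetical wrapper; 'redo' is an arbitrary label name.
# ':redo' marks the top of the loop; 't redo' jumps back to it only if
# the preceding s command changed the line.
# Assumes GNU sed for the one-line, semicolon-separated form.
fix_brackets_oneline() {
    sed -e ':redo; s/\(\[[^] ]*\) /\1_/g; t redo'
}

printf '%s\n' '[a few words]' | fix_brackets_oneline
```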
Except – it doesn’t work all on one line like that (at least, not on MacOS X). This did work admirably, though:
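A sketch of the multi-line form, with each command on its own line inside the quoted script so that the label ends at a newline (the function name `fix_brackets` is a placeholder, not part of the original):

```shell
# Same loop as a three-line sed script; the newline after ':redo'
# terminates the label, which avoids the one-line problem.
fix_brackets() {
    sed -e ':redo
s/\(\[[^] ]*\) /\1_/g
t redo'
}

printf '%s\n' '[a few words] and [two more]' | fix_brackets
```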
Or, as noted in the comments, you could write three separate ‘-e’ options (which works on MacOS X):
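A sketch of the three-option form, in which sed treats each ‘-e’ argument as a separate script line (again, `fix_brackets_e` is only an illustrative name):

```shell
# Each -e fragment is joined to the next with a newline, so the label,
# the substitution and the conditional branch stay separate commands.
fix_brackets_e() {
    sed -e ':redo' -e 's/\(\[[^] ]*\) /\1_/g' -e 't redo'
}

printf '%s\n' '[a few words]' | fix_brackets_e
```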
Given the data file:
the output from the sed script shown is:
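The original data file and its output are not reproduced here. As a stand-in, the sketch below runs the loop over a few hypothetical sample lines (the variable `sample_input` and the function `run_sample` are invented for this illustration):

```shell
# Hypothetical sample input; not the original data file.
sample_input='a line with [one blank] inside brackets.
a line with [two blank] and [three blank] fields.
a line with [no-blank] inside brackets.
a line with [several words in one bracket] here.'

run_sample() {
    printf '%s\n' "$sample_input" |
        sed -e ':redo' -e 's/\(\[[^] ]*\) /\1_/g' -e 't redo'
}

run_sample
# Prints:
# a line with [one_blank] inside brackets.
# a line with [two_blank] and [three_blank] fields.
# a line with [no-blank] inside brackets.
# a line with [several_words_in_one_bracket] here.
```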
And, finally, reading the fine print in the question, if you need this done only in the first square-bracketed field on each line, then we need to ensure that there are no open square brackets before the one that starts the match. This variant works:
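The variant itself is missing from the text at this point. A sketch reconstructed from the description in the next paragraph (anchored with a caret, no ‘g’ flag, and `[^[]*` before the first open square bracket) could look like this; `fix_first_bracket` is a hypothetical name:

```shell
# Reconstructed sketch, not the original snippet.
# ^[^[]* consumes everything before the first '[', so only the first
# bracketed field on the line can ever match.
fix_first_bracket() {
    sed -e ':redo' -e 's/^\([^[]*\[[^] ]*\) /\1_/' -e 't redo'
}

printf '%s\n' '[a b c] and [d e]' | fix_first_bracket
```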
(The ‘g’ qualifier is gone – it probably isn’t needed in the other variants either given the loop; its presence might make the process marginally more efficient, but it would most likely be essentially impossible to detect that. The pattern is now anchored to the start of the line (the caret) and contains zero or more characters that are not open square bracket before the first open square bracket.)
Sample output:
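The original sample output is not preserved. As an illustration only, here is what a first-field-only loop of the kind described above produces for a few hypothetical input lines (`first_only_input` and `run_first_only` are invented names, and the sed expression is a reconstruction from the description rather than the original):

```shell
# Hypothetical sample input; not the original data file.
first_only_input='a line with [one blank] inside brackets.
a line with [two blank] and [three blank] fields.
a line with [no-blank] inside brackets.
a line with [several words in one bracket] here.'

run_first_only() {
    printf '%s\n' "$first_only_input" |
        sed -e ':redo' -e 's/^\([^[]*\[[^] ]*\) /\1_/' -e 't redo'
}

run_first_only
# Prints:
# a line with [one_blank] inside brackets.
# a line with [two_blank] and [three blank] fields.
# a line with [no-blank] inside brackets.
# a line with [several_words_in_one_bracket] here.
```

Only the second line differs from a whole-line run: its second bracketed field, `[three blank]`, is left alone.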