What I want to do is:
find some_files -name '*.html' -exec sed -i "s/`cat old`/`cat new`/g" {} \;
with old and new containing newline characters and slashes and other special characters, which prevent sed from parsing correctly.
I have read about how to escape newline characters with sed, and about the tr command and printf '%q', but I can’t make these work properly, maybe because I don’t fully understand what they do. Additionally, I don’t know which special characters I still have to escape for sed to work.
I’m not sure what you want to do exactly, but if the old file contains newlines, you’re probably going to run into trouble. That is because sed applies its commands to one line at a time, so trying to match a pattern that spans multiple lines will not work unless you explicitly load more lines into the pattern space.
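A quick way to see the problem, as a small sketch (the two-line input is made up for illustration): the pattern contains a newline, but sed only ever holds one line at a time, so nothing matches and the input passes through unchanged.

```shell
# Hypothetical two-line input; the pattern spans both lines.
# sed processes one line at a time, so "\n" never matches here
# and the text comes out exactly as it went in.
out=$(printf 'foo\nbar\n' | sed 's/foo\nbar/REPLACED/')
printf '%s\n' "$out"
```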
My suggestion would be to load the whole file into sed’s “buffer” before applying the substitute command. Then, you’d have to make sure that old and new are escaped correctly. Also, what could become more confusing is that escaping for the old file (the pattern) must be different than for the new file (the replacement).
Let’s start by escaping the new file into a “new.tmp” file. For clarity, we’ll create a sed script called “escape_new.sed”:
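One possible version of such a script, as a sketch under assumptions rather than the definitive one: it assumes “/” will be the delimiter of the final substitute command, escapes the backslash, the slash and “&”, ends every line except the last with a backslash so the newline survives in the replacement, and appends the closing “/” to the last line. The “|” delimiter inside the script is a choice made here so that “/” needs no escaping in the script itself.

```shell
# Sketch only: one possible escape_new.sed (not necessarily the original).
# Assumes "/" is the delimiter of the final s command.
cat > escape_new.sed <<'EOF'
# 1. Prefix backslash, slash and ampersand with a backslash.
s|[\\/&]|\\&|g
# 2. End every line except the last with a backslash (escapes the newline).
$!s|$|\\|
# 3. Append the closing slash of the final s command to the last line.
$s|$|/|
EOF
```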
Then run it:
sed -f escape_new.sed new > new.tmp
There are three commands we use to escape:
Now let’s escape the old file. As above, we’ll create an “escape_old.sed” script. Before we do it though, we need to load the whole file into the pattern space (sed’s internal buffer) so we can replace newline characters. We can do that with the following commands:
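A sketch of those commands written out as a script file, following the description in the next paragraph and the compact form “:a;$!{N;ba}”. The file name slurp.sed, the sample input and the extra substitute command at the end are made up for illustration.

```shell
# Sketch: the loop as a script file (slurp.sed is a made-up name).
cat > slurp.sed <<'EOF'
:a
$!{
N
ba
}
# Illustration only: with every line loaded, "\n" can now be matched.
s/\n/ + /g
EOF

# prints: one + two + three
printf 'one\ntwo\nthree\n' | sed -f slurp.sed
```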
The first command creates a label called “a”. The second command (“{”) starts a group of commands. The magic here is the “$!” address prefix in front of it, which tells sed to run the group only if the line that was just read is not the last line of the input (“$” means the last line of the input and “!” means not). The first command in the group, “N”, appends the next line of input to the pattern space. If “N” is executed on the last line, it terminates the script, so we must be careful not to run it there. The second command in the group is a branch command, “b”, which jumps back to the “a” label. The closing brace ends the group. This group, with its address prefix, lets us loop through all of the lines, concatenating them together, and stop after the last line so that any further commands can be executed. We then have the final script:
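One possible version of that script, as a sketch rather than the definitive one. Two things in it are assumptions that go beyond the description: “*” and the backslash itself are added to the escaped characters, and the special characters are escaped before the newlines, so that the backslash introduced for “\n” is not escaped a second time. As a further choice made here, “|” serves as the delimiter so that “/” can sit inside the brackets unescaped.

```shell
# Sketch only: one possible escape_old.sed (not necessarily the original).
cat > escape_old.sed <<'EOF'
# Load the whole file into the pattern space.
:a
$!{
N
ba
}
# Escape regex-special characters; "]" comes first inside the brackets.
# Assumption: "*" and the backslash are included as well.
s|[][/^$.*\\]|\\&|g
# Turn each real newline into a backslash followed by the letter n.
s|\n|\\n|g
EOF
```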
As above, we need to escape the special characters. In this case an actual newline is now escaped as a backslash followed by the letter n. In the last command, there are more characters that need to be prefixed by a backslash. Notice that to match a closing square-bracket, it needs to be the first character inside the square-brackets, to prevent sed from interpreting it as the closing character for our list of characters to match. Therefore, the characters that are listed in order between the square brackets are
][/^$..
And again, we execute it with:
sed -f escape_old.sed old > old.tmp
Now we can use these escaped files in the sed command, but again we must load all of the lines into the pattern space. Using the same commands as before, but placed on a single line, we have the compact form:
:a;$!{N;ba}
which we can now use in the final expression (without the closing slash character, which is now in the new.tmp file):
And hopefully it will work =)
Notice that we have escaped the $ symbol with a backslash; otherwise the shell would think that we are trying to access the $! variable (the process ID of the last background command).
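To put the pieces together, here is a sketch of the final command on a made-up example. The contents of old.tmp, new.tmp and page.html are invented and written by hand in already-escaped form, and the extra “;” before the closing brace is an addition made here so that the label “a” is clearly terminated before the brace.

```shell
# Hand-written stand-ins for the escaped files (made-up example data).
printf '%s\n' '<p>old\ntext<\/p>' > old.tmp        # pattern, newline written as \n
printf '%s\n' '<p>new\' 'a \& b<\/p>/' > new.tmp    # replacement, ends with the closing /
printf '%s\n' '<html>' '<p>old' 'text</p>' '</html>' > page.html

# \$ keeps the shell from expanding $! inside the double quotes.
# There is no "/" before the final g: new.tmp already supplies it.
# Assumption: the ";" before "}" is added to end the label "a" explicitly.
result=$(sed ":a;\$!{N;ba;};s/$(cat old.tmp)/$(cat new.tmp)g" page.html)
printf '%s\n' "$result"
```

With GNU sed, the same double-quoted expression could presumably be passed to sed -i through find ... -exec, as in the question, once the output looks right on a single file.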