My problem is similar to "shell script: search and replace over multiple lines", with a small exception.
In the linked question, the user wants to do this:
source:
[stuff before]
<!--WIERD_SPECIAL_COMMENT_BEGIN-->
[stuff here, possibly multiple lines]
<!--WIERD_SPECIAL_COMMENT_END-->
[stuff after]
target:
[stuff before]
[new content]
[stuff after]
My problem is similar, but I want to do this:
source:
[stuff before]
<!--WIERD_SPECIAL_COMMENT_BEGIN-->
[this]
<!--WIERD_SPECIAL_COMMENT_END-->
<!--WIERD_SPECIAL_COMMENT_BEGIN-->
[not this]
<!--WIERD_SPECIAL_COMMENT_END-->
[stuff after]
target:
[stuff before]
[new content]
<!--WIERD_SPECIAL_COMMENT_BEGIN-->
[not this]
<!--WIERD_SPECIAL_COMMENT_END-->
[stuff after]
In a proper multiline regex this is easy to do:
/<!--WIERD_SPECIAL_COMMENT_BEGIN-->.*[this].*<!--WIERD_SPECIAL_COMMENT_END-->/m
but the answer suggested in the linked question uses regexes as range addresses, which doesn’t allow checking the lines between the two bounds.
Is there any way to add all the lines in a range to the pattern buffer so I can run a regex over all of them at once? E.g.:
sed '
#range between comment beginning and comment end
/<!--WIERD_SPECIAL_COMMENT_BEGIN-->/,/<!--WIERD_SPECIAL_COMMENT_END-->/
#Do something to add the lines in this range to pattern buffer
/.*[this].*/d
#Delete all the lines if [this] is in the pattern buffer
' <in.txt >out.txt
With Perl, it’s relatively simple.
The benefits offered by Perl are (a) the -0777 “slurp mode”, which pulls in the entire input file in one go, instead of sed’s line-at-a-time processing; (b) the /s regex flag, which allows dot to match a newline; (c) the stingy repetition operators *? and friends, which cause the repetition to match as little as possible instead of as much as possible; and finally (d) the negative lookahead (?!...), which allows you to inhibit matching where the negative lookahead expression matches. (Without this, even stingy matching would match across an end delimiter if there was a “false” starting delimiter in the “stuff before” text.) … And of course, (e) a general-purpose programming language, where sed is only suitable for relatively simple text processing tasks.
(I used simpler beginning and ending delimiters. I hope “wierd” was an intentional misspelling.)
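For comparison, the idea from the question of gathering a whole range into the pattern space can also be sketched in sed itself, using N in a loop. This is only a rough sketch under some assumptions: the same simplified delimiters and the placeholder marker "remove me" as stand-ins, every begin delimiter has a matching end delimiter, and a sed where "." matches an embedded newline (as GNU sed does).

```shell
# Illustrative sample input (not the question's real data).
cat > in.txt <<'EOF'
stuff before
<!--BEGIN-->
remove me
<!--END-->
<!--BEGIN-->
keep me
<!--END-->
stuff after
EOF

sed '
/<!--BEGIN-->/{
  # Append following lines until the end delimiter is in the pattern space.
  :a
  /<!--END-->/!{
    N
    ba
  }
  # The whole block is now one multi-line string: replace it only if it
  # contains the marker text, otherwise leave it untouched.
  /remove me/s/.*/new content/
}
' in.txt > out.txt

cat out.txt
```

The loop never sees more than one block at a time, so no lookahead is needed here. The price is that an unterminated block is not handled gracefully.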