I have a large set (600-odd) of search-and-replace terms that I need to run as a sed script over some files. The problem is that the search terms are NOT orthogonal… but I think I can get away with it by sorting by line length, i.e. pulling out the longest matches first, and then alphabetically within each length. So given an unsorted set of:
aaba
aa
ab
abba
bab
aba
what I want is a sorted set such as:
abba
aaba
bab
aba
ab
aa
Is there a way of doing it by, say, prepending the line length and sorting by a field?
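The prepend-the-length idea can be sketched roughly as follows. This is only a sketch: `sort_by_length` is a made-up helper name, it assumes one term per line on stdin, and it sorts in descending order within each length because that is what the sample output above shows.

```shell
# sort_by_length (hypothetical name): reads terms from stdin, one per line,
# and prints them longest first, descending byte order within each length.
sort_by_length() {
    awk '{ print length($0) "\t" $0 }' |                    # prepend "<length><TAB>"
        LC_ALL=C sort -t "$(printf '\t')" -k1,1nr -k2r |    # length desc, then text desc
        cut -f2-                                            # strip the length field again
}
```

For example, `sort_by_length < terms.txt > sorted_terms.txt`, where `terms.txt` is an assumed file name. Change `-k2r` to `-k2` if ascending order within each length is wanted instead.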
For bonus marks 🙂 !!!
The search and replace is actually simply a case of replacing
term
with
_term_
and the sed code I was going to use was
s/term/_term_/g
How would I write the regex to avoid replacing terms already within _ pairs?
You could compact it all into one regexp:
If I understand your question correctly, this will solve both of your problems: no “double replacement”, and the longest word is always the one matched.
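One possible shape for such a combined expression is sketched below; it is not necessarily the exact regexp meant above. `build_sed_script` is a hypothetical helper, and it assumes the terms are already sorted longest first and contain no regex metacharacters or `/`. Because all terms are tried in a single `s///g` pass, text that has just been wrapped in underscores is not scanned again.

```shell
# build_sed_script (hypothetical name): reads terms from stdin, one per line
# (assumed longest first, no regex metacharacters or "/"), and prints a
# one-line script for sed -E that wraps any matched term in underscores.
build_sed_script() {
    awk '{ alt = alt (NR > 1 ? "|" : "") $0 }            # join terms with "|"
         END { printf "s/(%s)/_\\1_/g\n", alt }'
}
```

Usage might look like `build_sed_script < sorted_terms.txt > wrap.sed` followed by `sed -E -f wrap.sed somefile`, where the file names are placeholders. Note that this only prevents double replacement within one run; input that already contains `_term_` from an earlier run would still be wrapped again.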