I want to perform many find and replace operations on some text

Question

Asked: May 26, 2026

I want to perform many find and replace operations on some text. I have a UTF-8 CSV file containing what to find (in the first column) and what to replace it with (in the second column), with the search terms arranged from longest to shortest.

E.g.:

orange,fruit2
carrot,vegetable1
apple,fruit3
pear,fruit4
ink,item1
table,item2

Original file:

"I like to eat apples and carrots."

Resulting output file:

"I like to eat fruit3s and vegetable1s."

However, I want to ensure that text which has already been replaced is never matched again by a later replacement. In other words, I don’t want the output to look like this (here “table” was matched inside “vegetable1”):

"I like to eat fruit3s and vegeitem21s."

Currently, I am using this method which is quite slow, because I have to do the whole find and replace twice:

(1) Convert the CSV to three files, e.g.:

a.csv     b.csv   c.csv
orange    0001    fruit2
carrot    0002    vegetable1
apple     0003    fruit3
pear      0004    fruit4
ink       0005    item1
table     0006    item2

(2) Then, replace all items from a.csv in file.txt with the matching column in b.csv, using ZZZ around the words to make sure there is no mistake later in matching the numbers:

a=1
b=$(wc -l < ./a.csv)
while [ $a -le $b ]
do
    # Line $a of a.csv holds the search term; line $a of b.csv holds its number.
    i=$(sed -n "${a}p" ./a.csv)
    j=$(sed -n "${a}p" ./b.csv)
    # Replace the search term with its number wrapped in ZZZ markers.
    sed -i "s/$i/ZZZ${j}ZZZ/g" ./file.txt
    echo "Instances of '$i' replaced with 'ZZZ${j}ZZZ' ($a/$b)."
    a=$((a + 1))
done

(3) Then run the same script again, this time replacing each marker with the matching line from c.csv (e.g. ZZZ0001ZZZ with fruit2).

Running the first replacement takes about 2 hours, but as I must run this code twice to avoid editing the already replaced items, it takes twice as long. Is there a more efficient way to run a find and replace that does not perform replacements on text already replaced?

1 Answer

Editorial Team · Answer 1 · 2026-05-26T15:02:51+00:00

One way to do this is a two-phase replace:

phase 1:

s/orange/@@1##/g
s/carrot/@@2##/g
...

phase 2:

s/@@1##/fruit2/g
s/@@2##/vegetable1/g
...

The @@1## markers must be chosen so that they do not appear in the original text, the search terms, or the replacements.

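Here’s a proof-of-concept implementation in Perl. It takes the replacement list as its first argument and reads the text from the remaining arguments or from standard input:

#!/usr/bin/perl
use strict;
use warnings;

my $repls = shift @ARGV;
die ("first parameter must be the replacement list file") unless defined ($repls);

# Temporary marker; it must not occur in the input, the search terms, or the replacements.
my $tmpFmt = '@@@%d###';

open(my $replsFile, "<", $repls) || die("$!: $repls");

my @replsList;

my $i = 0;
while (<$replsFile>) {
    chomp;
    # Accept both quoted ("orange","fruit2") and unquoted (orange,fruit2) fields.
    my ($from, $to) = /^"?([^",]+)"?,"?([^",]*?)"?\s*$/;
    if (defined($from) && defined($to)) {
        push(@replsList, [$from, sprintf($tmpFmt, ++$i), $to]);
    }
}
close($replsFile);

while (<>) {
    # Phase 1: replace each search term with its unique marker.
    foreach my $r (@replsList) {
        s/\Q$r->[0]\E/$r->[1]/g;
    }
    # Phase 2: replace each marker with the final text.
    foreach my $r (@replsList) {
        s/\Q$r->[1]\E/$r->[2]/g;
    }
    print;
}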
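
Since the question already uses sed, the same two-phase idea can also be sketched in shell by generating a single sed script from the CSV and running sed once, instead of starting sed -i once per search term. This is only a rough sketch under some assumptions that are not in the original post: the function name build_sed_script is made up for illustration, the CSV fields are unquoted and contain no commas, slashes, ampersands, backslashes or regex metacharacters, and the @@N## markers never occur in the text or in any field.

```shell
#!/bin/sh
# Sketch: turn a two-column CSV (find,replace) into a two-phase sed script.
# Assumptions (not from the original post): fields are unquoted and contain
# no commas, slashes, ampersands, backslashes or regex metacharacters, and
# the @@N## markers never occur in the text or in any field.
build_sed_script() {
    # $1 is the CSV file. The marker number is the CSV line number (NR).
    # Phase 1: search term -> marker, in CSV order (longest terms first).
    awk -F, 'NF == 2 { printf "s/%s/@@%d##/g\n", $1, NR }' "$1"
    # Phase 2: marker -> final replacement text.
    awk -F, 'NF == 2 { printf "s/@@%d##/%s/g\n", NR, $2 }' "$1"
}
```

With hypothetical file names, it could be used as build_sed_script repl.csv > repl.sed followed by sed -f repl.sed file.txt > out.txt. All phase 1 commands run before any phase 2 command on each line, so a term like "table" never sees the already inserted "vegetable1", and the text file is read and written only once.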
