I frequently need to make many replacements within files. To solve this problem, I have created two files, old.text and new.text. The first contains a list of words which must be found. The second contains the list of words which should replace them.
- All of my files use UTF-8 and make use of various languages.
I have built this script, which I hoped could do the replacement. First, it reads old.text one line at a time, then replaces the words at that line in input.txt with the corresponding words from the new.text file.
#!/bin/sh
number=1
while read linefromoldwords
do
echo $linefromoldwords
linefromnewwords=$(sed -n '$numberp' new.text)
awk '{gsub(/$linefromoldwords/,$linefromnewwords);print}' input.txt >> output.txt
number=$number+1
echo $number
done < old.text
However, my solution does not work well. When I run the script:
- On line 6, the sed command does not know where the $number ends.
- The $number variable is changing to “0+1”, then “0+1+1”, when it should change to “1”, then “2”.
- The line with awk does not appear to be doing anything more than copying the input.txt exactly as is to output.txt.
Do you have any suggestions?
Update:
The marked answer works well; however, I use this script a lot and it takes many hours to finish. So I am offering a bounty for a solution which can complete these replacements much more quickly. A solution in Bash, Perl, or Python 2 will be okay, provided it is still UTF-8 compatible. If you think some other solution using other software commonly available on Linux systems would be faster, then that might be fine too, so long as huge dependencies are not required.
Try quoting the variable with double quotes
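Applied to the sed line from the question, the quoting fix might look like the sketch below. Inside single quotes the shell never expands $number, and an unbraced $numberp would be read as a variable named numberp, so the braces mark where the name ends. The sample contents of new.text and the tmpdir variable are invented for the demonstration.

```shell
#!/bin/sh
# Hypothetical sample data standing in for the real new.text.
tmpdir=$(mktemp -d)
printf 'uno\ndos\ntres\n' > "$tmpdir/new.text"

number=2
# Double quotes let the shell expand the variable; the braces mark where
# its name ends, so sed receives the script "2p" (print line 2 only).
linefromnewwords=$(sed -n "${number}p" "$tmpdir/new.text")
echo "$linefromnewwords"

rm -r "$tmpdir"
```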
Do this instead:
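One portable way to increment the counter, assuming a POSIX shell, is arithmetic expansion. A plain assignment such as number=$number+1 only appends the text "+1" to the string, which is why the value grows into "0+1+1". This is a minimal sketch and may differ from the snippet originally posted here.

```shell
#!/bin/sh
number=1
# $((...)) is POSIX arithmetic expansion: the expression is evaluated
# instead of being stored as a literal string such as "1+1".
number=$((number + 1))
echo "$number"
number=$((number + 1))
echo "$number"
```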
awk won’t take variables outside its scope. User-defined variables in awk need to be either defined when they are used or predefined in awk’s BEGIN statement. You can include shell variables by using the -v option.
Here is a solution in bash that would do what you need.
Bash Solution:
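A minimal sketch of such a loop, not necessarily the exact script that was posted. It assumes the file names from the question (old.text, new.text, input.txt), GNU sed for the -i option, and search words that contain no "/", "&", or regex metacharacters. The sample data and the names workdir, old, new, and result are invented for the demonstration.

```shell
#!/bin/bash
# Hypothetical sample files so the sketch runs on its own.
workdir=$(mktemp -d)
printf 'cat\ndog\n' > "$workdir/old.text"
printf 'gato\nperro\n' > "$workdir/new.text"
printf 'the cat chased the dog\n' > "$workdir/input.txt"

# Read old.text on file descriptor 3 and new.text on descriptor 4,
# one line from each per iteration, so the pairs stay aligned.
while IFS= read -r old <&3 && IFS= read -r new <&4
do
    # Assumes GNU sed (-i) and words free of "/", "&" and regex
    # metacharacters; anything else would need escaping first.
    sed -i "s/$old/$new/g" "$workdir/input.txt"
done 3< "$workdir/old.text" 4< "$workdir/new.text"

result=$(cat "$workdir/input.txt")
echo "$result"
rm -r "$workdir"
```

Because sed rewrites input.txt once per word pair, the running time grows with the number of pairs multiplied by the file size, which is worth keeping in mind for long word lists.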
This solution reads one line at a time from the substitution file and the replacement file and performs in-line sed substitution.
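For the awk line from the question, the -v option mentioned above could be used roughly as follows. This is a sketch with invented sample data; the awk variable names old and new, and the shell variables workdir and replaced, are hypothetical. Note that gsub() treats the search word as a regular expression, so words containing regex metacharacters would need escaping.

```shell
#!/bin/sh
# Hypothetical sample input so the sketch runs on its own.
workdir=$(mktemp -d)
printf 'the cat chased the dog\n' > "$workdir/input.txt"

linefromoldwords='cat'
linefromnewwords='gato'

# -v copies each shell value into an awk variable before the program runs.
# gsub() treats "old" as a dynamic regex here, so regex metacharacters in
# the word (and "&" or backslashes in the replacement) would need escaping.
replaced=$(awk -v old="$linefromoldwords" -v new="$linefromnewwords" \
    '{ gsub(old, new); print }' "$workdir/input.txt")
echo "$replaced"
rm -r "$workdir"
```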