Question

Asked: May 12, 2026

To prefix unique words with “UNIQUE:” inside a file I’ve tried to use a perl regex command like:

perl -e 'undef $/;while($_=<>){s/^(((?!\b\3\b).)*)\b(\w+)\b(((?!\b\3\b).)*)$/\1UNIQUE:\3\4/gs;print $_;}' demo

On a demo file containing:

watermelon banana
apple pear pineapple orange mango
strawberry cherry
kiwi pineapple lemon cranberry watermelon
orange plum cherry
kiwi banana plum
mango cranberry apple
lemon

The output is:

watermelon banana
apple pear pineapple orange mango
strawberry cherry
kiwi pineapple lemon cranberry watermelon
orange plum cherry
kiwi banana plum
mango cranberry apple
UNIQUE:lemon

Unfortunately, the \3 backreference doesn’t seem to be handled if used in advance.

Is there another way to achieve this with another regex or with other usual commands available on a Linux box? (grep, sed, awk,…)

Many thanks

EDIT:
Unfortunately, many of the solutions work only for the provided case, which was incomplete. My apologies for that. It should also work on a text like:

{watermelon || banana}
apple = ( pear pineapple orange mango )
strawberry cherry
kiwi = pineapple = lemon = cranberry = watermelon
orange - plum = cherry
kiwi = banana + plum
mango = cranberry && apple
lemon

If it simplifies the problem, words may be prefixed with something like $ or @.

1 Answer

Editorial Team · Answer 1 · 2026-05-12T21:05:23+00:00

I see you are already using Perl. When you want to count something, using a hash is always a nice approach…

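#!/usr/bin/perl -w
use strict;

my %hash = ();   # word => number of occurrences
my $str = '';    # copy of the whole input

while(<>) {
    $str .= $_;
    # Replace runs of non-word characters with a space, then count each word.
    $_ =~ s/\W+/ /g;
    $hash{$_}++ for split ' ', $_;
}

# Prefix every word that occurred exactly once.
for my $word (keys %hash){
    if($hash{$word}==1) {
        $str =~ s/\b\Q$word\E\b/UNIQUE:$word/g;
    }
}

print "$str\n";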

which will output:

{watermelon || banana}
apple = ( UNIQUE:pear pineapple orange mango )
UNIQUE:strawberry cherry
kiwi = pineapple = lemon = cranberry = watermelon
orange - plum = cherry
kiwi = banana + plum
mango = cranberry && apple
lemon

Doing this with a single regexp is probably going to be hard. You need to run through the entire file twice: one pass to count all occurrences of each word and one pass to mark up the unique words.

The above snippet reads the input once, but keeps the entire original text in $str, which is obviously a bad idea if the input is large.
