The Archive Base

Editorial Team
Asked: June 9, 2026

I have a perl script that processes a text file line-by-line and converts phrases

I have a perl script that processes a text file line-by-line and converts phrases within those lines to links (specifically in mediawiki mark-up, but I suspect any mark-up would have the same issue). Where I get stuck is when one phrase is a subset of another. In these cases too many links are created.

For example, if “General Committee” and “Annual General Committee Meeting” are two of the phrases:

The General Committee meeting shall meet once a month.

is converted correctly to:

The [[#GC|General Committee]] meeting shall meet once a month.

However,

The Annual General Committee Meeting shall be held in May.

is incorrectly converted to:

The [[#AGCM|Annual [[#GC|General Committee]] Meeting]] shall be held in May.

That is, my script is finding the phrase “General Committee” within “Annual General Committee Meeting” and inserting a link where I don’t want it. There should only be a link to the AGCM in this example.

The relevant perl code is:

my($line) = $_;
foreach $phrase (keys(%phrases))  # the phrases to replace mapped to their links
{
    my($link) = $phrases{$phrase};
    if ($line =~ m/$phrase/)
    {
        $line =~ s/$phrase/[[#$link|$phrase]]/g;
    }
}

Any suggestions on how to avoid matching / substituting when one phrase can be found within another?

UPDATE: Clarification based on some of the questions: Each phrase stands alone; there is no priority of one over another. Taking the longest over the shortest is sufficient to get what I need.

1 Answer

  1. Editorial Team
    Added an answer on June 9, 2026 at 3:09 pm

    You should build a regular expression that matches any of the hash keys in one comparison.

    This program shows the idea. The keys are sorted by decreasing length so that the longest phrase is tried first, and then joined with the | alternation character as a separator.

    Then it is simply a matter of finding every occurrence of the built pattern and replacing it with the hash value for the phrase that matched. This can be done in a single substitution instead of needing a loop.

    Note that you may want to pass the keys through a map that uses \s+ in place of whitespace, and perhaps put \b before and after the strings to ensure that a matched string isn't part of a longer word. The /i regex modifier may also be relevant, to allow case-independent matching.

    use strict;
    use warnings;
    
    my %phrases = (
      'General Committee' => '[[#GC|General Committee]]',
      'Annual General Committee Meeting' => '[[#AGCM|Annual General Committee Meeting]]',
    );
    
    my $text = <<END;
    The General Committee meeting shall meet once a month.
    The Annual General Committee Meeting shall be held in May.
    END
    
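    # Longest phrases first so the alternation prefers them;
    # quotemeta escapes any regex metacharacters in a phrase.
    my $regex = join '|', map { quotemeta($_) } sort { length $b <=> length $a } keys %phrases;
    
    # Replace every match in one pass with its value from the hash.
    $text =~ s/($regex)/$phrases{$1}/g;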
    
    print $text, "\n";
    

    output

    The [[#GC|General Committee]] meeting shall meet once a month.
    The [[#AGCM|Annual General Committee Meeting]] shall be held in May.
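
    The note above about \s+, \b and /i can be sketched as follows. This is only an illustration under some assumptions: the hash maps each phrase to its anchor id, as in the question; every phrase begins and ends with a word character, so \b behaves as expected; and the names make_linker and normalise are hypothetical helpers, not part of the original program. Because the matched text may differ from the hash key in case or spacing, the lookup goes through a normalised copy of the key.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: lower-case a string and collapse runs of whitespace.
    sub normalise {
        my ($s) = @_;
        $s = lc $s;
        $s =~ s/\s+/ /g;
        return $s;
    }

    # Hypothetical helper: takes phrase => anchor id pairs and returns a
    # closure that links the phrases in a single line of text.
    sub make_linker {
        my (%phrases) = @_;

        # Lookup table keyed on the normalised phrase.
        my %id_for = map { ( normalise($_) => $phrases{$_} ) } keys %phrases;

        # Longest phrase first; escape each word and allow any
        # amount of whitespace between the words of a phrase.
        my $alternation = join '|',
            map  { join '\s+', map { quotemeta($_) } split ' ', $_ }
            sort { length $b <=> length $a } keys %phrases;

        my $regex = qr/\b($alternation)\b/i;

        return sub {
            my ($line) = @_;
            $line =~ s{$regex}{
                my $found = $1;
                my $id    = $id_for{ normalise($found) };
                "[[#$id|$found]]";
            }ge;
            return $line;
        };
    }

    my $link = make_linker(
        'General Committee'                => 'GC',
        'Annual General Committee Meeting' => 'AGCM',
    );

    print $link->("The General Committee meeting shall meet once a month."), "\n";
    print $link->("The annual general  committee meeting shall be held in May."), "\n";
    ```

    The link text keeps whatever was actually matched, so the second line comes out with its original lower-case spelling and double space inside the [[#AGCM|...]] link. Whether that is preferable to substituting the canonical phrase is a design choice for the script's author.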
    