The Archive Base

Editorial Team
Asked: May 31, 2026

In a few different guises I’ve asked about this filter on here and WPSE.

In a few different guises I’ve asked about this “filter” on here and WPSE. I’m now taking a different approach to it, and I’d like to make it solid and reliable.

My situation:

  • When I create a post in my WordPress CMS, I want to run a filter which searches for certain terms and replaces them with links.

  • I have the terms that I want to search for in two arrays: $glossary_terms and $species_terms.

  • $species_terms is a list of scientific names of fishes, such as Apistogramma panduro.

  • $glossary_terms is a list of fishkeeping glossary terms such as abdomen, caudal-fin and Gram's Method.

There are a few nuances worth noting:

  • Speed is not an issue, as I will be running this filter in the background rather than when a user visits the page or when an author submits/edits a species profile or post.

  • Some of the post content being filtered may contain HTML with these terms in, like <img src="image.jpg" title="Apistogramma panduro male" />. Obviously these shouldn’t be replaced.

  • Species are often referred to with an abbreviated Genus, so instead of Apistogramma panduro, you’ll often see A. panduro. This means I need to search & replace all of the species terms as an abbreviation too – Apistogramma panduro, A. panduro, Satanoperca daemon, S. daemon etc.

  • If caudal-fin and caudal both exist in the glossary terms, caudal-fin should be replaced first.

I was contemplating simply adding a preg_replace which searched for the terms, but only with a space on the left, (i.e. ( )term) and a space, comma, exclamation, full-stop or hyphen on the right (i.e. term(, . ! - )) but that won’t help me to not break the image HTML.


Example content

<br />
It looks very similar to fishes of the <i><a href="species/betta-foerschi" rel="species/betta-foerschi/?hover=true" class="link_species">B. foerschi</a></i> group/complex but its breeding strategy, adult size and observed behaviour preclude its inclusion in that <a href="glossary/a/assemblage" rel="glossary/a/assemblage?hover=true" class="link_glossary">assemblage</a>.

Instead it appears to be a member of the <i><a href="species/betta-coccina" rel="species/betta-coccina/?hover=true" class="link_species">B. coccina</a></i> group which currently includes <i><a href="species/betta-brownorum" rel="species/betta-brownorum/?hover=true" class="link_species">B. brownorum</a></i>, <i><a href="species/betta-burdigala" rel="species/betta-burdigala/?hover=true" class="link_species">B. burdigala</a></i>, <i><a href="species/betta-coccina" rel="species/betta-coccina/?hover=true" class="link_species">B. coccina</a></i>, <i><a href="species/betta-livida" rel="species/betta-livida/?hover=true" class="link_species">B. livida</a></i>, <i>B. miniopinna</i>, <i><a href="species/betta-persephone" rel="species/betta-persephone/?hover=true" class="link_species">B. persephone</a></i>, <i>B. tussyae</i>, <i><a href="species/betta-rutilans" rel="species/betta-rutilans/?hover=true" class="link_species">B. rutilans</a></i> and <i><a href="species/betta-uberis" rel="species/betta-uberis/?hover=true" class="link_species">B. uberis</a></i>.

Of these it's most similar in appearance to <i><a href="species/betta-uberis" rel="species/betta-uberis/?hover=true" class="link_species">B. uberis</a></i> but can be distinguished by its noticeably shorter <a href="glossary/d/dorsal" rel="glossary/d/dorsal?hover=true" class="link_glossary">dorsal</a>-<a href="glossary/f/fin" rel="glossary/f/fin?hover=true" class="link_glossary">fin</a> <a href="glossary/b/base" rel="glossary/b/base?hover=true" class="link_glossary">base</a> and overall blue-greenish (vs. green/reddish) colouration.

Members of this group are characterised by their small adult size (&lt; 40 mm SL), a uniform red or black <a href="glossary/b/base" rel="glossary/b/base?hover=true" class="link_glossary">base</a> body colour, the presence of a <a href="glossary/m/midlateral" rel="glossary/m/midlateral?hover=true" class="link_glossary">midlateral</a> body blotch in some <a href="glossary/s/species" rel="glossary/s/species?hover=true" class="link_glossary">species</a> and the fact they have 9 abdominal <a href="glossary/v/vertebrae" rel="glossary/v/vertebrae?hover=true" class="link_glossary">vertebrae</a> compared with 10-12 in the other <a href="glossary/s/species" rel="glossary/s/species?hover=true" class="link_glossary">species</a> groups. In addition all are <a href="glossary/o/obligate" rel="glossary/o/obligate?hover=true" class="link_glossary">obligate</a> <a href="glossary/p/peat" rel="glossary/p/peat?hover=true" class="link_glossary">peat</a> <a href="glossary/s/swamp" rel="glossary/s/swamp?hover=true" class="link_glossary">swamp</a> dwellers (Tan and Ng, 2005).<br />

^^^ This example here has had the correct links manually inserted. The filter shouldn’t break these links!

It looks very similar to fishes of the B. foerschi group/complex but its breeding strategy, adult size and observed behaviour preclude its inclusion in that assemblage.

Instead it appears to be a member of the B. coccina group which currently includes B. brownorum, B. burdigala, B. coccina, B. livida, B. miniopinna, B. persephone, B. tussyae, B. rutilans and B. uberis.

Of these it's most similar in appearance to B. uberis but can be distinguished by its noticeably shorter dorsal-fin base and overall blue-greenish (vs. green/reddish) colouration.

Members of this group are characterised by their small adult size (< 40 mm SL), a uniform red or black base body colour, the presence of a midlateral body blotch in some species and the fact they have 9 abdominal vertebrae compared with 10-12 in the other species groups. In addition all are obligate peat swamp dwellers (Tan and Ng, 2005).

^^^ Same example pre-formatting.

<a href="http://www.seriouslyfish.comwp-content/uploads/2011/12/Amazonas-English-1.jpg"><img class="size-thumbnail wp-image-542" title="Amazonas English" src="/wp-content/uploads/2011/12/Amazonas-English-1-288x381.jpg" alt="Amazonas English" width="125" height="165" /></a>

Amazonas Magazine - now in English!

Edited by Hans-Georg Evers, the magazine 'Amazonas' has been widely-regarded as among the finest regular publications in the hobby since its launch in 2005, an impressive achievment considering it's only been published in German to date. The long-awaited English version is just about to launch, and we think a subscription should be top of any serious fishkeeper's Xmas list... The magazine is published in a bi-monthly basis and the English version launches with the January/February 2012 issue with distributors already organised in the United States, Canada, the United Kingdom, South Africa, Australia, and New Zealand. There are also mobile apps availablen which allow digital subscribers to read on portable devices. It's fair to say that there currently exists no better publication for dedicated hobbyists with each issue featuring cutting-edge articles on fishes, invertebrates, aquatic plants, field trips to tropical destinations plus the latest in husbandry and breeding breakthroughs by expert aquarists, all accompanied by excellent photography throughout. U.S. residents can subscribe to the printed edition for just $29 USD per year, which also includes a free digital subscription, with the same offer available to Canadian readers for $41 USD or overseas subscribers for $49 USD. Please see the <a href="http://www.amazonasmagazine.com/">Amazonas website</a> for further information and a sample digital issue! Alternatively, subscribe directly to the print version <a href="https://www.amazonascustomerservice.com/subscribe/index2.php">here</a> or digital version <a href="https://www.amazonascustomerservice.com/subscribe/digital.php">here</a>.

^^^ This will likely only have a few Glossary terms in rather than any species links.


Example terms

$species_terms

339 => 'Aulonocara maylandi maylandi',
340 => 'Aulonocara maylandi kandeensis',
341 => 'Aulonocara sp. "walteri"',
342 => 'Aulonocara sp. "stuartgranti maleri"',
343 => 'Aulonocara stuartgranti',
344 => 'Benthochromis tricoti',
345 => 'Boulengerochromis microlepis',
346 => 'Buccochromis lepturus',
347 => 'Buccochromis nototaenia',
348 => 'Betta brownorum',
349 => 'Betta foerschi',
350 => 'Betta coccina',
351 => 'Betta uberis'

As you can see above, the general format for these scientific names is “Genus species”, but can often include “sp.” or “aff.” (for species which aren’t officially described) and “Genus species subspecies” formats.

$glossary_terms

1 => 'abdomen',
2 => 'caudal',
3 => 'caudal-fin',
4 => 'caudal-fin peduncle',
5 => 'Gram\'s Method'

If anyone can come up with a filter which meets all these conditions and requirements, I’d like to offer a bounty.

Thanks in advance,

1 Answer

  1. Editorial Team
    Added an answer on May 31, 2026 at 11:28 pm

    I think it’s better to walk the parsed document with DOMDocument and match terms only inside text nodes than to run regexps over the raw HTML. Here is a working prototype:

    // Each dynamically constructed regexp will contain at most 70 subpatterns
    define('GROUPS_PER_REGEXPS', 70);
    
    $speciesTerms = array(
      339 => '(?:Aulonocara|A\.) maylandi maylandi',
      340 => '(?:Aulonocara|A\.) maylandi kandeensis',
      344 => '(?:Benthochromis|B\.) tricoti',
      345 => '(?:Boulengerochromis|B\.) microlepis',
    );
    
    function matchTerms($text) {
      // Globals are not ideal; this one is kept for simplicity
      global $speciesTerms;
    
      $result = array();
      // Walk the terms in chunks of GROUPS_PER_REGEXPS, keeping the term ids as keys
      // (each() was removed in PHP 8, so foreach is used instead)
      foreach (array_chunk($speciesTerms, GROUPS_PER_REGEXPS, true) as $chunk) {
        // Maps capturing group identifiers to term ids
        $termMapping = array();
    
        // Dynamically construct regexp
        $groups = array();
        $c = 1;
        foreach ($chunk as $termId => $termPattern) {
          // Match word boundaries, so we don't capture "B. tricotisomeramblingstring"
          $groups[] = '(\b' . $termPattern . '\b)';
          $termMapping[$c++] = $termId;
        }
        $regexp = '/' . implode('|', $groups) . '/m';
        preg_match_all($regexp, $text, $matches, PREG_OFFSET_CAPTURE);
        for ($i = 1; $i < $c; $i++) {
          foreach ($matches[$i] as $matchData) {
            // matchData[0] holds matched string, e.g. Benthochromis tricoti
            // matchData[1] holds offset, e.g. 15
            // Groups that did not take part in a match come back as empty strings
            if (isset($matchData[0]) && !empty($matchData[0])) {
              $result[] = array(
                'text' => $matchData[0],
                'offset' => $matchData[1],
                'id' => $termMapping[$i],
              );
            }
          }
        }
      }
      // Sort by offset in descending order, so that splitting the text node
      // from the end does not shift the offsets of earlier matches
      usort($result, function($a, $b) {
        return $b['offset'] <=> $a['offset'];
      });
      return $result;
    }
    
    // loadHTML() is an instance method; calling it statically fails on PHP 8
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    
    // Stack will be used to avoid recursive functions
    $stack = new SplStack;
    $stack->push($doc);
    while (!$stack->isEmpty()) {
      $node = $stack->pop();
      if ($node->nodeType == XML_TEXT_NODE && $node->parentNode instanceof DOMElement) {
        // $node represents text node
        //  and it's inside a tag (second condition in the statement above)
    
        // Check that this text is not wrapped in <a> tag
        //  as we don't want to wrap it twice
        if ($node->parentNode->tagName != 'a') {
          $matches = matchTerms($node->wholeText);
          foreach ($matches as $match) {
            // Create new link element in the DOM
            $link = $doc->createElement('a', $match['text']);
            $link->setAttribute('href', 'species/' . $match['id']);
            $link->setAttribute('class', 'link_species');
    
            // Save the text after the link
            $remainingText = $node->splitText($match['offset'] + strlen($match['text']));
            // Save the text before the link
            $linkText = $node->splitText($match['offset']);
    
            // Replace $linkText with $link node
            //  i.e. 'something' becomes '<a href="..">something</a>'
            $node->parentNode->replaceChild($link, $linkText);
          }
        }
      }
      if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $childNode) {
          $stack->push($childNode);
        }
      }
    }
    
    $body = $doc->getElementsByTagName('body');
    echo $doc->saveHTML($body->item(0));
    

    Implementation details

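    I’ve only shown how to replace species terms; glossary terms can be handled the same way. Links take the form “species/$id”. Abbreviated genus names are handled by the patterns in $speciesTerms. Because only text nodes are inspected, text inside attributes (such as an image title) is never touched. DOMDocument is a very reliable parser: it can deal with broken markup and is fast.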
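For the glossary side, the question asks that caudal-fin be replaced before caudal. A minimal sketch of one way to get that, assuming the plain $glossary_terms array from the question: sort the terms longest-first before joining them, so the regexp tries the longer alternative first. The helper name buildGlossaryRegexp is hypothetical and not part of the prototype above.

```php
<?php
// Hypothetical helper: builds one alternation regexp from plain glossary terms.
function buildGlossaryRegexp(array $glossaryTerms): string {
  // Longest terms first, so "caudal-fin peduncle" is tried before "caudal-fin" and "caudal"
  usort($glossaryTerms, function ($a, $b) {
    return strlen($b) <=> strlen($a);
  });
  // Escape regexp metacharacters in each term
  $quoted = array_map(function ($term) {
    return preg_quote($term, '/');
  }, $glossaryTerms);
  // Word boundaries on both sides, as in matchTerms
  return '/\b(?:' . implode('|', $quoted) . ')\b/';
}

$glossaryTerms = array(
  1 => 'abdomen',
  2 => 'caudal',
  3 => 'caudal-fin',
  4 => 'caudal-fin peduncle',
  5 => 'Gram\'s Method',
);

preg_match_all(
  buildGlossaryRegexp($glossaryTerms),
  'The caudal-fin peduncle and the caudal fin.',
  $m
);
// $m[0] is array('caudal-fin peduncle', 'caudal')
```

This sketch only finds the terms. To recover the term id, each term would still need its own capturing group, as matchTerms does for species.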

    The ?: in the regexp makes a subpattern non-capturing, so it is not counted as a capturing group (see the PHP documentation on subpatterns). Without a reliable count of capturing groups, we can’t retrieve the termId. The idea is to build one big regexp by joining all the patterns in the $speciesTerms array, separated by a pipe |. The final regexp for the first two species would be (spaces added for clarity, word boundaries omitted):

           First capturing group             Alternation       Second capturing group
    ( (?:Aulonocara|A\.) maylandi maylandi )      |       ( (?:Aulonocara|A\.) maylandi kandeensis )
    

    So, the text “Examples: Aulonocara maylandi maylandi, A. maylandi kandeensis” will give the following matches (simplified: offsets and empty entries for non-participating groups are omitted):

    $matches[1] = array('Aulonocara maylandi maylandi') // Captured by the first group
    $matches[2] = array('A. maylandi kandeensis') // Captured by the second group
    

    We can clearly say that all elements in $matches[1] refer to the species Aulonocara maylandi maylandi or A. maylandi maylandi, which has id = 339.
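The group-to-id mapping described above can be checked with a small standalone sketch. It uses the same two patterns and the same sample sentence; the $found array is only an illustration and is not part of the prototype.

```php
<?php
// One capturing group per term; (?:...) keeps the genus alternation out of the count
$regexp = '/(\b(?:Aulonocara|A\.) maylandi maylandi\b)'
        . '|(\b(?:Aulonocara|A\.) maylandi kandeensis\b)/';

// Capturing group number => term id
$termMapping = array(1 => 339, 2 => 340);

$text = 'Examples: Aulonocara maylandi maylandi, A. maylandi kandeensis';
preg_match_all($regexp, $text, $matches, PREG_OFFSET_CAPTURE);

$found = array();
foreach ($termMapping as $group => $termId) {
  foreach ($matches[$group] as $matchData) {
    // A group that did not take part in a match is reported as an empty string
    if ($matchData[0] !== '') {
      $found[] = array(
        'id' => $termId,
        'text' => $matchData[0],
        'offset' => $matchData[1],
      );
    }
  }
}
// $found[0]: id 339, 'Aulonocara maylandi maylandi', offset 10
// $found[1]: id 340, 'A. maylandi kandeensis', offset 40
```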

    In short: Use (?:) if you’re using subpatterns in $speciesTerms.
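The patterns in $speciesTerms above were written by hand. A minimal sketch of generating them from plain "Genus species" names, assuming ASCII names in the format shown in the question. The function name speciesPattern is hypothetical.

```php
<?php
// Hypothetical helper: turns "Genus species" into "(?:Genus|G\.) species"
function speciesPattern(string $name): string {
  // Split off the genus; everything after the first space is kept as-is
  $parts = explode(' ', $name, 2);
  $genus = $parts[0];
  // Abbreviated genus, e.g. "A." (assumes an ASCII first letter)
  $abbreviation = $genus[0] . '.';

  $pattern = '(?:' . preg_quote($genus, '/') . '|' . preg_quote($abbreviation, '/') . ')';
  if (isset($parts[1])) {
    $pattern .= ' ' . preg_quote($parts[1], '/');
  }
  return $pattern;
}

$pattern = speciesPattern('Aulonocara maylandi maylandi');
// $pattern is '(?:Aulonocara|A\.) maylandi maylandi'
```

Names that end in a non-word character, such as Aulonocara sp. "walteri", would not match with the trailing \b that matchTerms adds, because there is no word boundary between the closing quote and a following space. Such names would need a different right-hand delimiter.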

    UPDATE
    Each dynamically created regexp now has a maximum number of subpatterns, defined as a constant at the top. This avoids hitting the PCRE limit on the number of subpatterns in a single regexp.
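Since the chunked regexps depend only on the term list, they can be built once and reused for every text node. A sketch of that idea, assuming the same pattern format as $speciesTerms. The function name buildTermRegexps and the chunk size of 3 in the usage lines are illustrative only.

```php
<?php
// Hypothetical helper: compiles the term patterns into chunked regexps once.
// Each entry holds a regexp and the map from capturing group number to term id.
function buildTermRegexps(array $termPatterns, int $groupsPerRegexp = 70): array {
  $compiled = array();
  // preserve_keys = true keeps the term ids as array keys
  foreach (array_chunk($termPatterns, $groupsPerRegexp, true) as $chunk) {
    $groups = array();
    $termMapping = array();
    $c = 1;
    foreach ($chunk as $termId => $termPattern) {
      $groups[] = '(\b' . $termPattern . '\b)';
      $termMapping[$c++] = $termId;
    }
    $compiled[] = array(
      'regexp' => '/' . implode('|', $groups) . '/m',
      'mapping' => $termMapping,
    );
  }
  return $compiled;
}

$speciesTerms = array(
  339 => '(?:Aulonocara|A\.) maylandi maylandi',
  340 => '(?:Aulonocara|A\.) maylandi kandeensis',
  344 => '(?:Benthochromis|B\.) tricoti',
  345 => '(?:Boulengerochromis|B\.) microlepis',
);

// With 3 groups per regexp, the four terms end up in two regexps
$compiled = buildTermRegexps($speciesTerms, 3);
```

A matching function would then loop over $compiled and call preg_match_all with each 'regexp', using 'mapping' to look up the term id.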

    Important notes:

    • If you have a lot of terms, they must be split across several regexps, because a single regexp has a limit on the number of subpatterns (the prototype uses groups of GROUPS_PER_REGEXPS). In this case it’s optimal to prebuild an array of regexps, one for every N terms.
    • matchTerms generates the regexps on every call; obviously this needs to be done only once.
    • It’s possible to use advanced regexps in $speciesTerms.
    • Replace strlen with mb_strlen if you’re using multibyte encodings.
    • The supplied $html will be wrapped in a <body> tag (unless it’s already wrapped).
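On the multibyte note: besides strlen, the offsets themselves need attention, because preg_match_all with PREG_OFFSET_CAPTURE reports byte offsets. A small sketch of converting a byte offset into a character offset, assuming the text is UTF-8 (DOMDocument hands strings back as UTF-8) and the mbstring extension is available. The helper name byteToCharOffset is hypothetical.

```php
<?php
// Hypothetical helper: converts a byte offset reported by PCRE into a
// character offset, assuming $text is UTF-8
function byteToCharOffset(string $text, int $byteOffset): int {
  return mb_strlen(substr($text, 0, $byteOffset), 'UTF-8');
}

// "Größe: Betta uberis", with the two non-ASCII letters written as escapes
$text = "Gr\u{00F6}\u{00DF}e: Betta uberis";
preg_match('/Betta uberis/', $text, $m, PREG_OFFSET_CAPTURE);

$byteOffset = $m[0][1];                               // 9: the two accented letters take 2 bytes each
$charOffset = byteToCharOffset($text, $byteOffset);   // 7
```

As far as I can tell, DOMText::splitText() counts characters rather than bytes, so the converted offset and an mb_strlen-based length are the values to pass to it. This is worth verifying against your PHP version.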