After seeing several threads rubbishing the regexp method of finding a term to match

Question

Asked: May 14, 2026 (2026-05-14T18:42:11+00:00)

After seeing several threads rubbishing the regexp method of finding a term to match within an HTML document, I’ve used the Simple HTML DOM PHP parser (http://simplehtmldom.sourceforge.net/) to get the bits of text I’m after, but I want to know if my code is optimal. It feels like I’m looping too many times. Is there a way to optimise the following loop?

// Get the HTML and look at the text nodes
$html = str_get_html($buffer);
// First we match the <body> tag, as we don't want to change the <head> items
foreach ($html->find('body') as $body) {
    // Then we get the text nodes, rather than any HTML
    foreach ($body->find('text') as $text) {
        // Then we match each term
        foreach ($terms as $term) {
            // Match the terms within the text nodes
            $text->outertext = str_replace($term, '<span class="highlight">' . $term . '</span>', $text->outertext);
        }
    }
}

For example, would it make a difference to check whether I have any matches before I start the loop?


1 Answer

Editorial Team · Answer 1 · 2026-05-14T18:42:12+00:00


You don’t need the outer foreach loop; there’s generally only one body tag in a well-formed document. Instead, just use $body = $html->find('body',0);.

However, a loop with only a single iteration takes essentially the same time as not looping at all, so this probably won't have much performance impact either way. In practice, your original code already behaves like two nested loops, not three.
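
For reference, here is a minimal sketch of the same loop with the single-element lookup. It assumes simple_html_dom.php from the Simple HTML DOM project is on your include path; the function name highlightTerms and the guard clauses are my own additions for illustration, not part of your code.

```php
<?php
// Assumption: simple_html_dom.php (Simple HTML DOM) is on the include path.
require_once 'simple_html_dom.php';

// Hypothetical helper name, used only for this illustration.
function highlightTerms($buffer, array $terms)
{
    $html = str_get_html($buffer);
    if (!$html) {
        // The parser could not build a document; return the input untouched.
        return $buffer;
    }

    // Index 0 asks for the first <body> element instead of an array of matches.
    $body = $html->find('body', 0);
    if (!$body) {
        return $buffer;
    }

    // Only two loops remain: text nodes, then terms.
    foreach ($body->find('text') as $text) {
        foreach ($terms as $term) {
            $text->outertext = str_replace(
                $term,
                '<span class="highlight">' . $term . '</span>',
                $text->outertext
            );
        }
    }

    // Serialise the modified document back to a string.
    return $html->save();
}
```

You would call it as $buffer = highlightTerms($buffer, $terms);. The amount of work is the same as in your version (every text node is still checked against every term); it just drops one level of nesting.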
