Possible Duplicate: How to parse and process HTML with PHP? I need to parse

Question

Asked: June 12, 2026 (2026-06-12T07:44:57+00:00)

Possible Duplicate:
How to parse and process HTML with PHP?

I need to parse blocks of HTML, replacing some hrefs with the link description based on whether the description meets certain criteria.

The regex I’m using to identify specific strings is used elsewhere in my application:

$regex  = "/\b[FfGg][\.][\s][0-9]{1,4}\b/";
preg_match_all($regex, $html, $matches, PREG_SET_ORDER);

I’m using the following SO question as a starting point for extracting href descriptions:

Replacing html link tags with a text description

The idea is to convert any link having a “FfGg.xxxx” type identifier, and leave the rest intact (i.e., the Google link).

What I have so far is:

    $html = 'Ten reports <a href="http://google.com">Google!</a> on 14 mice with ABCD 
show that low plasma BCAA, particularly ABC and to a lesser extent DEF, can result in 
severe but reversible epithelial damage to the skin, eye and gastrointestinal tract.
</li><li>Symptoms were reported in conjunction with low plasma ABC levels in 9 case 
reports. In two case reports, ABC levels were between 1.9 and 48 µmol/L (<a 
href="/docpage.php?obscure==100" target="F.100">F.100</a>, <a 
href="/docpage.php?obscure==68" target="F.68">F.68</a>, <a href="/docpage.php?obscure==67" 
target="F.67">F.67</a>, <a href="/docpage.php?obscure==71" target="F.71">F.71</a>, <a 
href="/docpage.php?obscure==122" target="F.122">F.122</a>, <a 
href="/docpage.php?obscure==92" target="F.92">F.92</a>, <a href="/docpage.php?obscure==96" 
target="F.96">F.96</a>);';

This converts all links, including google:

$html = preg_replace("/<a.*?href=\"(.*?)\".*?>(.*?)<\/a>/i", "$2", $html);

This returns a blank HTML string:

$html = preg_replace("/<a.*?href=\"(.*?)\".*?>[FfGg][\.][\s][0-9]{1,4}<\/a>/i", "$2", $html);

I believe the problem is in how I’m embedding this regex in the second (non-working) example above:

[FfGg][\.][\s][0-9]{1,4}

What is the correct way of embedding the FfGg expression in HTML found in my preg_replace example above?
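
A minimal sketch of one likely cause, assuming the link text looks like the sample above (F.100, with no space after the dot): the class [\s] demands exactly one whitespace character, so the pattern never matches, and the replacement "$2" refers to a second capture group that the pattern does not define. The variant below makes the whitespace optional and captures the identifier as group 1. The shortened $html sample and the names $strict and $pattern are illustrative and not part of the original application.

```php
<?php
// Shortened, hypothetical sample based on the HTML in the question.
$html = 'See <a href="http://google.com">Google!</a> and '
      . '<a href="/docpage.php?obscure==100" target="F.100">F.100</a>.';

// The identifier pattern as embedded in the question: [\s] requires
// one whitespace character between the dot and the digits.
$strict = '/[FfGg][\.][\s][0-9]{1,4}/';
$matchesWithoutSpace = preg_match($strict, 'F.100');  // 0: no whitespace after the dot
$matchesWithSpace    = preg_match($strict, 'F. 100'); // 1: whitespace present

// Assumed fix: whitespace is optional (\s*) and the identifier is
// captured as group 1, so the replacement can refer to it as $1.
// [^>]* keeps the match inside a single opening <a ...> tag.
$pattern = '/<a\b[^>]*>\s*([FfGg]\.\s*[0-9]{1,4})\s*<\/a>/';
$result  = preg_replace($pattern, '$1', $html);

echo $result, PHP_EOL;
```

Here the Google link is left alone because its text does not continue with a dot and digits, while the F.100 link is reduced to its text. If preg_replace returns null instead of a string, preg_last_error() reports the reason (for example an exceeded backtrack limit); that may be what produced the blank result, though the question does not say. Regex over HTML remains fragile, so this is better read as a diagnostic than as a robust solution.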

1 Answer

Editorial Team · Answer 1 · 2026-06-12T07:44:59+00:00

Here is the DOM (correct) way to do it:

EDIT: Improved regex

<?php

    $html = 'Ten reports <a href="http://google.com">Google!</a> on 14 mice with ABCD show that low plasma BCAA, particularly ABC and to a lesser extent DEF, can result in severe but reversible epithelial damage to the skin, eye and gastrointestinal tract.</li><li>Symptoms were reported in conjunction with low plasma ABC levels in 9 case reports. In two case reports, ABC levels were between 1.9 and 48 µmol/L (<a href="/docpage.php?obscure==100" target="F.100">F.100</a>, <a href="/docpage.php?obscure==68" target="F.68">F.68</a>, <a href="/docpage.php?obscure==67" target="F.67">F.67</a>, <a href="/docpage.php?obscure==71" target="F.71">F.71</a>, <a href="/docpage.php?obscure==122" target="F.122">F.122</a>, <a href="/docpage.php?obscure==92" target="F.92">F.92</a>, <a href="/docpage.php?obscure==96" target="F.96">F.96</a>);';

    // Create a new DOMDocument and load the HTML string
    $dom = new DOMDocument('1.0');
    $dom->loadHTML($html);

    // Create an XPath object for this DOMDocument
    $xpath = new DOMXPath($dom);

    // Loop over all <a> elements in the document
    // Ideally we would combine the regex into the XPath query, but XPath 1.0
    // doesn't support it
    foreach ($xpath->query('//a') as $anchor) {
        // See if the link matches the pattern
        if (preg_match('/^\s*[gf]\s*\.\s*\d{1,4}\s*$/i', $anchor->nodeValue)) {
            // If it does, convert it to a text node (effectively, un-linkify it)
            $textNode = new DOMText($anchor->nodeValue);
            $anchor->parentNode->replaceChild($dom->importNode($textNode), $anchor);
        }
    }

    // Because you are working with a partial HTML string, I extract just that
    // string. If you are actually working with a full document, you can
    // replace all the code below this comment with simply:
    // $result = $dom->saveHTML();

    // A string to hold the result
    $result = '';

    // Iterate all elements that are a direct child of the <body> and convert
    // them to strings
    foreach ($xpath->query('/html/body/*') as $node) {
        $result .= $node->C14N();
    }

    // $result now contains the modified HTML string

NB: the error message you see when running this is because the HTML string you supplied is not valid.
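
As a hedged variation on the code above, the same DOM approach can be wrapped in a reusable function. The function name unlinkMatchingAnchors, its parameters, and the $sample string are hypothetical. The sketch also swaps in three things the original does not use: libxml_use_internal_errors() to keep parse warnings from being printed, createTextNode() in place of new DOMText plus importNode(), and saveHTML($node) over /html/body/node() so that text nodes directly under the body are kept as well. It does not deal with character encoding, so non-ASCII input such as µ may need an explicit encoding hint.

```php
<?php
// Hypothetical helper: replaces every <a> whose text matches $pattern
// with a plain text node and returns the modified HTML fragment.
function unlinkMatchingAnchors(string $html, string $pattern): string
{
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument('1.0');
    $dom->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    foreach ($xpath->query('//a') as $anchor) {
        if (preg_match($pattern, $anchor->nodeValue)) {
            // Un-linkify: swap the anchor for a text node with the same text.
            $anchor->parentNode->replaceChild(
                $dom->createTextNode($anchor->nodeValue),
                $anchor
            );
        }
    }

    // Serialize each child node of <body>, text nodes included.
    $result = '';
    foreach ($xpath->query('/html/body/node()') as $node) {
        $result .= $dom->saveHTML($node);
    }

    return $result;
}

// Shortened, hypothetical sample; the pattern is the one from the answer.
$sample = '<p>See <a href="http://google.com">Google!</a> and '
        . '<a href="/docpage.php?obscure==100" target="F.100">F.100</a>.</p>';

echo unlinkMatchingAnchors($sample, '/^\s*[gf]\s*\.\s*\d{1,4}\s*$/i'), PHP_EOL;
```

With this sample the Google link should survive and the F.100 link should be reduced to its text. Passing the pattern as an argument keeps the matching rule in one place, which suits the case where the same regex is reused elsewhere in the application.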
