I have some plain text and html. I need to create a PHP method

Asked: May 14, 2026

I have some plain text and HTML. I need to create a PHP method that will return the same HTML, but with <span class="marked"> before every instance of the text and </span> after it.

Note that it should support tags in the HTML: for example, if the text is blabla, it should also mark bla<b>bla</b> or <a href="http://abc.com">bla</a>bla.

It should be case-insensitive, and it should also support long text (spanning multiple lines, etc.).

For example, if I call this function with the text “my name is josh” and the following html:

<html>
<head>
    <title>My Name Is Josh!!!</title>
</head>
<body>
    <h1>my name is <b>josh</b></h1>
    <div>
        <a href="http://www.names.com">my name</a> is josh
    </div>

    <u>my</u> <i>name</i> <b>is</b> <span style="font-family: Tahoma;">Josh</span>.
</body>
</html>

… it should return:

<html>
<head>
    <title><span class="marked">My Name Is Josh</span>!!!</title>
</head>
<body>
    <h1><span class="marked">my name is <b>josh</b></span></h1>
    <div>
        <span class="marked"><a href="http://www.names.com">my name</a> is josh</span>
    </div>

    <span class="marked"><u>my</u> <i>name</i> <b>is</b> <span style="font-family: Tahoma;">Josh</span></span>.
</body>
</html>

Thanks.


1 Answer

Editorial Team · Answer 1 · 2026-05-14T20:20:26+00:00

This is going to be tricky.

Whilst you could do it with simple regex hacking, ignoring anything inside a tag, something like the naïve:

preg_replace(
    '/My(<[^>]*>)*\s+(<[^>]*>)*name(<[^>]*>)*\s+(<[^>]*>)*is(<[^>]*>)*\s+(<[^>]*>)*Josh/i',
    '<span class="marked">$0</span>', $html
)

that’s not at all reliable, partly because HTML can’t be parsed with regex: it’s valid to put > in an attribute value, and other non-element constructs like comments will be mis-parsed. Even with a more rigorous expression to match tags, something horribly unwieldy like <[^>\s]*(\s+([^>\s]+(\s*=\s*([^"'\s>][^\s>]*|"[^"]*"|'[^']*')\s*))?)*\s*\/?>, you’d still have many of the same problems, especially if the input HTML is not guaranteed to be valid.
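To see the first problem concretely, here is a minimal JavaScript sketch (runnable in Node.js or a browser console) that applies the naive `<[^>]*>` tag matcher as a tag stripper. The function name `stripTagsNaively` is hypothetical. A > inside a quoted attribute value ends the "tag" early, and a comment containing markup is cut off at its first >, so markup leaks into what the regex believes is text.

```javascript
// Hypothetical helper: remove everything the naive tag pattern thinks is a tag.
function stripTagsNaively(html) {
    return html.replace(/<[^>]*>/g, '');
}

// A '>' inside an attribute value ends the "tag" too early, so part of
// the attribute is left behind as if it were text:
var leakedAttr = stripTagsNaively('<a title="1 > 0">my name</a>');
// leakedAttr === ' 0">my name'

// A comment containing markup is split at its first '>':
var leakedComment = stripTagsNaively('<!-- <b> -->my name');
// leakedComment === ' -->my name'
```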

This could even be a security issue: if the HTML you are processing is untrusted, it could fool your regex-based parser into turning text content into attributes, resulting in script injection.

But even ignoring that, you wouldn’t be able to ensure proper element nesting. So you might turn:

<em>My name is <strong>Josh</strong>!!!</em>

into the misnested and invalid:

<span class="marked"><em>My name is <strong>Josh</strong></span>!!!</em>

or:

My
<table><tr><td>name is</td></tr></table>
Josh

where those elements can’t be wrapped with a span. If you’re unlucky, the browser fixups to ‘correct’ your invalid output could end up leaving half the page ‘marked’, or messing up the page layout.
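The nesting problem shows up even on the question’s own input. Below is a JavaScript sketch that generates the same shape of pattern as the PHP preg_replace above for an arbitrary phrase (words separated by whitespace, with `<[^>]*>` tags allowed around that whitespace); `buildNaivePattern` and `naiveMark` are hypothetical names. Because the pattern stops at the last word, a closing tag that follows it ends up outside the span.

```javascript
// Hypothetical helper: build a tag-tolerant regex for `phrase`, in the
// style of the naive PHP pattern.
function buildNaivePattern(phrase) {
    var tags = '(?:<[^>]*>)*';
    var words = phrase.trim().split(/\s+/).map(function (w) {
        // Escape regex metacharacters so each word is matched literally.
        return w.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    });
    // \s+ also matches newlines; 'g' = every occurrence, 'i' = ignore case.
    return new RegExp(words.join(tags + '\\s+' + tags), 'gi');
}

// Hypothetical helper: wrap every match in the marker span.
function naiveMark(html, phrase) {
    return html.replace(buildNaivePattern(phrase), '<span class="marked">$&</span>');
}

var a = naiveMark('<h1>my name is <b>josh</b></h1>', 'my name is josh');
// a === '<h1><span class="marked">my name is <b>josh</span></b></h1>'  (misnested)
var b = naiveMark('<em>My name is <strong>Josh</strong>!!!</em>', 'my name is josh');
// b === '<em><span class="marked">My name is <strong>Josh</span></strong>!!!</em>'
```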

So you would have to do this at the parsed-DOM level rather than with string hacking. You could parse the whole string into a DOM in PHP (for example with DOMDocument), process it and re-serialise it, but if it’s acceptable from an accessibility point of view, it would probably be easier to do it at the browser end in JavaScript, where the content is already parsed into DOM nodes.

It’s still going to be pretty hard. A related question handles the case where the text is all inside the same text node, but that’s a much simpler case.
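For comparison, the single-text-node case (such as the question’s <title>My Name Is Josh!!!</title>) really is simple: cut the node’s text into marked and unmarked pieces. A minimal DOM-free JavaScript sketch, where `splitByMatches` is a hypothetical name; in a real DOM you would call splitText() at the same boundaries and wrap each marked piece in a span.

```javascript
// Hypothetical helper: cut one string into alternating pieces, flagging
// the ones that matched `regexp`.
function splitByMatches(data, regexp) {
    // Force the 'g' flag so exec() walks through every match.
    var flags = regexp.flags.indexOf('g') === -1 ? regexp.flags + 'g' : regexp.flags;
    var re = new RegExp(regexp.source, flags);
    var pieces = [], last = 0, m;
    while ((m = re.exec(data)) !== null) {
        if (m[0].length === 0) { re.lastIndex++; continue; } // skip empty matches
        if (m.index > last)
            pieces.push({ text: data.slice(last, m.index), marked: false });
        pieces.push({ text: m[0], marked: true });
        last = m.index + m[0].length;
    }
    if (last < data.length)
        pieces.push({ text: data.slice(last), marked: false });
    return pieces;
}

var titlePieces = splitByMatches('My Name Is Josh!!!', /my\s+name\s+is\s+josh/i);
// [{ text: 'My Name Is Josh', marked: true }, { text: '!!!', marked: false }]
```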

What you would effectively have to do would be:

for each Element that may contain a <span>:
    for each child node in the element:
       generate the text content of this node and all following siblings
       match the target string/regex against the whole text
       if there is no match:
           break the inner loop - on to the next element.
       if the current node is an element node and the index of the match is not 0:
           continue the inner loop - on to the next sibling node
       if the current node is a text node and the index of the match is >= the length of the Text node data:
           continue the inner loop - on to the next sibling node
       // now we have to find the position of the end of the match
       n is the length of the match string
       iterate through the remaining text node data and sibling text content:
           compare the length of the text content with n
           less?:
               subtract length from n and continue
           same?:
               we've got a match on a node boundary
               split the first text node if necessary
               insert a new span into the document
               move all the nodes from the first text node to this boundary inside the span
               break to outer loop, next element
           greater?:
               we've got a match ending inside the node.
               is the node a text node?:
                   then we can split the text node
                   also split the first text node if necessary
                   insert a new span into the document
                   move all contained nodes inside the span
                   break to outer loop, next element
               no, an element?:
                   oh dear! We can't insert a span here

Ouch.
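Most of that pseudocode is bookkeeping, but the end-finding step (the less/same/greater comparison) is the part that is easy to get wrong. A minimal JavaScript sketch of just that step, working on plain text lengths of sibling nodes rather than real DOM nodes; `findMatchEnd` and its parameters are hypothetical. For the question’s <h1>my name is <b>josh</b></h1>, the children have text lengths 11 and 4, and a 15-character match starting at offset 0 ends exactly on the boundary of the <b> element: the "same?" case, where a span can be inserted.

```javascript
// Hypothetical helper for the "find the end of the match" step.
// lengths: text length of each sibling node (text or element), in order.
// startNode/startOffset: where the match begins; n: the match length.
function findMatchEnd(lengths, startNode, startOffset, n) {
    var remaining = n;
    for (var i = startNode; i < lengths.length; i++) {
        // In the first node, only the text after the start offset counts.
        var base = (i === startNode) ? startOffset : 0;
        var available = lengths[i] - base;
        if (available < remaining) {
            // "less?": the match continues into the next sibling.
            remaining -= available;
        } else if (available === remaining) {
            // "same?": the match ends exactly on this node's boundary.
            return { node: i, offset: lengths[i], onBoundary: true };
        } else {
            // "greater?": the match ends inside this node. The caller must
            // check whether it is a text node (split it) or an element (oh dear).
            return { node: i, offset: base + remaining, onBoundary: false };
        }
    }
    return null; // ran out of siblings before the match ended
}

var h1End = findMatchEnd([11, 4], 0, 0, 15);
// { node: 1, offset: 4, onBoundary: true }
```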

Here’s an alternative suggestion which is slightly less nasty, if it’s acceptable to wrap every text node that is part of a match separately. So:

<p>Oh, my</p> name <div><div>is</div></div> Josh

would leave you with the output:

<p>Oh, <span class="marked">my</span></p>
<span class="marked"> name </span>
<div><div><span class="marked">is</span></div></div>
<span class="marked"> Josh</span>

which might look OK, depending on how you’re styling the matches. It would also solve the misnesting problem of matches partially inside elements.
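The core of this approach does not depend on the DOM: join the text of all the text nodes, match against the joined string, then wrap only the part of each node that falls inside a match. A DOM-free sketch, where a hypothetical `wrapPerSegment` takes plain strings standing in for consecutive text nodes (assumed to be already HTML-escaped) and returns each one as an HTML fragment; it reproduces the output shown above.

```javascript
// Hypothetical, DOM-free model of "wrap every text node separately".
function wrapPerSegment(segments, regexp) {
    var text = segments.join('');
    // Mark which characters of the joined text fall inside some match.
    var marked = new Array(text.length).fill(false);
    var flags = regexp.flags.indexOf('g') === -1 ? regexp.flags + 'g' : regexp.flags;
    var re = new RegExp(regexp.source, flags);
    var m;
    while ((m = re.exec(text)) !== null) {
        if (m[0].length === 0) { re.lastIndex++; continue; } // skip empty matches
        for (var k = m.index; k < m.index + m[0].length; k++) marked[k] = true;
    }
    // Rebuild each segment, wrapping runs of marked characters so that no
    // span crosses a segment boundary. (Adjacent matches inside one
    // segment merge into a single span.)
    var pos = 0;
    return segments.map(function (seg) {
        var out = '', i = 0;
        while (i < seg.length) {
            var flag = marked[pos + i], j = i;
            while (j < seg.length && marked[pos + j] === flag) j++;
            var piece = seg.slice(i, j);
            out += flag ? '<span class="marked">' + piece + '</span>' : piece;
            i = j;
        }
        pos += seg.length;
        return out;
    });
}

var wrapped = wrapPerSegment(['Oh, my', ' name ', 'is', ' Josh'], /my\s+name\s+is\s+Josh/gi);
```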

ETA: Oh sod the pseudocode, I’ve more-or-less written the code now anyway, might as well finish it. Here’s a JavaScript version of the latter approach:

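// Run once the page has loaded, by which time document.body exists and the
// Array polyfills at the end of this script have been installed.
window.onload= function() {
    markTextInElement(document.body, /My\s+name\s+is\s+Josh/gi);
};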


function markTextInElement(element, regexp) {
    var nodes= [];
    collectTextNodes(nodes, element);
    var datas= nodes.map(function(node) { return node.data; });
    var text= datas.join('');

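    // Get list of [startnodei, startindex, endnodei, endindex] matches.
    // regexp must have the g flag, or exec() keeps returning the first
    // match and this loop never ends.
    //
    var matches= [], match;
    regexp.lastIndex= 0;
    while (match= regexp.exec(text)) {
        if (match[0].length===0) {   // avoid looping forever on empty matches
            regexp.lastIndex++;
            continue;
        }
        var p0= getPositionInStrings(datas, match.index, false);
        var p1= getPositionInStrings(datas, match.index+match[0].length, true);
        matches.push([p0[0], p0[1], p1[0], p1[1]]);
    }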

    // Get list of nodes for each match, split at the edges of the
    // text. Reverse-iterate to avoid the splitting changing nodes we
    // have yet to process.
    //
    for (var i= matches.length; i-->0;) {
        var ni0= matches[i][0], ix0= matches[i][1], ni1= matches[i][2], ix1= matches[i][3];
        var mnodes= nodes.slice(ni0, ni1+1);
        if (ix1<nodes[ni1].length)
            nodes[ni1].splitText(ix1);
        if (ix0>0)
            mnodes[0]= nodes[ni0].splitText(ix0);

        // Replace each text node in the sublist with a wrapped version
        //
        mnodes.forEach(function(node) {
            var span= document.createElement('span');
            span.className= 'marked';
            node.parentNode.replaceChild(span, node);
            span.appendChild(node);
        });
    }
}

function collectTextNodes(texts, element) {
    // textok is true unless this element's direct text children can't
    // (or shouldn't) be wrapped in a <span>.
    var textok= [
        'applet', 'col', 'colgroup', 'dl', 'iframe', 'map', 'object', 'ol',
        'optgroup', 'option', 'script', 'select', 'style', 'table',
        'tbody', 'textarea', 'tfoot', 'thead', 'tr', 'ul'
    ].indexOf(element.tagName.toLowerCase())===-1;
    for (var i= 0; i<element.childNodes.length; i++) {
        var child= element.childNodes[i];
        if (child.nodeType===3 && textok)   // Node.TEXT_NODE
            texts.push(child);
        if (child.nodeType===1)             // Node.ELEMENT_NODE
            collectTextNodes(texts, child);
    }
}

function getPositionInStrings(strs, index, toend) {
    var ix= 0;
    for (var i= 0; i<strs.length; i++) {
        var n= index-ix, l= strs[i].length;
        if (toend? l>=n : l>n)
            return [i, n];
        ix+= l;
    }
    return [i, 0];
}


// We've used a few ECMAScript Fifth Edition Array features.
// Make them work in browsers that don't support them natively.
//
if (!('indexOf' in Array.prototype)) {
    Array.prototype.indexOf= function(find, i /*opt*/) {
        if (i===undefined) i= 0;
        if (i<0) i+= this.length;
        if (i<0) i= 0;
        for (var n= this.length; i<n; i++)
            if (i in this && this[i]===find)
                return i;
        return -1;
    };
}
if (!('forEach' in Array.prototype)) {
    Array.prototype.forEach= function(action, that /*opt*/) {
        for (var i= 0, n= this.length; i<n; i++)
            if (i in this)
                action.call(that, this[i], i, this);
    };
}
if (!('map' in Array.prototype)) {
    Array.prototype.map= function(mapper, that /*opt*/) {
        var other= new Array(this.length);
        for (var i= 0, n= this.length; i<n; i++)
            if (i in this)
                other[i]= mapper.call(that, this[i], i, this);
        return other;
    };
}
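Because the match loop passes `false` for a match’s start and `true` for its end, `getPositionInStrings` attributes a boundary index differently in the two cases. A quick check that can run outside a browser (for example in Node.js), with `getPositionInStrings` copied unchanged and a hypothetical two-node example: a match starting at index 3 begins at offset 0 of the second node, while a match ending there ends at offset 3 of the first node. So both the `ix0>0` and `ix1<nodes[ni1].length` tests skip the split, and no empty text node is created.

```javascript
// getPositionInStrings, copied unchanged from the code above: maps an
// index in the joined text to [stringIndex, offsetInThatString].
function getPositionInStrings(strs, index, toend) {
    var ix= 0;
    for (var i= 0; i<strs.length; i++) {
        var n= index-ix, l= strs[i].length;
        if (toend? l>=n : l>n)
            return [i, n];
        ix+= l;
    }
    return [i, 0];
}

var datas = ['my ', 'name'];                        // two adjacent text nodes
var start = getPositionInStrings(datas, 3, false);  // [1, 0]: start of 'name'
var end = getPositionInStrings(datas, 3, true);     // [0, 3]: end of 'my '
```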
