The Archive Base

Editorial Team
Asked: May 14, 2026

I have some plain text and html. I need to create a PHP method

I have some plain text and html. I need to create a PHP method that will return the same html, but with <span class="marked"> before any instances of the text and </span> after it.

Note that it should support tags in the HTML (for example, if the text is blabla, it should also be marked when it appears as bla<b>bla</b> or <a href="http://abc.com">bla</a>bla).

It should be case-insensitive and should also support long text (spanning multiple lines, etc.).

For example, if I call this function with the text “my name is josh” and the following html:

<html>
<head>
    <title>My Name Is Josh!!!</title>
</head>
<body>
    <h1>my name is <b>josh</b></h1>
    <div>
        <a href="http://www.names.com">my name</a> is josh
    </div>

    <u>my</u> <i>name</i> <b>is</b> <span style="font-family: Tahoma;">Josh</span>.
</body>
</html>

… it should return:

<html>
<head>
    <title><span class="marked">My Name Is Josh</span>!!!</title>
</head>
<body>
    <h1><span class="marked">my name is <b>josh</b></span></h1>
    <div>
        <span class="marked"><a href="http://www.names.com">my name</a> is josh</span>
    </div>

    <span class="marked"><u>my</u> <i>name</i> <b>is</b> <span style="font-family: Tahoma;">Josh</span></span>.
</body>
</html>

Thanks.

1 Answer

  1. Editorial Team
    Added an answer on May 14, 2026 at 8:20 pm

    This is going to be tricky.

    Whilst you could do it with simple regex hacking, ignoring anything inside a tag, something like the naïve:

    // Naive: allow any number of tags around the whitespace between words
    preg_replace(
        '/My(<[^>]*>)*\s+(<[^>]*>)*name(<[^>]*>)*\s+(<[^>]*>)*is(<[^>]*>)*\s+(<[^>]*>)*Josh/i',
        '<span class="marked">$0</span>', $html
    )
    

    that’s not at all reliable. Partly because HTML can’t be parsed with regex: it’s valid to put > in an attribute value, and other non-element constructs like comments will be mis-parsed. Even with a more rigorous expression to match tags, something horribly unwieldy like <[^>\s]*(\s+([^>\s]+(\s*=\s*([^"'\s>][^\s>]*|"[^"]*"|'[^']*')\s*))?)*\s*\/?>, you’d still have many of the same problems, especially if the input HTML is not guaranteed valid.

    This could even be a security issue: if the HTML you are processing is untrusted, it could fool your parser into turning text content into attributes, resulting in script injection.
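To make the `>`-in-an-attribute problem concrete, here is a minimal sketch in plain JavaScript (no DOM needed). The input string is made up for illustration; the pattern is the simple "anything up to the next >" idea of a tag.

```javascript
// Naive idea of a tag: "<", anything that is not ">", then ">".
const naiveTag = /<[^>]*>/g;

// Hypothetical input: a ">" inside a quoted attribute value is valid HTML.
const html = '<a title="a > b" href="#">link</a>';

// Strip whatever the pattern believes is a tag.
// A real parser would leave only the text content "link".
const textSeenByRegex = html.replace(naiveTag, '');

console.log(textSeenByRegex); // prints:  b" href="#">link
```

The pattern stops at the first `>` it sees, so the tail of the attribute list leaks out as if it were text content. That is the kind of confusion between text and attributes described above.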

    But even ignoring that, you wouldn’t be able to ensure proper element nesting. So you might turn:

    <em>My name is <strong>Josh</strong>!!!</em>
    

    into the misnested and invalid:

    <span class="marked"><em>My name is <strong>Josh</strong></span>!!!</em>
    

    or:

    My
    <table><tr><td>name is</td></tr></table>
    Josh
    

    where those elements can’t be wrapped with a span. If you’re unlucky, the browser fixups to ‘correct’ your invalid output could end up leaving half the page ‘marked’, or messing up the page layout.
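A quick way to see the misnesting is to run a tag-tolerant regex over a small string and check the result. This is only a sketch: `isWellNested` is a hypothetical helper that is good enough for this toy input and is itself not a real HTML parser. The match here starts inside the `<em>`, so the misnesting shows up at the `</strong>` boundary rather than at the `<em>`, but it is the same class of problem.

```javascript
// Optional tags are allowed around the whitespace between words.
const tags = '(?:<[^>]*>)*';
const pattern = new RegExp(
    ['my', 'name', 'is', 'josh'].join(tags + '\\s+' + tags), 'gi'
);

const input = '<em>My name is <strong>Josh</strong>!!!</em>';
const output = input.replace(pattern, '<span class="marked">$&</span>');

// Hypothetical checker: push open tags, pop on close tags, compare names.
// Only suitable for simple inputs without void elements or quoted ">".
function isWellNested(html) {
    const stack = [];
    const tagRe = /<(\/?)([a-z][a-z0-9]*)[^>]*>/gi;
    let m;
    while ((m = tagRe.exec(html)) !== null) {
        const name = m[2].toLowerCase();
        if (m[1] === '') stack.push(name);
        else if (stack.pop() !== name) return false;
    }
    return stack.length === 0;
}

console.log(output);
// prints: <em><span class="marked">My name is <strong>Josh</span></strong>!!!</em>
console.log(isWellNested(input), isWellNested(output)); // prints: true false
```

The span is closed before the `<strong>` it contains, so the output is no longer well nested even though the input was.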

    So you would have to do this on a parsed-DOM level rather than with string hacking. You could parse the whole string in PHP, process it and re-serialise, but if it’s acceptable from an accessibility point of view, it would probably be easier to do it at the browser end in JavaScript, where the content is already parsed into DOM nodes.

    It’s still going to be pretty hard. A related question handles the case where the text is all inside the same text node, but that’s a much simpler case.

    What you would effectively have to do would be:

    for each Element that may contain a <span>:
        for each child node in the element:
           generate the text content of this node and all following siblings
           match the target string/regex against the whole text
           if there is no match:
               break the outer loop - on to the next element.
           if the current node is an element node and the index of the match is not 0:
               break the inner loop - on to the next sibling node
           if the current node is a text node and the index of the match is >= the length of the Text node data:
               break the inner loop - on to the next sibling node
           // now we have to find the position of the end of the match
           n is the length of the match string
           iterate through the remaining text node data and sibling text content:
               compare the length of the text content with n
               less?:
                   subtract length from n and continue
               same?:
                   we've got a match on a node boundary
                   split the first text node if necessary
                   insert a new span into the document
                   move all the nodes from the first text node to this boundary inside the span
                   break to outer loop, next element
               greater?:
                   we've got a match ending inside the node.
                   is the node a text node?:
                       then we can split the text node
                       also split the first text node if necessary
                       insert a new span into the document
                       move all contained nodes inside the span
                       break to outer loop, next element
                   no, an element?:
                       oh dear! We can't insert a span here
    

    Ouch.

    Here’s an alternative suggestion which is slightly less nasty, if it’s acceptable to wrap every text node that is part of a match separately. So:

    <p>Oh, my</p> name <div><div>is</div></div> Josh
    

    would leave you with the output:

    <p>Oh, <span class="marked">my</span></p>
    <span class="marked"> name </span>
    <div><div><span class="marked">is</span></div></div>
    <span class="marked"> Josh</span>
    

    which might look OK, depending on how you’re styling the matches. It would also solve the misnesting problem of matches partially inside elements.

    ETA: Oh sod the pseudocode, I’ve more-or-less written the code now anyway, might as well finish it. Here’s a JavaScript version of the latter approach:

    markTextInElement(document.body, /My\s+name\s+is\s+Josh/gi);
    
    
    function markTextInElement(element, regexp) {
        var nodes= [];
        collectTextNodes(nodes, element);
        var datas= nodes.map(function(node) { return node.data; });
        var text= datas.join('');
    
        // Get list of [startnodei, startindex, endnodei, endindex] matches
        //
        var matches= [], match;
        while (match= regexp.exec(text)) {
            var p0= getPositionInStrings(datas, match.index, false);
            var p1= getPositionInStrings(datas, match.index+match[0].length, true);
            matches.push([p0[0], p0[1], p1[0], p1[1]]);
        }
    
        // Get list of nodes for each match, split at the edges of the
        // text. Reverse-iterate to avoid the splitting changing nodes we
        // have yet to process.
        //
        for (var i= matches.length; i-->0;) {
            var ni0= matches[i][0], ix0= matches[i][1], ni1= matches[i][2], ix1= matches[i][3];
            var mnodes= nodes.slice(ni0, ni1+1);
            if (ix1<nodes[ni1].length)
                nodes[ni1].splitText(ix1);
            if (ix0>0)
                mnodes[0]= nodes[ni0].splitText(ix0);
    
            // Replace each text node in the sublist with a wrapped version
            //
            mnodes.forEach(function(node) {
                var span= document.createElement('span');
                span.className= 'marked';
                node.parentNode.replaceChild(span, node);
                span.appendChild(node);
            });
        }
    }
    
    function collectTextNodes(texts, element) {
        // Text directly inside these elements is either not rendered as
        // content or cannot legally be wrapped in a span, so skip it.
        var textok= [
            'applet', 'col', 'colgroup', 'dl', 'iframe', 'map', 'object', 'ol',
            'optgroup', 'option', 'script', 'select', 'style', 'table',
            'tbody', 'textarea', 'tfoot', 'thead', 'tr', 'ul'
        ].indexOf(element.tagName.toLowerCase())===-1;
        for (var i= 0; i<element.childNodes.length; i++) {
            var child= element.childNodes[i];
            if (child.nodeType===3 && textok)   // 3 = text node
                texts.push(child);
            if (child.nodeType===1)             // 1 = element node
                collectTextNodes(texts, child);
        }
    }
    
    function getPositionInStrings(strs, index, toend) {
        var ix= 0;
        for (var i= 0; i<strs.length; i++) {
            var n= index-ix, l= strs[i].length;
            if (toend? l>=n : l>n)
                return [i, n];
            ix+= l;
        }
        return [i, 0];
    }
    
    
    // We've used a few ECMAScript Fifth Edition Array features.
    // Make them work in browsers that don't support them natively.
    //
    if (!('indexOf' in Array.prototype)) {
        Array.prototype.indexOf= function(find, i /*opt*/) {
            if (i===undefined) i= 0;
            if (i<0) i+= this.length;
            if (i<0) i= 0;
            for (var n= this.length; i<n; i++)
                if (i in this && this[i]===find)
                    return i;
            return -1;
        };
    }
    if (!('forEach' in Array.prototype)) {
        Array.prototype.forEach= function(action, that /*opt*/) {
            for (var i= 0, n= this.length; i<n; i++)
                if (i in this)
                    action.call(that, this[i], i, this);
        };
    }
    if (!('map' in Array.prototype)) {
        Array.prototype.map= function(mapper, that /*opt*/) {
            var other= new Array(this.length);
            for (var i= 0, n= this.length; i<n; i++)
                if (i in this)
                    other[i]= mapper.call(that, this[i], i, this);
            return other;
        };
    }
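To check the offset bookkeeping without a browser, here is a DOM-free dry run of the same idea. Plain strings stand in for text nodes and square brackets stand in for the span. The names `phraseToRegExp`, `locate` and `markChunks` are made up for this sketch; `locate` mirrors `getPositionInStrings` above, and `phraseToRegExp` is one assumed way of getting from the plain search text to a pattern like `/My\s+name\s+is\s+Josh/gi`.

```javascript
// Hypothetical helper: build a case-insensitive pattern from a plain phrase,
// allowing any run of whitespace (including newlines) between the words.
function phraseToRegExp(phrase) {
    const words = phrase.trim().split(/\s+/).map(function (w) {
        return w.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // escape regex syntax
    });
    return new RegExp(words.join('\\s+'), 'gi');
}

// Map an offset in the joined text back to [chunk index, offset in chunk].
// With toEnd=true an offset on a chunk boundary stays in the earlier chunk.
function locate(chunks, index, toEnd) {
    let seen = 0;
    for (let i = 0; i < chunks.length; i++) {
        const n = index - seen, len = chunks[i].length;
        if (toEnd ? len >= n : len > n) return [i, n];
        seen += len;
    }
    return [chunks.length, 0];
}

// Wrap the matched part of every chunk in [ ]. regexp must have the g flag.
function markChunks(chunks, regexp) {
    const text = chunks.join('');
    const matches = [];
    let m;
    while ((m = regexp.exec(text)) !== null) {
        if (m[0] === '') { regexp.lastIndex++; continue; } // skip empty matches
        matches.push([locate(chunks, m.index, false),
                      locate(chunks, m.index + m[0].length, true)]);
    }
    const out = chunks.slice();
    // Work backwards so earlier offsets are not shifted by inserted brackets.
    for (let k = matches.length - 1; k >= 0; k--) {
        const start = matches[k][0], end = matches[k][1];
        for (let i = start[0]; i <= end[0]; i++) {
            const from = i === start[0] ? start[1] : 0;
            const to = i === end[0] ? end[1] : out[i].length;
            out[i] = out[i].slice(0, from) + '[' + out[i].slice(from, to) +
                     ']' + out[i].slice(to);
        }
    }
    return out;
}

// The text nodes of: <p>Oh, my</p> name <div><div>is</div></div> Josh
const marked = markChunks(['Oh, my', ' name ', 'is', ' Josh'],
                          phraseToRegExp('my name is josh'));
console.log(marked.join('|')); // prints: Oh, [my]|[ name ]|[is]|[ Josh]
```

Each bracketed piece corresponds to one of the separately wrapped spans in the example output earlier in the answer.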
    