The Archive Base

Editorial Team
Asked: May 23, 2026

Regex was my original idea as a solution, although it soon became apparent that a DOM parser would be more appropriate. I'd like to convert spaces to &nbsp; between PRE tags within a string of HTML text. For example:

<table atrr="zxzx"><tr>
<td>adfa a   adfadfaf></td><td><br /> dfa  dfa</td>
</tr></table>
<pre class="abc" id="abc">
abc 123
<span class="abc">abc 123</span>
</pre>
<pre>123 123</pre>

into (note the space in the span tag attribute is preserved):

<table atrr="zxzx"><tr>
<td>adfa a   adfadfaf></td><td><br /> dfa  dfa</td>
</tr></table>
<pre class="abc" id="abc">
abc&nbsp;123
<span class="abc">abc&nbsp;123</span>
</pre>
<pre>123&nbsp;123</pre>

The result needs to be serialised back into string format, for use elsewhere.

1 Answer

  1. Editorial Team
    Added an answer on May 23, 2026 at 5:45 pm

    This is somewhat tricky if you want to insert &nbsp; entities without DOM converting the ampersand to &amp;, because entities are nodes whereas spaces are plain character data. Here is how to do it:

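    $dom = new DOMDocument;
    $dom->loadHTML($html); // $html holds the markup to convert
    $xp = new DOMXPath($dom);
    // Select every text node that has a pre element among its ancestors.
    foreach ($xp->query('//text()[ancestor::pre]') as $textNode)
    {
        $remaining = $textNode;
        // strpos() returns a byte offset, which matches splitText() for ASCII text.
        while (($nextSpace = strpos($remaining->data, ' ')) !== false) {
            // Split at the space; the new node starts with that space.
            $remaining = $remaining->splitText($nextSpace);
            // Drop the leading space from the new node.
            $remaining->nodeValue = substr($remaining->nodeValue, 1);
            // Put an &nbsp; entity reference where the space used to be.
            $remaining->parentNode->insertBefore(
                $dom->createEntityReference('nbsp'),
                $remaining
            );
        }
    }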
    

    Fetching all the pre elements and working with their nodeValue doesn't work here, because the nodeValue property contains the combined DOMText values of all the descendants, e.g. it includes the nodeValue of the span children. Setting the nodeValue on the pre element would delete those children.
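
A minimal sketch of that pitfall, assuming an illustrative snippet rather than the markup from the question. It reads the combined nodeValue of a pre element, writes a modified string back, and counts the span children before and after:

```php
<?php
// Illustrative markup only; the variable names are assumptions for this sketch.
$dom = new DOMDocument();
$dom->loadHTML('<pre>abc 123 <span class="abc">abc 123</span></pre>');

$pre = $dom->getElementsByTagName('pre')->item(0);

// nodeValue concatenates the text of every descendant, span included.
$combined = $pre->nodeValue;
$spansBefore = $pre->getElementsByTagName('span')->length;

// Assigning to nodeValue replaces all children with plain text content.
$pre->nodeValue = str_replace(' ', '_', $combined);
$spansAfter = $pre->getElementsByTagName('span')->length;

echo $combined, "\n";
echo "span elements before: $spansBefore, after: $spansAfter\n";
```

The span disappears after the assignment, which is why the answer works on the text nodes instead.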

    So instead of fetching the pre nodes, we fetch all the DOMText nodes that have a pre element somewhere among their ancestors:

    DOMElement pre
        DOMText "abc 123"         <-- picking this
        DOMElement span
           DOMText "abc 123"      <-- and this one
    DOMElement pre
        DOMText "123 123"         <-- and this one
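
To see which nodes the XPath expression picks, the following sketch runs the same query against a small made-up snippet and records each match together with the name of its direct parent. The markup and the `$selected` array are assumptions for illustration:

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<p>not selected</p><pre>abc 123<span>abc 123</span></pre><pre>123 123</pre>');

$xp = new DOMXPath($dom);
$selected = [];
foreach ($xp->query('//text()[ancestor::pre]') as $textNode) {
    // Record the name of the containing element and the text itself.
    $selected[] = $textNode->parentNode->nodeName . ': ' . $textNode->nodeValue;
}
print_r($selected);
```

The text inside the p element is skipped because it has no pre ancestor, while the text inside the span is matched even though pre is not its direct parent.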
    

    We then go through each of those DOMText nodes and split them into separate DOMText nodes at each space. We remove the space and insert an nbsp entity reference node before the split node, so in the end you get a tree like

    DOMElement pre
        DOMText "abc"
        DOMEntityReference nbsp
        DOMText "123"
        DOMElement span
           DOMText "abc"
           DOMEntityReference nbsp
           DOMText "123"
    DOMElement pre
        DOMText "123"
        DOMEntityReference nbsp
        DOMText "123"
    

    Because we only worked with the DOMText nodes, any DOMElements are left untouched and so it will preserve the span elements inside the pre element.
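
The resulting tree can be checked by listing the class of each child of the pre element after the splitting loop has run. This is a sketch on an assumed one-line snippet; `$classes` is a name introduced here:

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<pre>abc 123<span class="abc">abc 123</span></pre>');
$xp = new DOMXPath($dom);

// Same splitting loop as in the answer, applied to the sample markup.
foreach ($xp->query('//text()[ancestor::pre]') as $textNode) {
    $remaining = $textNode;
    while (($nextSpace = strpos($remaining->data, ' ')) !== false) {
        $remaining = $remaining->splitText($nextSpace);
        $remaining->nodeValue = substr($remaining->nodeValue, 1);
        $remaining->parentNode->insertBefore(
            $dom->createEntityReference('nbsp'),
            $remaining
        );
    }
}

// Collect the class name of every direct child of the pre element.
$pre = $dom->getElementsByTagName('pre')->item(0);
$classes = [];
foreach ($pre->childNodes as $child) {
    $classes[] = get_class($child);
}
print_r($classes);
```

The span is still present as a DOMElement, and the space has become a DOMEntityReference sitting between two DOMText nodes.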

    Caveat:

    Your snippet is not a valid document because it doesn't have a root element. When using loadHTML, libxml adds any missing structure to the DOM, which means you will get your snippet back wrapped in a DOCTYPE, html and body tag.
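
A short sketch that makes the added structure visible, assuming a one-element fragment as input:

```php
<?php
$dom = new DOMDocument();
// The input is a bare fragment without html or body tags.
$dom->loadHTML('<pre>123 123</pre>');

// Serialising the whole document shows what libxml wrapped around it.
$serialised = $dom->saveHTML();
echo $serialised;
```

The output starts with a DOCTYPE declaration, and the pre element ends up nested inside html and body elements that were not in the input.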

    If you want the original snippet back, you'd have to fetch the body node with getElementsByTagName and serialise all of its children to get the innerHTML. Unfortunately, there is no innerHTML function or property in PHP's DOM implementation, so we have to do that manually:

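    $innerHtml = '';
    // Serialise each child of body through a temporary document.
    foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $tmp_doc = new DOMDocument();
        // importNode() with true copies the node together with its descendants.
        $tmp_doc->appendChild($tmp_doc->importNode($child, true));
        $innerHtml .= $tmp_doc->saveHTML();
    }
    echo $innerHtml;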
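
For reference, the whole round trip can be wrapped in one function. The sketch below is not the code from this answer but a variant under two stated assumptions: it rebuilds each text node with explode() and a document fragment instead of splitText(), which sidesteps byte versus character offsets, and it relies on saveHTML() accepting a node argument (available since PHP 5.3.6) instead of a temporary document. The function name `preSpacesToNbsp` is hypothetical.

```php
<?php
/**
 * Hypothetical helper: replaces spaces inside pre elements with &nbsp;
 * entity references and returns the markup without the added html/body wrapper.
 */
function preSpacesToNbsp(string $html): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xp = new DOMXPath($dom);

    foreach ($xp->query('//text()[ancestor::pre]') as $textNode) {
        $parts = explode(' ', $textNode->nodeValue);
        if (count($parts) === 1) {
            continue; // no space in this text node
        }
        // Rebuild the node as text pieces joined by entity references.
        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i > 0) {
                $fragment->appendChild($dom->createEntityReference('nbsp'));
            }
            if ($part !== '') {
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        $textNode->parentNode->replaceChild($fragment, $textNode);
    }

    // Serialise only the children of body, dropping the structure libxml added.
    $out = '';
    foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$result = preSpacesToNbsp('<pre class="abc">abc 123 <span class="abc">abc 123</span></pre>');
echo $result;
```

The space inside the span tag itself is untouched because attributes are not text nodes, and the returned string contains no html or body tags.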
    

    Also see

    • How to get innerHTML of DOMNode?
    • DOMDocument in php
    • https://stackoverflow.com/search?q=user%3A208809+dom