I’m using domDocument to parse some HTML, and want to replace breaks with \n.

Question

Asked: May 29, 2026 (2026-05-29T04:01:10+00:00)

I’m using DOMDocument to parse some HTML, and want to replace breaks with \n. However, I’m having trouble identifying where a break actually occurs within the document.

Given the following snippet of HTML – from a much larger file that I’m reading using $dom->loadHTMLFile($pFilename):

<p>Multiple-line paragraph<br />that has a close tag</p>

and my code:

foreach ($dom->getElementsByTagName('*') as $domElement) {
    switch (strtolower($domElement->nodeName)) {
        case 'p' :
            $str = (string) $domElement->nodeValue;
            echo 'PARAGRAPH: ',$str,PHP_EOL;
            break;
        case 'br' :
            echo 'BREAK: ',PHP_EOL;
            break;
    }
}

I get:

PARAGRAPH: Multiple-line paragraphthat has a close tag
BREAK:

How can I identify the position of that break within the paragraph and replace it with a \n?

Or is there a better alternative to DOMDocument for parsing HTML that may or may not be well-formed?

1 Answer

Editorial Team · Answer 1 · 2026-05-29T04:01:10+00:00

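getElementsByTagName can’t give you the position of an element: it returns the matching elements in document order, but nothing about where each one sits relative to the surrounding text. On top of that, nodeValue on the <p> is just the concatenated text of its descendants, and a <br /> contributes no text, so the break disappears from the string. Instead, go through the childNodes of each element and process text nodes and elements separately.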
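For the single paragraph in the question, that idea can be sketched without recursion. This is only an illustration, assuming the breaks are direct children of the <p>; the helper name paragraphToText is hypothetical and not part of DOMDocument:

```php
<?php
// Hypothetical helper (name invented for this sketch): builds the text of one
// element, turning each direct <br> child into "\n".
function paragraphToText(DOMElement $p): string
{
    $text = '';
    foreach ($p->childNodes as $child) {
        if ($child instanceof DOMText) {
            // Plain text between tags: keep it as-is.
            $text .= $child->nodeValue;
        } elseif ($child instanceof DOMElement) {
            // A <br> becomes a newline; any other element is flattened to its text.
            $text .= strtolower($child->nodeName) === 'br' ? "\n" : $child->textContent;
        }
    }
    return $text;
}

$dom = new DOMDocument();
$dom->loadHTML('<p>Multiple-line paragraph<br />that has a close tag</p>');
$p = $dom->getElementsByTagName('p')->item(0);

echo 'PARAGRAPH: ', paragraphToText($p), PHP_EOL;
```

A <br> nested inside another element (for example inside a <span>) would not be caught by this version, which is why the general case needs the recursive walk.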

In the general case you’ll need recursion, like this:

// Walks the tree depth-first, reporting text nodes and elements
// in the order in which they appear in the document.
function processElement(DOMNode $element){
    foreach($element->childNodes as $child){
        if($child instanceof DOMText){
            // Text between tags
            echo $child->nodeValue,PHP_EOL;
        }elseif($child instanceof DOMElement){
            switch($child->nodeName){
                case 'br':
                    echo 'BREAK: ',PHP_EOL;
                    break;
                case 'p':
                    echo 'PARAGRAPH: ',PHP_EOL;
                    processElement($child);
                    echo 'END OF PARAGRAPH;',PHP_EOL;
                    break;
                // Handle any other tags of interest here.
                default:
                    // Unhandled elements (html, body, ...): just descend into them.
                    processElement($child);
            }
        }
    }
}

$D = new DOMDocument;
$D->loadHTML('<p>Multiple-line paragraph<br />that has a close tag</p>');
processElement($D);

This will output:

PARAGRAPH: 
Multiple-line paragraph
BREAK:
that has a close tag
END OF PARAGRAPH;
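
If the goal is only to end up with \n where each break was, one possible alternative (an assumption about what the asker needs, not something the walk above does) is to swap every <br> element for a text node containing "\n" and then read nodeValue as before. The helper name replaceBreaksWithNewlines is hypothetical:

```php
<?php
// Hypothetical helper (name invented for this sketch): replaces every <br>
// element in the document with a text node holding "\n".
function replaceBreaksWithNewlines(DOMDocument $dom): void
{
    // getElementsByTagName() returns a live list, so copy it to an array
    // before modifying the tree; otherwise elements can be skipped.
    foreach (iterator_to_array($dom->getElementsByTagName('br')) as $br) {
        $br->parentNode->replaceChild($dom->createTextNode("\n"), $br);
    }
}

$dom = new DOMDocument();
$dom->loadHTML('<p>Multiple-line paragraph<br />that has a close tag</p>');
replaceBreaksWithNewlines($dom);

foreach ($dom->getElementsByTagName('p') as $p) {
    // nodeValue now contains the newline where the <br> used to be.
    echo 'PARAGRAPH: ', $p->nodeValue, PHP_EOL;
}
```

With the sample paragraph this should print the text on two lines, split where the <br /> was. Note that it modifies the loaded document in place, so it suits cases where the original markup is not needed afterwards.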
