I’m currently trying to parse a document with DOMDocument, and I’m having some serious

Asked: May 18, 2026 (2026-05-18T04:38:16+00:00)

I’m currently trying to parse a document with DOMDocument, and I’m having some serious problems. I wrote a script that runs fine on PHP 5.2.9, pulling out content using DOMNode::nodeValue. The same script fails to get any content on PHP 5.3.3, even though it correctly navigates to the proper nodes to extract content from.

Basically, the code used looks like this:

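// $data holds the raw HTML, $query the XPath expression
$dom = new DOMDocument();
$dom->loadHTML($data);
$dom->preserveWhiteSpace = false;
$xpath = new DOMXPath($dom);
$nodelist = $xpath->query($query);
// item(0) is a node of the expected type, yet nodeValue comes back empty
$value = $nodelist->item(0)->nodeValue;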

I’ve checked to make sure that item(0) is in fact a node – it’s there and even of the right type, but nodeValue is empty.

The script works on some documents but not others (on 5.3.3) – on 5.2.9 it works on all documents, returning the proper nodeValue.

1 Answer

Editorial Team · Answer 1 · 2026-05-18T04:38:17+00:00

I seem to have either missed something basic or hit a bug (whether that bug is in PHP or in libxml, I don’t know). Basically, the issue is fixed by making sure the data passed to loadHTML is UTF-8 encoded. Mind you, it’s not that the entire document was in the wrong encoding: the problem here was a single character in the element that wasn’t valid UTF-8. That one character then threw off everything else in the document handling.

What gets me is that all of the document’s content was effectively thrown out, while the structure stayed in place and worked normally. There were no errors or anything else to suggest the content was seen as invalid.
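
For illustration, here is a minimal sketch of one way to make sure the data is valid UTF-8 before it reaches loadHTML. The helper name to_utf8 and the ISO-8859-1 fallback are assumptions of mine, not part of the original script, and the sketch relies on the mbstring extension being available. Treat it as a starting point rather than the definitive fix.

```php
<?php
// Hypothetical helper (not from the original script): returns $data as valid UTF-8.
// Assumption: input that is not valid UTF-8 is really in $fallback (ISO-8859-1 here).
function to_utf8($data, $fallback = 'ISO-8859-1')
{
    // mb_check_encoding() is true only if the whole string is valid UTF-8.
    if (mb_check_encoding($data, 'UTF-8')) {
        return $data;
    }
    // Re-encode the bytes from the assumed source encoding into UTF-8.
    return mb_convert_encoding($data, 'UTF-8', $fallback);
}

// Sample input: "\xE9" is "é" in ISO-8859-1 but is not valid UTF-8 on its own.
$data  = "<html><body><p>caf\xE9</p></body></html>";
$query = '//p';

$dom = new DOMDocument();
$dom->loadHTML(to_utf8($data)); // normalize the encoding before parsing
$xpath = new DOMXPath($dom);
$nodelist = $xpath->query($query);
$value = $nodelist->item(0)->nodeValue;
```

Note that this only guarantees the bytes are valid UTF-8. If the markup does not declare a charset, loadHTML may still interpret it under a different default encoding, so declaring the charset in the document itself is probably worth doing as well.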
