Question

Editorial Team

Asked: May 12, 2026 (2026-05-12T19:07:09+00:00)

I am parsing an HTML document with XPath and I want to keep all the inner HTML tags.

The HTML in question is an unordered list with many list elements.

<ul id="adPoint1"><li>Business</li><li>Contract</li></ul>

I am parsing the document using the following PHP code:

$dom = new DOMDocument();
@$dom->loadHTML($output);
$this->xpath = new DOMXPath($dom);
$testDom = $this->xpath->evaluate("//ul[@id='adPoint1']");
$test = $testDom->item(0)->nodeValue;
echo htmlentities($test);

For some reason the output always has the HTML tags omitted. I assume this is because XPath was not intended to be used in this way, but is there any way around it?

I would really like to continue using XPath, as I already use it to parse other areas of the page (single a href elements) without a problem.

EDIT: I know that there is a better way to get the data by iterating through the child elements of the UL. There is a more complicated part of the page that I also want to parse (a block of JavaScript), but I was trying to provide an easier-to-understand example.

The actual block of code that I want is:

<script language="javascript">document.write(rot_decode('<u7>Pbagnpg Qrgnvyf</u7><qy vq="pbagnpgQrgnvyf"><qg>Cu:</qg><qq>(58) 0078 8455</qq></qy>'));</script>

The problem is that the output omits all the closing tags but keeps the opening tags. I’m guessing this is because the parser is trying to parse the inner elements rather than just treating the script content as a string.

If I try and select the script element with

$testDom = $this->xpath->evaluate("//div[@id='businessDetails']/script");
$test = $testDom->item(0)->nodeValue;
echo htmlentities($test);

my output is the following, which, as you can see, is missing all the closing tags:

document.write(rot_decode('<u7>Pbagnpg Qrgnvyf<qy vq="pbagnpgQrgnvyf"><qg>Cu:<qq>(58) 0078 8455'));


1 Answer

Editorial Team · Answer 1 · 2026-05-12T19:07:09+00:00


I decided XPath wasn’t suited to what I wanted and am now using PHP Simple HTML DOM Parser, which handles this task much better.

It preserves the inner HTML just fine.

foreach($this->simpleDom->find('script[language=javascript]') as $script) {
        echo htmlentities($script->innertext());
}
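
For anyone who would rather stay with XPath for the simpler list example: the tags go missing because nodeValue returns only the text content of a node. A minimal sketch of one possible alternative, assuming PHP 5.3.6 or later (where DOMDocument::saveHTML() accepts a node argument), is to serialize the matched node instead. The sample $output string and the variable names below are made up for illustration.

```php
<?php
// Hypothetical input standing in for the page being parsed.
$output = '<html><body><ul id="adPoint1"><li>Business</li><li>Contract</li></ul></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of suppressing with @
$dom->loadHTML($output);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$ul = $xpath->query("//ul[@id='adPoint1']")->item(0);

// Outer HTML: the <ul> element serialized together with its own tags.
$outerHtml = $dom->saveHTML($ul);

// Inner HTML: serialize each child node and join the results.
$innerHtml = '';
foreach ($ul->childNodes as $child) {
    $innerHtml .= $dom->saveHTML($child);
}

echo htmlentities($innerHtml), PHP_EOL;
```

The serializer may insert line breaks between elements, so the result is not guaranteed to match the source byte for byte. It also only writes out what the parser kept: if the closing tags inside the script string were already dropped while the document was being parsed, as described in the question, serializing the script node would not bring them back.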
