I’m attempting to scrape the value of an input box from a URL

Question

Asked: May 30, 2026 (2026-05-30T00:31:14+00:00)

I’m attempting to scrape the value of an input box from a URL. I seem to be having problems with my implementation of XPath.

The page to be scraped looks something like:

<!DOCTYPE html> 
<html lang="en">
    <head></head>
    <body>
        <div><span>Blah</span></div>
        <div><span>Blah</span> Blah</div>
        <div>
            <form method="POST" action="blah">
                <input name="SomeName" id="SomeId" value="GET ME"/>
                <input type="hidden" name="csrfToken" value="ajax:3575644127378754050" id="csrfToken-login">
            </form>
        </div>
    </body>
</html>

and I’m attempting to parse it like this:

$Contents = file_get_contents("https://www.linkedin.com/uas/login");
$Selector = "//input[@id='csrfToken-login']/@value";
print_r($Selector);
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHtml($Contents);
$xpath = new DOMXPath($dom);
libxml_use_internal_errors(false);
print_r($xpath->query($Selector));

NB: my real code calls dump() where the snippet above calls print_r(). dump() just wraps print_r() but adds some stack trace info and formatting.

The output is as follows:

14:50:08 scraper.php 181: (Scraper->Test)
//input[@id='csrfToken-login']/@value

14:50:08 scraper.php 188: (Scraper->Test)
DOMNodeList Object
(
)

I’m assuming this means it was unable to find anything in the document that matches my selector. I’ve tried a number of variations, just to see if I can get something back:

/input/@value
/input
//input
/div

The only selector which I’ve been able to get anything from is / which returns the entire document.

What am I doing wrong?

EDIT: As some can’t reproduce the problem with the old example, I’ve replaced it with an almost identical example which also demonstrates the problem but uses a public URL (LinkedIn login page).

There’s been a suggestion that this isn’t possible because the parser chokes on HTML5 (which the internal page also uses). Does anyone have any experience of this?


1 Answer

Editorial Team · Answer 1 · 2026-05-30T00:31:16+00:00

If your selector starts with a single slash (/), it is an absolute path from the document root, so /input only matches an input that is itself the root element. Use a double slash (//) to select all matching elements regardless of their location.
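To make the difference concrete, here is a minimal sketch that runs the three kinds of path against a small inline page. The markup in $html is a hypothetical stand-in for the fetched page (not the real LinkedIn response), and the variable names are only for illustration.

```php
<?php
// Hypothetical inline page standing in for the fetched HTML.
$html = '<!DOCTYPE html><html><body><div><form>'
      . '<input id="SomeId" name="SomeName" value="GET ME">'
      . '</form></div></body></html>';

$dom = new DOMDocument;
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// "/input" asks for an <input> that is the root element; the root here is <html>.
$absolute = $xpath->query('/input');
// "//input" matches <input> elements at any depth.
$anywhere = $xpath->query('//input');
// A fully spelled-out absolute path also matches, but breaks if the layout changes.
$fullPath = $xpath->query('/html/body/div/form/input');

echo $absolute->length, "\n"; // 0
echo $anywhere->length, "\n"; // 1
echo $fullPath->length, "\n"; // 1
```

This is why /input and /div came back empty in the question while / returned the whole document.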

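print_r won’t work for this: it does not show the nodes held in a DOMNodeList, so an empty-looking dump does not mean the query matched nothing. Everything was fine in your code except for actually reading the value.
List classes in PHP’s DOM extension have a property called length; check that instead.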

$Contents = file_get_contents("https://www.linkedin.com/uas/login");
$Selector = "//input[@id='csrfToken-login']/@value";
$dom = new DOMDocument;
libxml_use_internal_errors(true);   // silence warnings from imperfect real-world HTML
$dom->loadHTML($Contents);
$xpath = new DOMXPath($dom);
libxml_use_internal_errors(false);
$b = $xpath->query($Selector);
// $b is a DOMNodeList of attribute nodes; make sure it is non-empty before reading
if ($b->length > 0) {
    echo $b->item(0)->value;
}
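If you want to try this without hitting the network, the following sketch runs the same selector against markup adapted from the question. The firstValue() helper is a hypothetical name introduced here, not part of the DOM extension; it only wraps the length check described above and returns null when nothing matches.

```php
<?php
// Markup adapted from the question; no HTTP request is made.
$html = '<!DOCTYPE html><html lang="en"><head></head><body><div>'
      . '<form method="POST" action="blah">'
      . '<input name="SomeName" id="SomeId" value="GET ME"/>'
      . '<input type="hidden" name="csrfToken" value="ajax:3575644127378754050" id="csrfToken-login">'
      . '</form></div></body></html>';

// Hypothetical helper: string value of the first match, or null if there is none.
function firstValue(DOMXPath $xpath, string $selector): ?string
{
    $nodes = $xpath->query($selector);
    // query() returns false for a malformed expression, and an empty list for no match.
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->nodeValue;
}

$dom = new DOMDocument;
$previous = libxml_use_internal_errors(true);  // collect parser warnings quietly
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);         // restore the earlier setting

$xpath   = new DOMXPath($dom);
$token   = firstValue($xpath, "//input[@id='csrfToken-login']/@value");
$missing = firstValue($xpath, "//input[@id='no-such-id']/@value");

echo $token, "\n";   // ajax:3575644127378754050
var_dump($missing);  // NULL
```

Returning null for a missing node keeps the caller from calling a method on the null that item(0) gives back for an empty list.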
