Say I have loaded an HTML file and I run this query:
$url = 'http://www.fangraphs.com/players.aspx';
$html = file_get_contents($url);
$myDom = new DOMDocument;
$myDom->formatOutput = true;
@$myDom->loadHTML($html);
$xpath = new DOMXPath($myDom); // the XPath object must exist before it is queried
$anchor = $xpath->query('//a[contains(@href,"letter")]');
That gives me a list of these anchors that look like the following:
<a href="players.aspx?letter=Aa">Aa</a>
But I need a way to only get “players.aspx?letter=Aa”.
I thought I could try:
$anchor = $xpath->query('//a[contains(@href,"letter")]/@href');
But that gives me a PHP error saying it couldn’t append the node when I try the following:
$xpath = new DOMXPath($myDom);
$newDom = new DOMDocument;
$j = 0;
while( $myAnchor = $anchor->item($j++) ){
    $node = $newDom->importNode( $myAnchor, true ); // import node
    $newDom->appendChild($node);
}
Any idea how to obtain just the value of the href attributes that the first query selects? Thanks!
Your XPath query is returning the attributes themselves (i.e., DOMAttr objects) rather than elements (i.e., DOMElement objects). That’s fine, and that seems to be what you want, but appending them to the document is the problem. A DOMAttr is not a standalone node in the document tree; it’s associated with a DOMElement but is not a child in the usual sense. Thus, directly appending a DOMAttr to the document is invalid.
From the W3C specs:
Either associate the DOMAttr with a DOMElement and append that element, or pull out the DOMAttr’s value and use that as you wish.
To just append its plain text value, use its value in a DOMText node and append that. For example, change this line:
to this:
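As a minimal sketch of the text-node approach, assuming a small inline HTML string in place of the fetched page: each DOMAttr's value is read, wrapped in a DOMText node, and appended to the new document. The `links` and `link` wrapper elements and the `$hrefs` array are hypothetical additions for illustration; they are not part of the code in the question.

```php
<?php
// Hypothetical inline HTML standing in for the page fetched in the question.
$html = '<html><body>'
      . '<a href="players.aspx?letter=Aa">Aa</a>'
      . '<a href="players.aspx?letter=Ab">Ab</a>'
      . '<a href="index.aspx">Home</a>'
      . '</body></html>';

$myDom = new DOMDocument;
libxml_use_internal_errors(true); // collect parser warnings instead of suppressing with @
$myDom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($myDom);
// Each item in the result is a DOMAttr, not a DOMElement.
$attrs = $xpath->query('//a[contains(@href,"letter")]/@href');

$newDom = new DOMDocument;
$newDom->formatOutput = true;
// Hypothetical wrapper element, so the new document has a single root.
$list = $newDom->appendChild($newDom->createElement('links'));

$hrefs = [];
foreach ($attrs as $attr) {
    $hrefs[] = $attr->value; // the plain string, e.g. for use outside the DOM

    // Put the attribute's value into a text node instead of importing the DOMAttr.
    $item = $newDom->createElement('link');
    $item->appendChild($newDom->createTextNode($attr->value));
    $list->appendChild($item);
}

echo $newDom->saveXML();
```

If only the strings are needed, the `$hrefs` array alone is enough and the second document can be dropped entirely.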