I am parsing an HTML page with DOM and XPath in PHP.
I have to fetch a nested <table>...</table> from the HTML.
I built the query with FirePath in the browser, and it points to
html/body/table[2]/tbody/tr/td[2]/table[2]/tbody/tr/td/table
When I run the code, the DOMNodeList that comes back has length 0. My goal is to output the queried <table> as a string. This is an HTML scraping script in PHP.
Below is the function. How can I extract the required <table>?
$pageUrl = "http://www.boc.cn/sourcedb/whpj/enindex.html";
getExchangeRateTable($pageUrl);
function getExchangeRateTable($url){
    $htmlTable = "";
    $xPathTable = null;
    $xPathQuery1 = "html/body/table[2]/tbody/tr/td[2]/table[2]/tbody/tr/td/table";
    if(strlen($url)==0){die('Argument exception: method call [getExchangeRateTable] expects a string of URL!');}
    // initialize objects
    $page = tidyit($url);
    $dom = new DOMDocument();
    $dom->loadHTML($page);
    $xpath = new DOMXPath($dom);
    // $elements is appearing as DOMNodeList
    $elements = $xpath->query($xPathQuery1);
    // print_r($elements);
    foreach($elements as $e){
        $e->firstChild->nodeValue;
    }
}
Remove the tbody elements from your XPath query. In most cases they are inserted by your browser and are not present in the HTML source, which is the case for the page you are trying to scrape.
This will most likely work.
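To illustrate the difference, here is a minimal sketch. It assumes a small inline HTML string (a hypothetical stand-in, not the actual page) with nested tables and no tbody in the source. The XPath expressions and the id "rates" are made up for this example. It also shows one way to get the matched table as a string, using DOMDocument::saveHTML() with a node argument.

```php
<?php
// Hypothetical markup standing in for the scraped page:
// nested tables, and no <tbody> anywhere in the source.
$html = '<html><body>'
      . '<table><tr><td>'
      . '<table id="rates"><tr><th>Currency</th></tr><tr><td>USD</td></tr></table>'
      . '</td></tr></table>'
      . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Browser-style path: contains a tbody step the source does not have.
$withTbody = $xpath->query('/html/body/table/tbody/tr/td/table');

// Same path with the tbody step removed.
$withoutTbody = $xpath->query('/html/body/table/tr/td/table');

echo $withTbody->length, "\n";      // 0
echo $withoutTbody->length, "\n";   // 1

// Serialize each matched <table> node back to an HTML string.
$tableHtml = '';
foreach ($withoutTbody as $table) {
    $tableHtml .= $dom->saveHTML($table);
}
echo $tableHtml, "\n";
```

In your function, the same idea would mean dropping the tbody steps from $xPathQuery1 and collecting $dom->saveHTML($e) inside the foreach instead of reading $e->firstChild->nodeValue.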
However, it is probably safer to use a different XPath. The following XPath selects the first th based on its text content, then selects the parent of its tr, which is either a tbody or a table: