Here’s a link:
I’m using HTML Agility Pack and I would like to extract, say, the 188 from the ‘Odds’ column. My editor gives /html/body/form/div/div[2]/div/table/tr/td[2]/div/table/tr[3]/td[7] when asked for the path. I tried that path with various omissions of body or html, but none of them returns any results when passed to .DocumentNode.SelectNodes(). I also tried with // at the beginning (which, I assume, is the root of the document tree). What gives?
EDIT:
Code:
WebClient client = new WebClient();
string html = client.DownloadString(url);
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// SelectNodes returns null (not an empty collection) when nothing matches
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("/some/xpath/expression");
if (nodes != null)
{
    foreach (HtmlNode node in nodes)
    {
        Console.WriteLine("[" + node.InnerText + "]");
    }
}
When scraping sites, you can’t safely rely on the exact XPath produced by tools: those paths are generally too restrictive and, in practice, often match nothing. A better approach is to look at the HTML yourself and write an expression that is more resilient to changes in the markup.
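To illustrate what "more resilient" can mean, here is a minimal sketch run against a made-up HTML fragment, not the real page. The table id `lines`, the column names, and the cell values are all assumptions for the example; the idea is to anchor on a stable attribute and find the column by its header text instead of by a fixed position such as td[7].

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Made-up fragment standing in for the real page; its actual markup may differ.
        string html = @"
<html><body><div>
  <table id='lines'>
    <tr><th>Team</th><th>Odds</th></tr>
    <tr><td>Home</td><td>188</td></tr>
    <tr><td>Away</td><td>-210</td></tr>
  </table>
</div></body></html>";

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Anchor on a stable attribute (hypothetical id) rather than the full path from /html.
        HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[@id='lines']");
        if (table == null)
        {
            Console.WriteLine("table not found");
            return;
        }

        // Find the index of the 'Odds' column from the header cells.
        HtmlNodeCollection headers = table.SelectNodes(".//th");
        int oddsIndex = -1;
        for (int i = 0; headers != null && i < headers.Count; i++)
        {
            if (headers[i].InnerText.Trim() == "Odds")
            {
                oddsIndex = i;
                break;
            }
        }

        // Take the first row that contains data cells and read the cell in that column.
        HtmlNode row = table.SelectSingleNode(".//tr[td]");
        HtmlNodeCollection cells = row == null ? null : row.SelectNodes("td");
        if (oddsIndex >= 0 && cells != null && oddsIndex < cells.Count)
        {
            Console.WriteLine("[" + cells[oddsIndex].InnerText.Trim() + "]");
        }
        else
        {
            Console.WriteLine("Odds cell not found");
        }
    }
}
```

Note the leading "." in the relative expressions: without it, "//th" would search the whole document rather than just the selected table. Every lookup is also null-checked, since SelectNodes and SelectSingleNode return null when nothing matches.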
Here is a piece of code that works with your example:
It outputs 188.
The way it works is: