I have this code:
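var url = textBox1.Text;
// WebClient is IDisposable, so wrap it in a using block.
using (var wc = new WebClient())
{
    var page = wc.DownloadString(url);
    // XElement.Parse expects well-formed XML; this is the line that throws.
    XElement doc = XElement.Parse(page);
}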
It fails with an exception about unexpected characters.
Obviously, the HTML I'm trying to parse in such a naive way is not strict XML.
What's the next easiest way to parse arbitrary HTML into something I can query with LINQ (IEnumerable/IQueryable)?
What I actually want is to grab a table inside the page, along with the paging links,
and then parse them on my own with LINQ.
Have a look at the HTML Agility Pack:
http://www.codeplex.com/htmlagilitypack
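
For illustration, here is a minimal sketch of how the table and paging links might be pulled out with the HTML Agility Pack and LINQ. The sample markup, the "page=" filter on the links, and the variable names are assumptions made up for this example; in the question's code you would pass the downloaded `page` string to `LoadHtml` instead of the inline sample.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample markup. The bare <br>, the unquoted attribute
        // and &nbsp; are the kind of thing that makes XElement.Parse throw.
        var html = @"<html><body>
            <table id=results>
              <tr><td>Alice</td><td>30</td></tr>
              <tr><td>Bob</td><td>25</td></tr>
            </table>
            <br>
            <div>&nbsp;<a href='?page=1'>1</a> <a href='?page=2'>2</a></div>
            </body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant parser, does not require well-formed XML

        // Descendants() returns IEnumerable<HtmlNode>, so LINQ works directly.
        var table = doc.DocumentNode.Descendants("table").FirstOrDefault();
        if (table != null)
        {
            var rows = table.Descendants("tr")
                .Select(tr => tr.Descendants("td")
                                .Select(td => td.InnerText.Trim())
                                .ToList());

            foreach (var cells in rows)
                Console.WriteLine(string.Join(" | ", cells));
        }

        // Assumption: paging links can be recognised by "page=" in the href.
        var pagingLinks = doc.DocumentNode.Descendants("a")
            .Select(a => a.GetAttributeValue("href", ""))
            .Where(href => href.Contains("page="));

        foreach (var href in pagingLinks)
            Console.WriteLine(href);
    }
}
```

`SelectNodes` with an XPath expression such as "//table" is an alternative to `Descendants`, but it returns null rather than an empty collection when nothing matches, so check for null before enumerating the result.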