I need some advice on a project I am about to begin.
In short, my application has to go to a certain soccer website, download the HTML, and extract the necessary data.
This is what I have done so far:
:: 1) Go to a certain soccer website (e.g. http://www.livescore.com/default.dll?page=england) and download the HTML using WebClient.
:: 2) Convert the HTML to XML using SgmlReader.
:: 3) Retrieve the data I am looking for using XmlDocument. Usually this involves:
::::::: 3.1) Retrieving nodes using GetElementsByTagName() (e.g. GetElementsByTagName("tr"))
::::::: 3.2) Looping through the list of nodes returned by GetElementsByTagName()
Is there a better way to do what I am trying to do?
I was thinking of LINQ to XML. Do you think this will improve performance?
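A LINQ to XML version of step 3 might look roughly like the sketch below. It assumes the SgmlReader output is well-formed XML with lowercase element names and no XML namespace; the sample markup and team names are made up for illustration. It mainly changes how the query reads; whether it is any faster would have to be measured.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class LinqToXmlSketch
{
    static void Main()
    {
        // Hypothetical stand-in for the XML produced in step 2.
        string xml =
            "<html><body><table>" +
            "<tr><td>Team A</td><td>1 - 0</td><td>Team B</td></tr>" +
            "<tr><td>Team C</td><td>2 - 2</td><td>Team D</td></tr>" +
            "</table></body></html>";

        XDocument doc = XDocument.Parse(xml);

        // Equivalent of GetElementsByTagName("tr") plus the manual loop:
        // every <tr> in the document, projected to the text of its <td> cells.
        var rows = doc.Descendants("tr")
                      .Select(tr => tr.Elements("td")
                                      .Select(td => td.Value.Trim())
                                      .ToList())
                      .Where(cells => cells.Count > 0);

        foreach (var cells in rows)
        {
            Console.WriteLine(string.Join(" | ", cells));
        }
    }
}
```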
Any suggestions or comments would be greatly appreciated!
Just use the HTML Agility Pack! http://www.codeplex.com/htmlagilitypack
That way you can query the document using XPath to get the nodes you need. You can even use the Firefox plugin Firebug to help you build your XPath queries.
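As a minimal sketch of that approach, the snippet below loads an HTML string into the HTML Agility Pack and selects the table rows with XPath. It assumes the HtmlAgilityPack assembly is referenced; the sample markup is invented, and in the real application the string would come from WebClient.DownloadString. The XPath expressions are placeholders to adapt to the actual page.

```csharp
using System;
using HtmlAgilityPack; // third-party library, must be referenced separately

class AgilityPackSketch
{
    static void Main()
    {
        // Hypothetical sample; replace with the HTML downloaded via WebClient.
        string html =
            "<html><body><table>" +
            "<tr><td>Team A</td><td>1 - 0</td><td>Team B</td></tr>" +
            "<tr><td>Team C</td><td>2 - 2</td><td>Team D</td></tr>" +
            "</table></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html); // no separate HTML-to-XML conversion step needed

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows == null)
        {
            Console.WriteLine("No rows found.");
            return;
        }

        foreach (HtmlNode row in rows)
        {
            HtmlNodeCollection cells = row.SelectNodes("./td");
            if (cells == null)
            {
                continue; // e.g. a header row containing only <th> cells
            }

            foreach (HtmlNode cell in cells)
            {
                // DeEntitize turns entities such as &amp; back into plain text.
                Console.Write(HtmlEntity.DeEntitize(cell.InnerText).Trim() + " | ");
            }
            Console.WriteLine();
        }
    }
}
```

A narrower XPath expression (for example one that targets a specific table or CSS class on the real page) avoids looping over every row in the document.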