I am trying to parse the content of a web page in C#. This is the code I use:
    using System.IO;
    using System.Net;

    WebRequest request = WebRequest.Create("URL");
    string html = String.Empty;
    // Dispose the response and its stream once the page has been read.
    using (WebResponse response = request.GetResponse())
    using (Stream data = response.GetResponseStream())
    using (StreamReader sr = new StreamReader(data))
    {
        html = sr.ReadToEnd();
    }
The problem is that I get everything the HTML contains, markup included.
Do you have any suggestions on how to extract only the useful data in a ‘clean’ way, or do I have to build my own parser? For example, a post consisting of a title and the text related to it, in a blog-like format.
If you are indeed trying to parse blog posts from a web page, do not do it that way, and don’t even think of using the HTML Agility Pack.
Instead, provided the blog publishes a feed, use SyndicationFeed and its related classes (in the System.ServiceModel.Syndication namespace), which have been built into the .NET Framework since v3.5. They are tailor-made for consuming and picking apart RSS feeds.
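As a rough sketch of what that could look like: the snippet below loads a feed with SyndicationFeed.Load and pulls out each post's title and summary text. The class name FeedDemo, the helper ReadPosts, and the inline sample feed are all made up for illustration. On .NET Framework this needs a reference to the assembly that contains System.ServiceModel.Syndication; on newer .NET versions the same namespace ships as a NuGet package.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.ServiceModel.Syndication;
using System.Xml;

public static class FeedDemo
{
    // Hypothetical helper: returns one (title, text) pair per feed item.
    public static List<KeyValuePair<string, string>> ReadPosts(XmlReader reader)
    {
        SyndicationFeed feed = SyndicationFeed.Load(reader);
        var posts = new List<KeyValuePair<string, string>>();

        foreach (SyndicationItem item in feed.Items)
        {
            // Title and Summary may be absent in a feed, so guard against null.
            string title = item.Title != null ? item.Title.Text : String.Empty;
            string text = item.Summary != null ? item.Summary.Text : String.Empty;
            posts.Add(new KeyValuePair<string, string>(title, text));
        }

        return posts;
    }

    public static void Main()
    {
        // Made-up sample feed so the sketch runs without network access.
        const string rss =
            "<rss version=\"2.0\"><channel>" +
            "<title>Sample blog</title>" +
            "<link>http://example.com/</link>" +
            "<description>Illustrative feed</description>" +
            "<item><title>First post</title><description>Hello world</description></item>" +
            "<item><title>Second post</title><description>More text</description></item>" +
            "</channel></rss>";

        using (XmlReader reader = XmlReader.Create(new StringReader(rss)))
        {
            foreach (KeyValuePair<string, string> post in ReadPosts(reader))
            {
                Console.WriteLine(post.Key + ": " + post.Value);
            }
        }
    }
}
```

To read a live blog instead of the inline sample, pass XmlReader.Create(feedUrl) to ReadPosts, where feedUrl is the address of the blog's RSS or Atom feed (you have to look that up for the site in question). Note that some feeds only put an excerpt in the summary, so the full post text may not be available this way.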