XDocument coordinates = XDocument.Load("http://feeds.feedburner.com/TechCrunch");
XNamespace nsContent = "http://purl.org/rss/1.0/modules/content/";
// The using block closes the writer even if an exception is thrown inside the loop.
using (System.IO.StreamWriter StreamWriter1 = new System.IO.StreamWriter(DestFile))
{
    foreach (var item in coordinates.Descendants("item"))
    {
        string link = item.Element("guid").Value;
        string content = item.Element(nsContent + "encoded").Value; // Gets the full HTML: links, images, etc.
    }
}
Using this I can get the guid element values as well as the content:encoded values, but the value of the content:encoded element includes all the links, tags, etc.
But I want only the text. That is, I need the plain text data only, without any image links, hyperlinks, etc.
How can I parse just the <p>..</p> tag data in the XML?
Please suggest
Thanks
You have HTML embedded in that XML document. The safest thing to do is to take that HTML and parse it with an HTML parser such as the HTML Agility Pack, then go from there. It shouldn't be much different from what you are already doing. Do note that the HTML is still partly encoded, so you'll have to decode it first.
Unfortunately, the HTML doesn't seem to be well-formed XML, so you won't be able to use LINQ to XML on that part.
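As a rough illustration of that approach, here is a minimal sketch that pulls the text out of the <p> elements with the HTML Agility Pack. It assumes the HtmlAgilityPack NuGet package is installed; the class name FeedText and the method ExtractParagraphs are hypothetical names made up for this example, not part of the question's code or of the library.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // Assumption: the HtmlAgilityPack NuGet package is referenced.

public static class FeedText
{
    // Hypothetical helper: returns the plain text of each <p> element
    // found in an HTML fragment, with tags removed and entities decoded.
    public static List<string> ExtractParagraphs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var paragraphs = doc.DocumentNode.SelectNodes("//p");
        if (paragraphs == null)
            return new List<string>();

        return paragraphs
            // InnerText drops the markup (<a>, <img>, ...) and keeps the text;
            // DeEntitize turns entities such as &amp; back into characters.
            .Select(p => HtmlEntity.DeEntitize(p.InnerText).Trim())
            .Where(text => text.Length > 0)
            .ToList();
    }
}
```

Inside the foreach loop from the question, this could be called as FeedText.ExtractParagraphs(content), writing each returned string with StreamWriter1.WriteLine. Filtering on <p> is a choice made for this sketch; if the feed puts text outside paragraphs, doc.DocumentNode.InnerText would return all of the text instead.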