I have an RSS feed that I want to modify on the fly. All I need is the text (and line feeds), so everything else must be removed (all images, styles, links).
How can I do this easily with ASP.NET C#?
Regex cannot parse XML. Do not use regex to parse XML. Do not pass Go. Do not collect £200.
You need a proper XML parser. Load the RSS into an XmlDocument (from System.Xml), then use its InnerText property to get only the text content.
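A minimal sketch of that approach, assuming an RSS 2.0 feed laid out as /rss/channel/item. The class name RssText, the method ExtractItemText and the sample feed are hypothetical, made up for illustration. XmlDocument, LoadXml, SelectNodes, SelectSingleNode and InnerText are standard System.Xml members.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper for illustration; assumes the RSS 2.0 /rss/channel/item layout.
public static class RssText
{
    // Returns "title\ndescription" for each <item>, as plain text decoded by the XML parser.
    public static List<string> ExtractItemText(string rssXml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(rssXml);

        var results = new List<string>();
        foreach (XmlNode item in doc.SelectNodes("/rss/channel/item"))
        {
            XmlNode title = item.SelectSingleNode("title");
            XmlNode description = item.SelectSingleNode("description");

            // InnerText concatenates all text under the node and decodes XML entities
            // (so &lt; in the source becomes a literal '<' in the result).
            string titleText = title?.InnerText ?? "";
            string descText = description?.InnerText ?? "";
            results.Add(titleText + "\n" + descText);
        }
        return results;
    }

    public static void Main()
    {
        // Hypothetical sample feed with one item.
        string rss = "<rss version=\"2.0\"><channel><title>Demo</title>" +
                     "<item><title>First</title>" +
                     "<description>&lt;b&gt;Hello&lt;/b&gt; world</description></item>" +
                     "</channel></rss>";

        foreach (string text in ExtractItemText(rss))
            Console.WriteLine(text);
    }
}
```

For the sample item this prints "First" followed by "<b>Hello</b> world": the entities are decoded, but the tags survive as text. To read a live feed instead of a string, XmlDocument.Load also accepts a URL.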
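Note that even when you've extracted the description content from RSS, it can contain active HTML. A description containing entity-encoded tags (&lt; and &gt;) can, when parsed properly as XML and then read as text, be interpreted either as a literal string that merely happens to contain angle brackets, or as HTML markup.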
The fun thing about RSS is that you don’t really know which is right. In RSS 2.0 it is explicitly HTML markup (the second case); in other versions it’s not specified. Generally you should assume that descriptions can contain entity-encoded HTML tags, and if you want to further strip those from the final text you’ll need a second parsing step.
(Unfortunately, since this is legacy HTML and not XML it’s harder to parse; a regex will be even more useless than it is for parsing XML. There isn’t a built-in HTML parser in .NET, but there are third-party libraries such as the HTML Agility Pack.)
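As a rough sketch of that second step, assuming the HTML Agility Pack NuGet package is installed: the helper HtmlToText.StripHtml is hypothetical, while HtmlDocument.LoadHtml, SelectNodes, HtmlNode.Remove, ReplaceChild, CreateTextNode and HtmlEntity.DeEntitize are part of that library. It drops script and style elements, turns each <br> into a newline to keep the line feeds the question asks for, and returns the remaining text.

```csharp
// Requires the HtmlAgilityPack NuGet package.
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper for illustration, not part of HTML Agility Pack itself.
public static class HtmlToText
{
    public static string StripHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Remove elements whose content is never visible text.
        // SelectNodes returns null when nothing matches.
        var invisible = doc.DocumentNode.SelectNodes("//script|//style");
        if (invisible != null)
            foreach (HtmlNode node in invisible.ToList())
                node.Remove();

        // Keep line breaks: replace each <br> with a newline text node.
        var breaks = doc.DocumentNode.SelectNodes("//br");
        if (breaks != null)
            foreach (HtmlNode br in breaks.ToList())
                br.ParentNode.ReplaceChild(doc.CreateTextNode("\n"), br);

        // InnerText concatenates the remaining text nodes (images and link tags vanish,
        // link text stays). DeEntitize decodes entities such as &amp; that may remain.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    }

    public static void Main()
    {
        Console.WriteLine(StripHtml("<b>Hello</b><br><img src=\"x.png\"><a href=\"#\">world</a>"));
    }
}
```

Only <br> is mapped to a newline here; if your feeds use <p> or <div> for paragraphs you would treat those the same way.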