Using ASP.NET, how can I strip the HTML tags from a given string reliably (i.e. not using regex)? I am looking for something like PHP’s strip_tags.
Example:
<ul><li>Hello</li></ul>
Output:
"Hello"
I am trying not to reinvent the wheel, but I have not found anything that meets my needs so far.
If it is just stripping all HTML tags from a string, this works reliably with regex as well. Replace every tag, that is, each "<" together with everything up to the next ">", with the empty string, globally. Don't forget to normalize the string afterwards: replace each run of whitespace characters (spaces, tabs, line breaks) with a single space, and trim the result. Optionally, decode any HTML character entities (such as &amp;) back into the characters they represent.
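Here is a minimal sketch of those three steps. It is written in Python; in C# the equivalent calls would be `Regex.Replace` and `WebUtility.HtmlDecode`. The exact tag pattern, the function name `strip_tags` (borrowed from PHP) and the `decode_entities` flag are illustrative assumptions, not part of the original answer.

```python
import html
import re

# Assumed tag pattern: "<", then any run of non-">" characters, then either a
# closing ">" or the end of the string (so an unclosed trailing tag is dropped too).
TAG_RE = re.compile(r"<[^>]*(>|$)")
# Any run of whitespace (spaces, tabs, newlines) is collapsed to one space.
WS_RE = re.compile(r"\s+")


def strip_tags(markup: str, decode_entities: bool = True) -> str:
    """Hypothetical helper: regex tag removal, whitespace cleanup, optional decoding."""
    text = TAG_RE.sub("", markup)        # 1. remove every tag, globally
    text = WS_RE.sub(" ", text).strip()  # 2. collapse whitespace runs and trim
    if decode_entities:
        text = html.unescape(text)       # 3. optional: "&amp;" -> "&", "&lt;" -> "<"
    return text


if __name__ == "__main__":
    print(strip_tags("<ul><li>Hello</li></ul>"))
```

Entities are decoded only after the tags are gone. Decoding first would turn escaped text such as `&lt;b&gt;` into a real tag, which the regex would then silently delete.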
Note:
HTML allows ">" inside attribute values, so this solution will return broken output when it encounters such values. Use a proper parser if you must get it right under all circumstances.
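For comparison, the sketch below takes the parser route using Python's standard-library `html.parser`. It keeps only character data and skips the contents of `<script>` and `<style>`. On the ASP.NET side you would need an HTML parsing library instead, for example the third-party HtmlAgilityPack. The class and function names here are hypothetical, and skipping script/style content is an added assumption about what "text" should mean.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical helper: collects character data; tags and attributes are never emitted."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # entities such as &amp; arrive decoded
        self.parts: list[str] = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):  # tag names are already lowercased
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def strip_tags_parsed(markup: str) -> str:
    parser = _TextCollector()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    # Collapse whitespace runs to single spaces and trim.
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    print(strip_tags_parsed('<a title="x > y">link</a>'))
```

Because the parser understands quoted attributes, `<a title="x > y">link</a>` yields `link`, whereas a pattern that stops at the first `>` leaves `y">link` behind.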