I’m importing data from one database to another. I’ve been asked to strip all HTML content, as it’s messy and not valid, and keep just the links.
I currently use the following VB.NET function to strip all HTML tags from a string of content:
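' Requires: Imports System.Text.RegularExpressions
Public Shared Function StripHTML(ByVal htmlString As String) As String
    ' Lazily match "<", then any characters (including newlines), up to the next ">"
    Dim pattern As String = "<(.|\n)*?>"
    Return Regex.Replace(htmlString, pattern, String.Empty)
End Function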
I’m looking for a way of stripping all tags except a (anchor) tags from the content.
For example if I have the following HTML content:
<table>
<tr>
<td>
Lorem <a href="http://google.com">Ipsum</a>
</td>
</tr>
</table>
This should simply become:
Lorem <a href="http://google.com">Ipsum</a>
How can I do this?
I suggest you use the Html Agility Pack.
Also check this question and its answers: HTML Agility Pack strip tags NOT IN whitelist
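If adding a library is not an option, a small change to the regex in the question might be enough for simple input. This is only a sketch and not the Html Agility Pack approach: it assumes no ">" characters appear inside attribute values or comments, and the module and function names (HtmlStripper, StripHtmlExceptAnchors) are made up for illustration. The negative lookahead (?!/?a\b) makes the pattern skip any tag whose name is exactly "a", opening or closing, while tags such as abbr are still removed.

```vbnet
Imports System.Text.RegularExpressions

Public Module HtmlStripper
    ' Hypothetical helper: removes every tag except <a ...> and </a>.
    Public Function StripHtmlExceptAnchors(ByVal htmlString As String) As String
        ' "<"        start of a tag
        ' (?!/?a\b)  negative lookahead: do not match if the tag name is "a"
        '            (optionally preceded by "/" for the closing tag)
        ' [^>]*>     everything up to and including the closing ">"
        Dim pattern As String = "<(?!/?a\b)[^>]*>"
        Dim stripped As String =
            Regex.Replace(htmlString, pattern, String.Empty, RegexOptions.IgnoreCase)
        ' Trim the whitespace left behind by the removed tags.
        Return stripped.Trim()
    End Function
End Module
```

Called on the table example from the question, this should leave only the text and the anchor. For messy or invalid markup a real parser such as Html Agility Pack is still the safer choice, because a regex cannot reliably handle malformed tags.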