I have a pretty simple regex question. My HTML tag looks like the following:
<body lang=EN-US link=blue vlink=purple>
I want to clear all attributes and just return <body>
There are a number of other HTML tags whose attributes I’d like to clear, so I hope to reuse the solution. How can I do this with a regular expression?
Thanks,
B.
Use HtmlAgilityPack like this:
Call this method, passing in the HTML that you want to remove all attributes from.
XPath will help you a lot with this.
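As a rough illustration of the parse-then-strip idea described here, this is a minimal sketch. Note that it uses Python's standard-library `html.parser` rather than HtmlAgilityPack, so it is a stand-in for the approach and not the HtmlAgilityPack API. The names `AttributeStripper` and `strip_attributes` are made up for this example, and the output is a best-effort re-serialization (tag names come back lowercased), not a byte-for-byte copy of the input.

```python
from html.parser import HTMLParser


class AttributeStripper(HTMLParser):
    """Re-emits the parsed document, dropping every attribute (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is ignored on purpose: only the bare tag name is kept.
        self.parts.append(f"<{tag}>")

    def handle_startendtag(self, tag, attrs):
        self.parts.append(f"<{tag}/>")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def strip_attributes(html):
    """Return html with all attributes removed from every tag."""
    parser = AttributeStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(strip_attributes("<body lang=EN-US link=blue vlink=purple>"))  # <body>
```

Because the parser decides what is a tag and what is text, the same function can be reused for any other element without writing a pattern per tag.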
Don’t use a regex for HTML files that may contain scripts: in JavaScript, the characters < and > are not tag delimiters but operators. A regex will probably match these operators as if they were tags, which will completely mess up the document.
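To make that failure concrete, here is a small sketch in Python. The pattern `NAIVE_TAG` and the function `naive_strip` are assumptions made up for this example (a typical "strip everything after the tag name" regex), not something taken from the question. It handles the simple `<body ...>` case, but it also rewrites a comparison inside a script.

```python
import re

# Hypothetical naive pattern: a tag name followed by anything up to the next ">".
NAIVE_TAG = re.compile(r"<(\w+)[^>]*>")


def naive_strip(html):
    """Replace every '<name ...>' match with '<name>' (illustration only)."""
    return NAIVE_TAG.sub(r"<\1>", html)


# Works for the tag from the question.
print(naive_strip("<body lang=EN-US link=blue vlink=purple>"))
# <body>

# Breaks once the page contains script code using < and > as operators.
page = "<body lang=EN-US><script>if (a <b && c> d) run();</script></body>"
print(naive_strip(page))
# <body><script>if (a <b> d) run();</script></body>
```

In the second case the regex treats `<b && c>` as a tag named `b` and collapses it to `<b>`, silently changing the script.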