I need to strip this string <a class=BC_ANCHOR href="http://www.msn.com" onClick=something target=_blank>MSN</a> down to <a href="http://www.msn.com">MSN</a>. However, this regex \s+\w+[^href]=\S*\w? won’t stop at the closing > but instead runs to the end of the </a>. Can someone please help me get this regex to stop at that closing >?
Thanks!
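By putting \w+[^href] you still allow things like <a href ="... and can exclude attributes ending in h, r, e, or f (that aren’t necessarily href), because [^href] is a character class that matches any one character other than those four letters, not the word href.
Try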
Explanation:
The (?!href) is a negative lookahead and prevents the attribute name from starting with href, so href itself is never matched.
The [a-zA-Z]+ is your attribute name. Spaces are allowed before and after the ‘=’. I restricted it to letters because the attributes in your example need nothing more, whereas \w would also allow digits and underscores. (Attribute names such as data-id can contain hyphens and digits, so widen the character class if you need to match those.)
The (?:"[^"]+"|\w+) means that the value of the attribute can be anything within double quotes, OR a non-quoted run of \w+.
These all prevent the match from going outside the >, unless your HTML is malformed and you have (e.g.) <a name="asdf> (note the missing closing ").
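To see the pieces described above working together, here is a minimal Python sketch. The pattern is reassembled from the explanation, so treat it as an approximation: in particular, the leading \s+ (which also removes the whitespace in front of each stripped attribute) is an assumption, and the names ATTR_NOT_HREF and strip_non_href_attributes are made up for this example.

```python
import re

# Pattern reassembled from the pieces explained above.
# Assumption: the leading \s+ is included so the space before each
# removed attribute is removed along with it.
ATTR_NOT_HREF = re.compile(r'\s+(?!href)[a-zA-Z]+\s*=\s*(?:"[^"]+"|\w+)')


def strip_non_href_attributes(html: str) -> str:
    """Remove every name=value attribute whose name does not start with href."""
    return ATTR_NOT_HREF.sub("", html)


original = (
    '<a class=BC_ANCHOR href="http://www.msn.com" '
    'onClick=something target=_blank>MSN</a>'
)

print(strip_non_href_attributes(original))
# prints: <a href="http://www.msn.com">MSN</a>
```

Because an unquoted value is matched with \w+ rather than \S*, the match for target=_blank ends at _blank and never reaches the closing >. Two caveats with this sketch: the lookahead also skips any attribute whose name merely begins with href (hreflang, for instance), and the pattern is applied to the whole string, so text outside a tag that happens to look like name=value would be removed as well.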