There is a lot of back-and-forth over whether, and when, it is ever appropriate to use a regex to parse HTML.
Since a common problem that comes up is extracting links from HTML, my question is: would using a regex be appropriate if all you were looking for was the href value of <a> tags in a block of HTML? In this scenario you are not concerned about closing tags, and you are looking for a fairly specific structure.
Using a full HTML parser seems like significant overkill. I have seen questions and answers indicating that using a regex to parse URLs, while largely safe, is not perfect. Still, the extra constraints of structured <a> tags would appear to provide a context where one should be able to achieve 100% accuracy without breaking a sweat.
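For concreteness, here is a minimal sketch of the kind of regex the question has in mind. The language (Python), the pattern, and the helper name `regex_hrefs` are assumptions for illustration, not taken from the question. On simple, well-behaved markup it returns the expected values. Note that it only scans raw text, so it has no notion of comments, character references such as `&quot;`, or a `>` inside a quoted attribute value.

```python
import re

# Hypothetical "simple" pattern: "<a", whitespace, anything that is not ">"
# up to "href=", then a single- or double-quoted value.
HREF_RE = re.compile(r'<a\s[^>]*?href\s*=\s*["\']([^"\']*)["\']', re.IGNORECASE)


def regex_hrefs(html: str) -> list[str]:
    """Return the raw (undecoded) href values matched by the naive pattern."""
    return HREF_RE.findall(html)


simple = '<p><a href="http://example.com/">x</a> <a class="y" href=\'/about\'>y</a></p>'
print(regex_hrefs(simple))  # ['http://example.com/', '/about']
```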
Thoughts?
Consider this valid HTML:
What is the list of URLs to be extracted? A parser would say just a single URL, with the value my">url<. Would your regular expression?
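To make the point concrete, here is a hedged sketch using Python's standard-library `html.parser`. The input string is a hypothetical example, not the snippet from the answer; `HrefCollector`, `parser_hrefs`, and the regex pattern are illustrative names. `html.parser` is a lenient tokenizer rather than a full HTML5 tree builder, but it does handle the two traps shown here: it reports the commented-out link as a comment, not a tag, and it decodes character references in attribute values.

```python
import re
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect href values of <a> start tags, as the parser sees them."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # tag is lowercased; attrs is a list of (name, value) pairs whose
        # values have already had character references (&quot; etc.) decoded.
        # Text inside <!-- ... --> goes to handle_comment, never here.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def parser_hrefs(html: str) -> list[str]:
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


# Hypothetical input: a commented-out link, then a link whose title contains
# ">" and whose href is entity-encoded.
tricky = '<!-- <a href="old.html">gone</a> --><a title=">" href="my&quot;&gt;url&lt;">z</a>'
print(parser_hrefs(tricky))  # ['my">url<']

# A naive pattern for comparison: it matches the link inside the comment and
# misses the real one, because [^>]*? cannot get past the ">" in title=">".
naive = re.compile(r'<a\s[^>]*?href\s*=\s*["\']([^"\']*)["\']', re.IGNORECASE)
print(naive.findall(tricky))  # ['old.html']
```

The regex could be patched for each of these cases, but every patch re-implements a piece of the HTML tokenizer rules that the parser already applies.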