There is a lot of argument back and forth over when and if it

Question

Asked: May 20, 2026

There is a lot of argument back and forth over when, and if, it is ever appropriate to use a regex to parse HTML.

Parsing links out of HTML is a common problem, so my question is: would using a regex be appropriate if all you were looking for was the href value of <a> tags in a block of HTML? In this scenario you are not concerned about closing tags, and you have a fairly specific structure you are looking for.

It seems like significant overkill to use a full HTML parser. I have seen questions and answers indicating that using a regex to parse URLs is largely safe but not perfect; still, the extra constraints of structured <a> tags would seem to provide a context where one should be able to achieve 100% accuracy without breaking a sweat.
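A minimal sketch of the kind of regex the question has in mind, written in Python. The pattern `HREF_RE` and the helper `regex_hrefs` are hypothetical names, and the pattern assumes every href value is wrapped in matching single or double quotes:

```python
import re

# Hypothetical pattern: an <a ...> start tag whose href value is wrapped in
# matching quotes. The backreference \1 requires the closing quote to be the
# same character as the opening one. Comments and other markup are not
# special-cased at all.
HREF_RE = re.compile(
    r"""<a\s[^>]*?\bhref\s*=\s*(["'])(.*?)\1""",
    re.IGNORECASE | re.DOTALL,
)


def regex_hrefs(html: str) -> list[str]:
    """Return every quoted href value found inside an <a ...> tag."""
    return [m.group(2) for m in HREF_RE.finditer(html)]


if __name__ == "__main__":
    snippet = '<p><a href="/docs">Docs</a> and <A HREF=\'https://example.com\'>site</A></p>'
    print(regex_hrefs(snippet))  # ['/docs', 'https://example.com']
```

Because of the backreference, a double-quoted value that contains an apostrophe (for example `"it's"`) is not cut short at the apostrophe.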

Thoughts?

1 Answer

Editorial Team · Answer 1 · 2026-05-20T14:44:43+00:00

Consider this valid HTML:

```html
<!DOCTYPE html>
<title>Test Case</title>
<p>
<!-- <a href="url1"> -->
<span class="><a href='url2'>"></span>
<a href='my">url<'>click</a>
</p>
```

What list of URLs should be extracted? A parser would report just a single URL, with the value my">url<. Would your regular expression?
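To make this test case concrete, here is a sketch in Python using the standard library's `html.parser.HTMLParser` (an event-based tokenizer, not a full tree-building parser), alongside a quote-aware regex for comparison. `HrefCollector`, `parser_hrefs`, `HREF_RE`, and `TEST_CASE` are hypothetical names chosen for illustration:

```python
import re
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect href values from <a> start tags as the tokenizer reports them."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; attrs is a list of
        # (name, value) pairs with quotes stripped and character references
        # decoded. Comments go to handle_comment, which we do not override.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def parser_hrefs(html: str) -> list[str]:
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


# Hypothetical quote-aware regex, included only for comparison.
HREF_RE = re.compile(r"""<a\s[^>]*?\bhref\s*=\s*(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)

TEST_CASE = """<!DOCTYPE html>
<title>Test Case</title>
<p>
<!-- <a href="url1"> -->
<span class="><a href='url2'>"></span>
<a href='my">url<'>click</a>
</p>
"""

if __name__ == "__main__":
    print(parser_hrefs(TEST_CASE))  # ['my">url<']
    print([m.group(2) for m in HREF_RE.finditer(TEST_CASE)])  # ['url1', 'url2', 'my">url<']
```

The tokenizer skips the comment and reads `><a href='url2'>` as the value of the `class` attribute, so it reports only `my">url<`. The regex, even with matched quotes, also picks up `url1` from inside the comment and `url2` from inside the attribute value. The parser-based version needs only the standard library and a few extra lines.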
