Taking this thread a step further, can someone tell me what the difference is


Asked: May 13, 2026


Taking this thread a step further, can someone tell me what the difference is between these two regular expressions? They both seem to accomplish the same thing: pulling a link out of HTML.

Expression 1:

'/(https?://)?(www.)?([a-zA-Z0-9_%]*)\b.[a-z]{2,4}(.[a-z]{2})?((/[a-zA-Z0-9_%])+)?(.[a-z])?/'

Expression 2:

'/<a.*?href\s*=\s*["\']([^"\']+)[^>]*>.*?<\/a>/si'

Which one would be better to use? And how could I modify one of those expressions to match only links that contain certain words, and to ignore any matches that do not contain those words?

Thanks.


1 Answer

Editorial Team · Answer 1 · 2026-05-13T18:20:29+00:00

The difference is that expression 1 looks for valid, full URIs, following the specification, so it returns every full URL that appears anywhere in the code. That is not really the same as getting all links: it doesn’t match relative URLs, which are used very often, and it picks up every URL, not only the ones that are link targets.

The second looks for <a> tags and captures the content of the href attribute, so this one will get you every link. Except for one error* in that expression, it is quite safe to use and will work well enough to get you every link. It allows for the variations that can appear, such as whitespace or other attributes.

*The error is that the expression does not look for the closing quote of the href attribute. You should add that, or you might match weird things:

/<a.*?href\s*=\s*["\']([^"\'>]+)["\'][^>]*>.*?<\/a>/si
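To see what the extra closing quote changes, here is a minimal sketch in Python's re module. Using Python is an assumption (the expressions above look like PHP preg patterns): the / delimiters are dropped, <\/a> becomes </a>, and the s and i modifiers become re.S and re.I. The sample HTML and the variable names are made up for illustration.

```python
import re

# The expression from the question (no closing quote required).
ORIGINAL = re.compile(
    r'<a.*?href\s*=\s*["\']([^"\']+)[^>]*>.*?</a>', re.S | re.I
)
# The corrected expression: the value may not contain ">" and must be
# followed by a closing quote.
CORRECTED = re.compile(
    r'<a.*?href\s*=\s*["\']([^"\'>]+)["\'][^>]*>.*?</a>', re.S | re.I
)

# Hypothetical well-formed input: a relative URL, single quotes, extra attributes.
well_formed = """<a href="/about">About</a>
<a class="nav" href='https://example.com/docs' title="Docs">Docs</a>"""

# Hypothetical malformed input: the first href is missing its closing quote.
malformed = '<a href="broken>text</a> <a href="/ok">ok</a>'

print(CORRECTED.findall(well_formed))  # ['/about', 'https://example.com/docs']
print(ORIGINAL.findall(malformed))     # ['broken>text</a> <a href=']
print(CORRECTED.findall(malformed))    # ['/ok']
```

On well-formed markup both versions return the same URLs. They only differ when a quote is missing: the original swallows markup up to the next quote, while the corrected one skips the broken attribute.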

Edit in response to the comment:

To look for a word inside the link URL, use the following, replacing word with the term you want:

/<a.*?href\s*=\s*["\']([^"\'>]*word[^"\'>]*)["\'][^>]*>.*?<\/a>/si
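As a rough sketch, the URL filter can be wrapped in a small helper that takes the search word as a parameter. This again assumes Python's re module rather than PHP; the function name links_with_word_in_url and the sample HTML are hypothetical. The word is passed through re.escape so that characters such as "." or "?" are matched literally.

```python
import re


def links_with_word_in_url(html: str, word: str) -> list[str]:
    """Return the href values of <a> tags whose URL contains `word`."""
    escaped = re.escape(word)  # treat the word as literal text, not as regex syntax
    pattern = re.compile(
        r'<a.*?href\s*=\s*["\']([^"\'>]*' + escaped + r'[^"\'>]*)["\'][^>]*>.*?</a>',
        re.S | re.I,  # same meaning as the /si modifiers
    )
    return pattern.findall(html)


# Hypothetical sample input.
html = """<a href="/blog/regex-tips">Tips</a>
<a href="/about">About</a>
<a href="https://example.com/regex/intro">Intro</a>"""

print(links_with_word_in_url(html, "regex"))
# ['/blog/regex-tips', 'https://example.com/regex/intro']
```

Because of the i modifier the word is matched case-insensitively, so "Regex" in a URL would match as well.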

To look for a word inside the link text, use:

/<a.*?href\s*=\s*["\']([^"\'>]+)["\'][^>]*>.*?word.*?<\/a>/si
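One caveat worth checking with the link-text version: with the s modifier, .*? is allowed to run past a closing </a>, so when an earlier link lacks the word and a later one has it, the match can start at the earlier link and report its URL. A possible workaround, sketched here in Python (an assumption, as is the two-step approach itself, which is not part of the answer above), is to capture each link's URL and text with the corrected expression and filter on the text afterwards. The names LINK and links_with_word_in_text and the sample HTML are hypothetical.

```python
import re

# Corrected expression with a second group that captures the link text.
LINK = re.compile(
    r'<a.*?href\s*=\s*["\']([^"\'>]+)["\'][^>]*>(.*?)</a>',
    re.S | re.I,
)


def links_with_word_in_text(html: str, word: str) -> list[str]:
    """Return the href values of <a> tags whose link text contains `word`."""
    needle = word.lower()
    # findall yields (href, text) pairs; keep the hrefs whose text has the word.
    return [href for href, text in LINK.findall(html) if needle in text.lower()]


# Hypothetical sample input.
html = """<a href="/a">Home</a>
<a href="/b">Regex guide</a>
<a href="/c">Contact</a>"""

print(links_with_word_in_text(html, "regex"))  # ['/b']
```

Filtering after extraction keeps the check confined to one link at a time, at the cost of a second pass over the matches.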
