I have a regex pattern for URLs that I use to check for links in a body of text. The only problem is that the pattern will match this link:
stackoverflow.com
And also this sentence:
I'm a sentence.Next Sentence.
This makes sense, because my pattern doesn’t strictly check the domain extension (.com, .co.uk, .com.au, etc.).
I want it to match stackoverflow.com and not the latter.
As I’m no regex expert, does anyone know of a good regex pattern that finds all types of URLs in body text without matching sentences like the one above?
If the only way is to strictly check the domain extension, I suppose I’ll have to settle for that.
Here’s my pattern, though I don’t think it will help.
(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?
I would definitely suggest finding a working regex that someone else has made (which would probably include a strict check on the domain extension), but here is one possible way to modify your existing regex.
It requires the assumption that links usually do not mix case in the domain extension. For example, you might see .COM or .com, but probably not .Com. If you only match domain extensions that don’t mix case, you avoid matching most sentences, because a new sentence normally starts with a capital letter followed by lowercase letters.
In the middle of your regex you have
[\w]{2,4}
Try changing this to
([A-Z]{2,4}|[a-z]{2,4})
or, if you don’t want a new capturing group,
(?:[A-Z]{2,4}|[a-z]{2,4})
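To see the effect of this change, here is a minimal Python sketch that compares the two variants. The pattern is copied from the question and split around the domain-extension part. The names PREFIX, SUFFIX, TLD_ORIGINAL, TLD_SAME_CASE and find_links are made up for this illustration, and the sample strings are just the ones from the question.

```python
import re

# The pattern from the question, split around the domain-extension part.
# Everything except the TLD_* pieces is copied unchanged.
PREFIX = (
    r"(([\w]+:)?\/\/)?"
    r"(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?"
    r"([\d\w][-\d\w]{0,253}[\d\w]\.)+"
)
SUFFIX = (
    r"(:[\d]+)?"
    r"(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*"
    r"(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?"
    r"(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?"
)

TLD_ORIGINAL = r"[\w]{2,4}"                   # accepts any case mix
TLD_SAME_CASE = r"(?:[A-Z]{2,4}|[a-z]{2,4})"  # all upper or all lower

original = re.compile(PREFIX + TLD_ORIGINAL + SUFFIX)
modified = re.compile(PREFIX + TLD_SAME_CASE + SUFFIX)


def find_links(pattern, text):
    """Return the full text of every match (hypothetical helper)."""
    return [m.group(0) for m in pattern.finditer(text)]


sentence = "I'm a sentence.Next Sentence."
link = "Visit stackoverflow.com today"

print(find_links(original, sentence))  # ['sentence.Next']
print(find_links(modified, sentence))  # []
print(find_links(modified, link))      # ['stackoverflow.com']
```

Note that this only reduces false positives. A sentence boundary with no capital letter, such as sentence.next, still matches, which is why a strict check against known domain extensions remains the more reliable option.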