So I’ve got this URL regex:
/(?:((?:[^-\/"':!=a-z0-9_@]|^|\:))((https?:\/\/)((?:[^\p{P}\p{Lo}\s][\.-](?=[^\p{P}\p{Lo}\s])|[^\p{P}\p{Lo}\s])+\.[a-z]{2,}(?::[0-9]+)?)(\/(?:(?:\([a-z0-9!\*';:=\+\$\/%#\[\]\-_,~]+\))|@[a-z0-9!\*';:=\+\$\/%#\[\]\-_,~]+\/|[\.\,]?(?:[a-z0-9!\*';:=\+\$\/%#\[\]\-_~]|,(?!\s)))*[a-z0-9=#\/]?)?(\?[a-z0-9!\*'\(\);:&=\+\$\/%#\[\]\-_\.,~]*[a-z0-9_&=#\/])?))/iux
What it’s currently matching:
I need it to also match:
- http://www.google.com
- google.com
I tried making the protocol part of the regex optional by slapping a ? at the end “(https?:\/\/)?” but that didn’t do anything.
Ideas?
I’d look for something built into the language you are using to do this; URLs are tough to match with a regex. If you insist, I changed yours to make the (https?://) group optional. I did not test it, though. I got this example from RFC 3986 and was directed there by this comment. That said, I’d still recommend using something from whatever language you are using rather than a regex.
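For reference, RFC 3986 (Appendix B) gives a regular expression for breaking a URI reference into its components: scheme is group 2, authority group 4, path group 5, query group 7 and fragment group 9. Below is a minimal PHP sketch using it; the function name `split_uri` is hypothetical. Keep in mind that this pattern splits rather than validates, so a bare `google.com` ends up in the path component, not as a host.

```php
<?php
// RFC 3986, Appendix B: regex for splitting a URI reference into parts.
// "~" is used as the delimiter so "/" and "#" need no escaping.
const RFC3986_SPLIT = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';

/**
 * Hypothetical helper: split a URI reference into its five components.
 * Absent components come back as empty strings. This does NOT validate.
 */
function split_uri(string $uri): array
{
    // Every part of the pattern is optional, so it always matches at offset 0.
    preg_match(RFC3986_SPLIT, $uri, $m);

    // PHP omits trailing unmatched groups from $m, hence the "??" defaults.
    return [
        'scheme'    => $m[2] ?? '',
        'authority' => $m[4] ?? '',
        'path'      => $m[5] ?? '',
        'query'     => $m[7] ?? '',
        'fragment'  => $m[9] ?? '',
    ];
}

print_r(split_uri('http://www.google.com/search?q=regex#top'));
print_r(split_uri('google.com')); // no "//", so everything lands in 'path'
```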
Since you are using PHP, have you considered parse_url()? It returns false on seriously malformed URLs, though it is not meant to be a strict validator.
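One way to get both `http://www.google.com` and `google.com` through parse_url() is to assume `http://` when no scheme is present. Without a scheme, parse_url() reports `google.com` as a path rather than a host. This is a minimal sketch under that assumption; `normalize_url` is a hypothetical helper name, and restricting the result to http/https is an illustrative choice, not something parse_url() does for you.

```php
<?php
/**
 * Hypothetical helper: accept "google.com" as well as "http://www.google.com".
 * Assumes http:// when no "scheme://" prefix is present, then lets
 * parse_url() do the splitting. Returns the normalized URL, or null.
 */
function normalize_url(string $input): ?string
{
    $input = trim($input);

    // No "scheme://" prefix? Assume http:// so parse_url() sees a host, not a path.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $input)) {
        $input = 'http://' . $input;
    }

    $parts = parse_url($input);

    // parse_url() returns false on seriously malformed URLs; also require a host.
    if ($parts === false || empty($parts['host'])) {
        return null;
    }

    // Only web URLs are of interest here (illustrative restriction).
    $scheme = strtolower($parts['scheme'] ?? '');
    if ($scheme !== 'http' && $scheme !== 'https') {
        return null;
    }

    return $input;
}

var_dump(normalize_url('http://www.google.com')); // string(21) "http://www.google.com"
var_dump(normalize_url('google.com'));            // string(17) "http://google.com"
var_dump(normalize_url('ftp://example.com'));     // NULL
```

This is deliberately lenient: anything parse_url() can pull a host out of (including single-label names like `localhost`) is accepted, so add a stricter host check if you need one.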