I am using a PHP function to automatically turn URLs in a text string into actual links that people can click on. It works in most cases, but I have found some where it does not.
I don’t really understand regular expressions at all, so I was hoping someone could help me out with this.
Here is the pattern I’m currently using:
$pattern = "/(((http[s]?:\/\/)|(www\.))(([a-z][-a-z0-9]+\.)?[a-z][-a-z0-9]+\.[a-z]+(\.[a-z]{2,2})?)\/?[a-z0-9.,_\/~#&=;%+?-]+[a-z0-9\/#=?]{1,1})/is";
However, here are some links I have found that this pattern does not match:
- http://www.oakvilletransit.ca – Not sure, but assuming it doesn’t match because of the two-letter country code
- http://www.grt.ca – Another one with the .ca domain that is not working
- Several other .ca addresses
- freepublictransports.com – Addresses without www. or http:// in front of them. I would like these to work as well.
- http://www.222tips.com – Assuming it doesn’t match because of the numbers at the beginning of the address.
Does anyone know how I can modify that regex pattern to match these cases as well?
EDIT – It should also match URLs that may have a period at the end. If a URL is the last part of a sentence there may be a period at the end that should not be included in the actual link. Currently this pattern takes that into account as well.
EDIT 2 – I am using the pattern like this:
$pattern = "/((http|https):\/\/)?([a-z0-9-]+\.)?[a-z][a-z0-9-]+(\.[a-z]{2,6}){1,3}(\/[a-z0-9.,_\/~#&=;%+?-]*)?/is";
$string = preg_replace($pattern, " <a target='_blank' href='$1'>$1</a>", $string);
// fix URLs without protocols
$string = preg_replace("/href='www/", "href='http://www", $string);
return $string;
The following regex will match URLs:
- With or without http:// or https:// at the beginning
- With or without a subdomain (www.example.com, help.example.com, etc)
- With one to three domain suffixes (www.example.com.gu, www.example.com.au.museum, etc)
The /i at the end makes it case insensitive.
/((http|https):\/\/)?([a-z0-9-]+\.)?[a-z0-9-]+(\.[a-z]{2,6}){1,3}(\/[a-z0-9.,_\/~#&=;%+?-]*)?/is
Edit: This will not match any “hanging” period after the domain (such as at the end of a sentence), because it is not part of the URL and shouldn’t be included in the href attribute of your link. (A period that follows a path, as in example.com/page., is still matched, because . is allowed in the path part.)
Edit 2: In your first preg_replace(), change $1 to $0. This will insert the entire matched string instead of a single part of it.
Edit 3: (Update 2) Here’s a better way you can check for an http:// or https:// at the beginning:
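One way to do that check, as a minimal sketch rather than a definitive implementation: run the pattern above through preg_replace_callback() and prepend http:// only when the matched text does not already start with a protocol. The function name linkify() is hypothetical, and using a callback is an assumption about how to structure it; the pattern and the link markup are taken from the posts above.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the posts above).
// Wraps every URL-like match in an <a> tag and adds "http://" when the
// matched text has no protocol of its own.
function linkify(string $text): string
{
    // Pattern from the answer above, unchanged.
    $pattern = '/((http|https):\/\/)?([a-z0-9-]+\.)?[a-z0-9-]+(\.[a-z]{2,6}){1,3}(\/[a-z0-9.,_\/~#&=;%+?-]*)?/is';

    return preg_replace_callback($pattern, function (array $m): string {
        $url = $m[0]; // the entire match, i.e. what $0 refers to in preg_replace()

        // Check for a protocol at the very beginning of the match.
        $href = preg_match('/^https?:\/\//i', $url) ? $url : 'http://' . $url;

        // The pattern cannot match quotes or angle brackets, so the match
        // cannot break out of the single-quoted attribute.
        return sprintf("<a target='_blank' href='%s'>%s</a>", $href, $url);
    }, $text);
}

echo linkify('www.grt.ca.'), "\n";
// prints: <a target='_blank' href='http://www.grt.ca'>www.grt.ca</a>.
```

This would replace both the first preg_replace() call and the follow-up that rewrites href='www, since addresses such as freepublictransports.com get a protocol as well. Note that the pattern is deliberately loose, so other dotted tokens (a file name like notes.txt, for example) will also be turned into links.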