I’m using the pattern John Gruber describes in this Daring Fireball article to auto-link URLs in user comments.
I’m using it with PHP to match URLs, and I also want it to match a bare domain with a single TLD (no www. prefix and no trailing slash), but that doesn’t seem to work.
Here’s the pattern (it is explained in more detail in the article above):
$pattern = '#(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4})(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))#';
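For context, here is a minimal sketch of how a pattern like this is typically wired into a comment filter with PHP’s preg_replace_callback(). The helper name autolink() and the choice to prepend http:// when a match has no scheme are assumptions for illustration only, not something taken from Gruber’s article.

```php
<?php
// The pattern quoted above, unchanged (single-quoted PHP string).
$pattern = '#(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4})(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))#';

// Hypothetical helper: wrap every match in an <a> tag.
function autolink(string $text, string $pattern): string
{
    return preg_replace_callback($pattern, function (array $m): string {
        $url = $m[0];
        // Assumption: prepend a scheme when the match has none (e.g. "www.example.com/x").
        $href = preg_match('#^https?://#i', $url) ? $url : 'http://' . $url;
        // Only the href and the link text are escaped here.
        return '<a href="' . htmlspecialchars($href, ENT_QUOTES) . '">'
            . htmlspecialchars($url, ENT_QUOTES) . '</a>';
    }, $text);
}

echo autolink('See http://example.com/page for details.', $pattern), "\n";
```

The surrounding comment text is not escaped by this sketch, so it would still need escaping before being output.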
Specifically, I’m looking at this subpattern: [a-z0-9.\-]+[.][a-z]{2,4}
This subpattern works on its own, but as part of the larger pattern it doesn’t match google.com.
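[a-z0-9.\-]+[.][a-z]{2,4} works as you expect, but the rest of the pattern requires at least one more character after the domain. The tail is a repeated group, (?:[^\s()<>]+|\(...\))+, which must match at least one character, followed by a closing group that must match at least one more. Because [a-z]{2,4} can backtrack and give up letters, google.com can supply one of those characters itself (as google.co followed by m), but a bare google.com with nothing after it still fails.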
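A quick way to see this is to run the domain subpattern on its own and the full pattern against a few inputs. This is a small test sketch and the test strings are arbitrary choices. It should show google.com failing the full pattern, while google.com/ (the slash supplies the missing character) and google.info (the four-letter TLD gives two letters back to the tail) both pass.

```php
<?php
// Full pattern from the question.
$full = '#(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4})(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))#';
// The domain subpattern by itself.
$sub = '#(?i)\b[a-z0-9.\-]+[.][a-z]{2,4}#';

foreach (['google.com', 'google.com/', 'google.info'] as $s) {
    echo $s, ': sub=', preg_match($sub, $s), ' full=', preg_match($full, $s), "\n";
}
// google.com: sub=1 full=0
// google.com/: sub=1 full=1
// google.info: sub=1 full=1
```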
You can try making the tail optional, but that may trade these false negatives for false positives.
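One way to do that (a sketch, not the only option) is to wrap the whole tail in (?:...)? so it may match zero times. The sample sentence below is made up to show the trade-off: a bare google.com now matches, and a trailing period is still left out, but a file name such as readme.txt is now treated as a link too.

```php
<?php
// Same pattern, with the tail wrapped in (?:...)? so it becomes optional.
$optional = '#(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4})(?:(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))?)#';

$text = 'Try google.com. Also see readme.txt.';
preg_match_all($optional, $text, $m);
// $m[0] holds "google.com" and "readme.txt"; the second is a false positive.
print_r($m[0]);
```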