This is a follow-up to another post here.
Problem: The code below works well, except for strings that contain curly double quotes, which render as strange characters.
Sample string:
“Walter Isaacson http://t.co/vaLxVduA”
Rendered as:
“Walter Isaacson http://t.co/vaLxVduA���
t.co/vaLxVduA���
I believe the problem is in the regex. What could I try to make this work?
Code:
function makeLink($match) {
    // Parse link.
    $substr = substr($match, 0, 6);
    if ($substr != 'http:/' && $substr != 'https:' && $substr != 'ftp://' && $substr != 'news:/' && $substr != 'file:/') {
        $url = 'http://' . $match;
    } else {
        $url = $match;
    }
    return '<a href="' . $url . '">' . $match . '</a>';
}

function makeHyperlinks($text) {
    // Find links and call the makeLink() function on them.
    return preg_replace('/((www\.|http|https|ftp|news|file):\/\/[\w.-]+\.[\w\/:@=.+?,#%&~-]*[^.\'# !(?,><;\)])/e', "makeLink('$1')", $text);
}
The problem is the Unicode character ”. When you add the u modifier, which treats the pattern and the subject as UTF-8, it works, but the regex then also catches the quote as part of the URL, so you would need to exclude this quote as well. Your regex also looks kinda huge. I did a quick search for a URL regex and found one that seems to work too and doesn't need all the exclusions.
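As a rough sketch of the first suggestion (not the exact code from this answer), the pattern below adds the u modifier and puts the curly quotes “ and ” into the final excluded character class. Without u, that negated class can match a single byte of the multi-byte ” character, which splits it and produces the garbage shown in the question. The function names makeLinkFromMatch and makeHyperlinksUtf8 are made up for this example, and preg_replace_callback() is swapped in for the /e modifier, which newer PHP versions no longer support.

```php
<?php
// Hypothetical callback: receives the match array from preg_replace_callback().
function makeLinkFromMatch(array $m) {
    $match = $m[1];
    // Same prefix check as the original makeLink().
    $prefix = substr($match, 0, 6);
    $known = array('http:/', 'https:', 'ftp://', 'news:/', 'file:/');
    $url = in_array($prefix, $known, true) ? $match : 'http://' . $match;
    return '<a href="' . $url . '">' . $match . '</a>';
}

// Hypothetical replacement for makeHyperlinks().
function makeHyperlinksUtf8($text) {
    // "u" makes PCRE read the pattern and subject as UTF-8, so ” is one
    // character rather than three bytes. The curly quotes are added to the
    // last (negated) class so a URL cannot end with one of them.
    $pattern = '/((?:https?|ftp|news|file):\/\/[\w.-]+\.[\w\/:@=.+?,#%&~-]*[^.\'# !(?,><;\)“”])/u';
    // Note: returns null if $text is not valid UTF-8.
    return preg_replace_callback($pattern, 'makeLinkFromMatch', $text);
}

echo makeHyperlinksUtf8('“Walter Isaacson http://t.co/vaLxVduA”');
```

With the sample string from the question, the closing ” should stay outside the link. The file has to be saved as UTF-8 for the quote characters in the pattern to be read correctly.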