I have made a simple function to validate URLs submitted through a textarea (1 link per line):
function validate_urls($value)
{
    //final array of links
    $links = array();
    $value = array_map(function($a) use (&$links){
        $a = trim($a);
        if(strlen($a) !== 0 and (strpos($a, 'http') !== 0 or strpos($a, 'https') !== 0)){
            $a = 'http://'.$a;
        }
        $url = parse_url($a,PHP_URL_HOST);
        if($url != null and !in_array($a, $links) and filter_var($a, FILTER_VALIDATE_URL) !== false and checkdnsrr($a)){
            $links[] = $a;
        }
        return false;
    }, explode("\n",$value));
    return $links;
}
var_dump(validate_urls($_POST['links']));
What this does is check if
- the URL is valid
- the URL is active
- the URL is not a duplicate
The thing is, how come it doesn’t work (it returns an empty array)? I have gone through every check and it should work, but it doesn’t. Sorry if the code is messy, I’m still trying to learn.
A and B or C does NOT translate to (A and B) or (A and C), as AND has a higher precedence than OR. So you’d want to change that to A and (B or C).
The docs on FILTER_VALIDATE_URL state »Note that the function will only find ASCII URLs to be valid;«. So this is a pretty restrictive option. It adheres to the specification of URLs given in RFC 2396, which has been superseded by RFC 3986.
Without having looked into this filter more thoroughly, these two pieces of information are sufficient (to me) to mark that filter as utterly useless.
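The precedence point can be sketched in a few lines. The helper name add_default_scheme is hypothetical and not part of the original post; it is only one possible way to write the scheme check. Note that even with the parentheses used in the question, a line starting with http:// still gets a second http:// prepended, because strpos($a, 'https') returns false for it and false !== 0 is true. A single anchored test avoids that.

```php
<?php
// Hypothetical helper (an assumption, not from the original post):
// prepend "http://" only when the line does not already start with a scheme.
function add_default_scheme(string $line): string
{
    $line = trim($line);
    if ($line === '') {
        return '';
    }
    // One anchored, case-insensitive test covers both http:// and https://.
    if (preg_match('#^https?://#i', $line) !== 1) {
        $line = 'http://' . $line;
    }
    return $line;
}

// Precedence demo: "and" binds tighter than "or".
$a = false;
$b = true;
$c = true;
var_dump(($a and $b or $c));   // parsed as ($a and $b) or $c  -> bool(true)
var_dump(($a and ($b or $c))); // explicit grouping            -> bool(false)
```

With a helper like this, "example.com" becomes "http://example.com", while "http://example.com" and "https://example.com" pass through unchanged.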
checkdnsrr($a) is testing the whole URL rather than just the host. Even if you were checking the host, you’d be looking for an MX record (i.e. whether said host is accessible by mail), since MX is the default record type.
A would be checking if that host has an IP set, CNAME would be checking if the host is an alias of another DNS record, …. You’re probably looking for NS, which would check if that host has got any DNS record at all.
So if you changed your check to
checkdnsrr($url, "NS") you would be validating if the host component of that URL is actually known to DNS. You are NOT checking if that host is actually listening on the specified port. And you’re NOT checking if the given resource (e.g. /foo/bar.html) exists.
If you wanted to make sure a URL actually points to something useful, you’d have to make a
HEAD request and check the response. You can do that easily with curl. If curl is not available, you could implement a simple HTTP client yourself using fsockopen() – with the disadvantage of not being able to speak HTTPS (HTTP through SSL) and having to implement redirection following and similar stuff yourself. In short: you don’t want to go down that road.
That said, there is also a performance problem up ahead. The HTTP requests are done synchronously. Should a host fail to reply in an acceptable time frame, your script might time out – or at least take ages to finish, depending on the number of URLs you’re checking and the quality of service behind them.
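The two checks described above could look roughly like the following sketch. The function names host_in_dns and url_responds, the 5 second timeout, the redirect limit and the "2xx means alive" rule are all assumptions made for illustration, not something from the original post. The record type defaults to NS as suggested above; since NS records usually live at the zone apex, passing "A" may suit hosts such as www.example.com better. The timeouts only limit how long a single slow host can block the script, the requests are still made one after another.

```php
<?php
// Hypothetical helpers (assumptions, not from the original post).

// True if the host component of $url has a DNS record of the given type.
function host_in_dns(string $url, string $type = 'NS'): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // no host to look up, so skip the DNS query
    }
    return checkdnsrr($host, $type);
}

// True if a HEAD request to $url ends in a 2xx status after following redirects.
// Requires the curl extension.
function url_responds(string $url, int $timeoutSeconds = 5): bool
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // send HEAD instead of GET
        CURLOPT_RETURNTRANSFER => true,  // do not echo the response
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_MAXREDIRS      => 5,     // assumed limit
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $ok     = curl_exec($ch) !== false;
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return $ok && $status >= 200 && $status < 300;
}
```

A caller would typically run host_in_dns() first and only call url_responds() for URLs that pass, so that unknown hosts are rejected without an HTTP request.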