I am writing a Perl-compatible regular expression in PHP to check whether a given string is a valid URL.
It currently works as expected, but I am wondering what precautions I should take to make sure it is safe for user input. The $url variable is submitted as-is, as plain text.
Here is the whole function:
private function real_url($url) {
    return preg_match("/(http|https):\/\/(.*?)\.[a-zA-Z]{2,6}/i", $url);
}
I only want it to check for http and https. I’m not worried about ftp, irc and the like. Just web links.
It also checks how long the TLD is. So "google.asdfasdfasdf" will return false but "google.asdf" will return true. How can I fix that? ".asdf" clearly isn’t a valid TLD.
I just need to know two things:
- How to check whether the given URL is actually legitimate;
- Whether it is safe for raw user input.
You should use filter_var with the FILTER_VALIDATE_URL filter instead.
Note that this won’t validate that the scheme is allowed (such as http/https), nor that the top-level domain exists.
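As a rough sketch of how that could look (the function name is_http_url is made up for illustration and is not from the original post), filter_var can handle the syntax check, and parse_url can then restrict the scheme to http and https, which filter_var alone does not do:

```php
<?php
// Hypothetical helper: syntax check via filter_var, then a scheme whitelist.
function is_http_url(string $url): bool
{
    // FILTER_VALIDATE_URL returns false when the string is not a well-formed URL.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // filter_var accepts other schemes (ftp, etc.), so check the scheme separately.
    $scheme = parse_url($url, PHP_URL_SCHEME);

    return in_array(strtolower((string) $scheme), ['http', 'https'], true);
}

var_dump(is_http_url('https://example.com/path?x=1'));
var_dump(is_http_url('ftp://example.com'));
var_dump(is_http_url('not a url'));
```

Something like http://google.asdf should still pass here, since it is syntactically fine; whether the domain actually exists is a separate check.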
The simplest way of verifying that the domain is actually valid would be to do a DNS lookup, for instance using checkdnsrr.
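A minimal sketch of such a lookup, assuming it is enough for the host to have an A or AAAA record. The function name domain_resolves and the optional $lookup parameter are my own additions (the parameter only exists so the DNS call can be swapped out in tests); checkdnsrr itself needs network access and can be slow:

```php
<?php
// Hypothetical helper: extract the host from the URL and ask DNS whether it resolves.
function domain_resolves(string $url, ?callable $lookup = null): bool
{
    $host = parse_url($url, PHP_URL_HOST);

    // parse_url yields null (no host) or false (malformed URL) on failure.
    if (!is_string($host) || $host === '') {
        return false;
    }

    // Default lookup: the host counts as existing if it has an A or AAAA record.
    $lookup = $lookup ?? static function (string $h): bool {
        return checkdnsrr($h, 'A') || checkdnsrr($h, 'AAAA');
    };

    return (bool) $lookup($host);
}
```

You would call it as domain_resolves($url) after the syntax check has passed. Keep in mind that a successful lookup only tells you the domain resolves right now, not that the link is trustworthy.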