I want to validate a URL, so I searched and found this
Brian Ray said in his post that
“@Tate’s answer is good for a full URL, but if you want to validate a domain column, you don’t want to allow the extra URL bits his regex allows (e.g. you definitely don’t want to allow a URL with a path to a file).
So I removed the protocol, port, file path, and query string parts of the regex, resulting in this:”
I don’t understand what he said at all. How can a URL be a path to a file? What is a “domain column”?
A URL consists of several parts. If you have a very elaborate URL, like:
The parts are:
The only parts that cannot be omitted are the protocol (although many programs default to http://) and the host name. Each part has its own rules for which characters are legal in it. What’s worse, not all web servers agree on what those rules are. So the only thing you can check without actually making a connection and seeing whether it fails is the part needed to contact the web server: the protocol, the host and domain name, and the port. These are all case insensitive (the rest may not be). I’m not sure which characters are valid in a host or domain name, but this is also an area where name servers may not follow the specification.
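As a rough illustration of checking only the part needed to contact the server, here is a minimal Python sketch using the standard library's `urllib.parse.urlsplit`. The helper name `contact_parts` and the example URL are made up for this illustration, and the check is deliberately loose: it only confirms that a protocol and a host are present.

```python
from urllib.parse import urlsplit


def contact_parts(url):
    """Return (protocol, host, port), or None if protocol or host is missing.

    Hypothetical helper for illustration; it does not validate characters.
    """
    parts = urlsplit(url)
    if not parts.scheme or not parts.hostname:
        return None
    try:
        port = parts.port  # None when the URL gives no port
    except ValueError:
        return None  # port is not a number or is out of range
    # urlsplit already lowercases the scheme, and .hostname is lowercased,
    # which matches the case-insensitive parts of the URL.
    return parts.scheme, parts.hostname, port


# Made-up example URL, only to show the case handling.
example = "HTTP://WWW.Example.COM:8080/Some/Path?q=1#top"
print(contact_parts(example))   # ('http', 'www.example.com', 8080)
print(urlsplit(example).path)   # /Some/Path (case is preserved here)
print(contact_parts("www.example.com/path"))  # None: no protocol given
```

Whether a missing protocol should be rejected or defaulted to http:// is a design choice; this sketch rejects it.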
In short, the only way to check whether a URL is valid is to try to connect to it. If your program uses some magic to reject URLs (or email addresses), some people are going to hate you and/or their internet provider for it (because even if your check follows the specification, some host or domain names don’t).
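If you do want to test by connecting, something like the following Python sketch would do it with the standard library's `urllib.request.urlopen`. The function name `can_open` and the five-second timeout are assumptions for this example, and note that it treats an HTTP error response such as 404 as a failure, which may or may not be what you want.

```python
from http.client import HTTPException
from urllib.error import URLError
from urllib.request import urlopen


def can_open(url, timeout=5.0):
    """Return True if the URL can actually be opened, False otherwise.

    Hypothetical helper for illustration. It performs a real request,
    so the result depends on the network at the time of the call.
    """
    try:
        with urlopen(url, timeout=timeout):
            return True
    except (URLError, HTTPException, ValueError, OSError):
        # URLError: unreachable host, refused connection, HTTP error status.
        # ValueError: the string has no usable protocol at all.
        return False


print(can_open("not a url"))  # False: there is no protocol to connect with
```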
As to your question of how a URL can refer to a local file, there is a special protocol for that:
file://. Since the path must start with a / as well, this results in URLs like file:///home/user/file.html, so with three slashes at the start.
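To see where the three slashes come from, here is a small Python sketch that splits that file URL with `urllib.parse.urlsplit`. The host part between `file://` and the path is simply empty.

```python
from urllib.parse import urlsplit

parts = urlsplit("file:///home/user/file.html")

print(parts.scheme)        # file
print(repr(parts.netloc))  # '' (empty host)
print(parts.path)          # /home/user/file.html

# Putting the pieces back together: "file" + "://" + "" + "/home/..."
rebuilt = parts.scheme + "://" + parts.netloc + parts.path
print(rebuilt)             # file:///home/user/file.html
```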