a string like: ‘www.test.com’ is good.
a string like: ‘www.888.com’ is good.
a string like: ‘stackoverflow.com’ is good.
a string like: ‘GOoGle.Com’ is good.
why? because those are valid urls. it does not necessarily matter whether they have been registered or not.
now bad strings are:
‘goog*d\x’
‘manydots…com’
why? because you can’t register those urls.
if I have a string in Java which is supposed to be a good url, what’s the best way to validate it?
thanks a lot
Those examples are hostnames. They’re not valid URLs in themselves.
Hostnames are made of dot-separated ‘labels’. Each label must be up to 63 characters of letters, digits and hyphens, but a hyphen must not be the first or last character. It is optional to follow the whole hostname with another dot.
You can match this with a pattern like (assuming case-insensitive):
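As a minimal sketch of the label rules just described, written in Java (this is one way to express them, not necessarily the exact pattern the answer had in mind; the class and method names are made up for illustration):

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class HostnameCheck {
    // One label: 1 to 63 letters, digits or hyphens,
    // with no hyphen in the first or last position.
    private static final String LABEL = "[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?";

    // One or more labels separated by dots, optionally followed by a final dot.
    private static final Pattern HOSTNAME = Pattern.compile(
            LABEL + "(?:\\." + LABEL + ")*\\.?",
            Pattern.CASE_INSENSITIVE);

    static boolean isValidHostname(String s) {
        // matches() requires the whole string to fit the pattern.
        return s != null && HOSTNAME.matcher(s).matches();
    }
}
```

With this sketch, the four "good" strings from the question are accepted and both "bad" strings are rejected. It only checks syntax; it does not check whether a name is registered or resolvable.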
However this matches strings like 1.2.3.4 as well, which although they technically could be host/domain names will actually act as direct IP addresses. You may want to allow that. If you do, you may also want to allow IPv6 addresses, which are colon-separated hex; when embedded in a URL, they also have square brackets around them.
And then of course there’s IDNA. Nowadays, 例え.テスト is a valid IDNA domain name, corresponding to xn--r8jz45g.xn--zckzah. If you want to allow those you’ll need some Unicode support.
Summary: it’s quite a bit more difficult than you might think. And that’s just hostnames. ‘Validating’ a whole URL is even more work. A simple regex isn’t going to hack it. Use a pre-existing library.
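For the IDNA part, the JDK itself ships java.net.IDN, which converts a Unicode hostname to its ASCII (xn--) form. A rough sketch of using it as a first step, assuming you want strict letters/digits/hyphen checking on ASCII labels (the class and method names are hypothetical):

```java
import java.net.IDN;

// Hypothetical helper, for illustration only.
class IdnCheck {
    // Returns the ASCII form of the hostname, or null if the JDK's
    // IDNA conversion rejects it.
    static String toAsciiHostname(String host) {
        try {
            // USE_STD3_ASCII_RULES makes the conversion reject ASCII
            // characters other than letters, digits and hyphens.
            return IDN.toASCII(host, IDN.USE_STD3_ASCII_RULES);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }
}
```

Passing 例え.テスト should give back xn--r8jz45g.xn--zckzah, and a string such as goog*d\x is rejected. The result is plain ASCII, so it can then go through whatever hostname check you already apply. Note that java.net.IDN follows the older RFC 3490 rules, so treat it as a starting point rather than a complete validator.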