Given a URL, I want to extract the domain name (it should not include the ‘www’ part).


Asked: May 31, 2026


Given a URL, I want to extract the domain name (it should not include the ‘www’ part). The URL can contain http/https. Here is the Java code that I wrote. Though it seems to work fine, is there any better approach, or are there some edge cases that could fail?

public static String getDomainName(String url) throws MalformedURLException{
    if(!url.startsWith("http") && !url.startsWith("https")){
         url = "http://" + url;
    }        
    URL netUrl = new URL(url);
    String host = netUrl.getHost();
    if(host.startsWith("www")){
        host = host.substring("www".length()+1);
    }
    return host;
}

Input: http://google.com/blah

Output: google.com


1 Answer

Editorial Team · Answer 1 · 2026-05-31T08:53:37+00:00

If you want to parse a URL, use java.net.URI. java.net.URL has a number of problems: its equals method performs a DNS lookup, which means code that uses it can be vulnerable to denial-of-service attacks when handling untrusted input.

“Mr. Gosling — why did you make url equals suck?” explains one such problem. Just get in the habit of using java.net.URI instead.

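// Requires: import java.net.URI; import java.net.URISyntaxException;
public static String getDomainName(String url) throws URISyntaxException {
    URI uri = new URI(url);
    String domain = uri.getHost();
    // getHost() returns null when the URI has no authority, e.g. "google.com/blah".
    if (domain == null) {
        return null;
    }
    return domain.startsWith("www.") ? domain.substring(4) : domain;
}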

should do what you want.

Though it seems to work fine, is there any better approach, or are there some edge cases that could fail?

Your code as written fails for these valid URLs:

httpfoo/bar: a relative URL with a path component that starts with http.
HTTP://example.com/: the protocol is case-insensitive.
//example.com/: a protocol-relative URL with a host.
www/foo: a relative URL with a path component that starts with www.
wwwexample.com: a domain name that starts with www but not with www. (note the trailing dot).

Hierarchical URLs have a complex grammar. If you try to roll your own parser without carefully reading RFC 3986, you will probably get it wrong. Just use the one that’s built into the core libraries.
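
To see how the built-in parser treats the inputs listed above, here is a minimal sketch. The class and method names (UriEdgeCases, hostOf, stripWww) are made up for illustration, and it uses URI.create, which wraps new URI(...) and throws an unchecked IllegalArgumentException instead of URISyntaxException. It assumes a missing host should simply come back as null.

```java
import java.net.URI;

final class UriEdgeCases {
    // Host as parsed by java.net.URI, or null when the reference has no authority part.
    static String hostOf(String reference) {
        return URI.create(reference).getHost();
    }

    // Drops a leading "www." label only; a host such as "wwwexample.com" is left alone.
    static String stripWww(String host) {
        if (host == null) {
            return null;
        }
        return host.startsWith("www.") ? host.substring(4) : host;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "httpfoo/bar",             // relative reference, no host
            "HTTP://example.com/",     // upper-case scheme
            "//example.com/",          // protocol-relative reference
            "www/foo",                 // relative reference, no host
            "http://wwwexample.com/",  // host starts with "www" but not "www."
            "http://www.example.com/"  // the ordinary case
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + stripWww(hostOf(input)));
        }
    }
}
```

The relative references have no host at all, so the caller has to decide what a null result means rather than guessing a domain from the path.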

If you really need to deal with messy inputs that java.net.URI rejects, see RFC 3986 Appendix B:

Appendix B. Parsing a URI Reference with a Regular Expression

As the “first-match-wins” algorithm is identical to the “greedy”
disambiguation method used by POSIX regular expressions, it is
natural and commonplace to use a regular expression for parsing the
potential five components of a URI reference.

The following line is the regular expression for breaking-down a
well-formed URI reference into its components.
  ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
   12            3  4          5       6  7        8 9
The numbers in the second line above are only to assist readability;
they indicate the reference points for each subexpression (i.e., each
paired parenthesis).
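
As a rough illustration of the appendix quoted above, the expression can be dropped into java.util.regex more or less verbatim. This is only a sketch: the class name Rfc3986Regex and the helper component are invented here, and the only change to the pattern is doubling the backslash for a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Rfc3986Regex {
    // The expression from RFC 3986 Appendix B, with the backslash doubled for Java.
    private static final Pattern URI_REFERENCE = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Returns one capture group: 2 = scheme, 4 = authority, 5 = path,
    // 7 = query, 9 = fragment. The result is null when that component is absent
    // or when the input does not match as a whole (e.g. it contains a line break).
    static String component(String reference, int group) {
        Matcher m = URI_REFERENCE.matcher(reference);
        return m.matches() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String ref = "http://www.ics.uci.edu/pub/ietf/uri/#Related";
        System.out.println("scheme    = " + component(ref, 2));
        System.out.println("authority = " + component(ref, 4));
        System.out.println("path      = " + component(ref, 5));
        System.out.println("query     = " + component(ref, 7));
        System.out.println("fragment  = " + component(ref, 9));
    }
}
```

Note that group 4 is the authority, not the host: it can still carry user information and a port (user@host:port), so it needs further splitting before any www. prefix is stripped.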
