I need a regular expression in Java that I can use to retrieve the domain.tld part from any URL. So https://foo.com/bar, http://www.foo.com#bar, and http://bar.foo.com should all return foo.com.
I wrote this regex, but it’s matching the whole URL:
Pattern.compile("[.]?.*[.x][a-z]{2,3}");
I’m not sure I’m matching the “.” character correctly. I tried “\.” but I get an error from NetBeans.
Update:
The TLD is not limited to 2 or 3 characters, and http://www.foo.co.uk/bar should return foo.co.uk.
I would use the java.net.URI class to extract the host name, and then use a regex to extract the last two parts of the host name.
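A minimal sketch of that approach might look like the following. The class name `DomainExtractor`, the method `domainOf`, and the exact pattern are illustrative assumptions, not something given in the question. Note that inside a Java string literal a literal dot has to be written as `"\\."` (or `"[.]"`); a single backslash as in `"\."` is an invalid escape sequence and will not compile.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DomainExtractor {
    // Last two dot-separated labels at the end of a host name.
    // "\\." in Java source is the regex \. which matches a literal dot.
    private static final Pattern LAST_TWO_LABELS = Pattern.compile("([^.]+\\.[^.]+)$");

    /** Returns the last two labels of the URL's host, or null if none can be found. */
    static String domainOf(String url) {
        // URI.create throws IllegalArgumentException for a malformed URL.
        String host = URI.create(url).getHost();
        if (host == null) {
            return null; // e.g. a relative URL with no host component
        }
        Matcher m = LAST_TWO_LABELS.matcher(host);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] urls = {"https://foo.com/bar", "http://www.foo.com#bar", "http://bar.foo.com"};
        for (String url : urls) {
            System.out.println(domainOf(url));
        }
    }
}
```

For the three URLs from the question this yields foo.com each time. It takes "the last two parts" literally, so http://www.foo.co.uk/bar gives co.uk rather than foo.co.uk; handling multi-part suffixes like that would need a list of known public suffixes, which a regex alone cannot provide.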
Prints: