I have a link such as http://www.techcrunch.com/ and I would like to get just

Question

Asked: May 12, 2026 (2026-05-12T16:28:51+00:00)

I have a link such as http://www.techcrunch.com/ and I would like to get just the techcrunch.com part of the link. How do I go about this in Python?

1 Answer

Editorial Team · Answer 1 · 2026-05-12T16:28:51+00:00

Getting the hostname is easy enough using urlparse (urllib.parse in Python 3):

from urllib.parse import urlparse

hostname = urlparse("http://www.techcrunch.com/").hostname
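One edge case is worth knowing about: if the link has no scheme, urlparse treats the whole string as a path and .hostname comes back as None. Here is a minimal sketch of a wrapper that copes with that, assuming Python 3; the helper name extract_hostname is made up for illustration.

```python
from urllib.parse import urlparse


def extract_hostname(url):
    """Return the lowercased hostname of url, or None if none is found."""
    parsed = urlparse(url)
    if parsed.hostname is None and "://" not in url:
        # Without a scheme, urlparse treats the whole string as a path,
        # so retry with a leading "//" to mark it as a network location.
        parsed = urlparse("//" + url)
    return parsed.hostname


print(extract_hostname("http://www.techcrunch.com/"))  # www.techcrunch.com
print(extract_hostname("www.techcrunch.com/"))         # www.techcrunch.com
```

Note that .hostname also lowercases the name and drops any port number.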

Getting the “root domain”, however, is going to be more problematic, because it isn’t defined in a syntactic sense. What’s the root domain of “www.theregister.co.uk”? How about networks using default domains? “devbox12” could be a valid hostname.
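To see why a purely syntactic rule falls short, consider the obvious shortcut of keeping the last two labels of the hostname. This is only a sketch of that naive approach (the function name naive_root_domain is hypothetical), not a recommendation:

```python
def naive_root_domain(hostname):
    """Keep only the last two dot-separated labels of hostname."""
    labels = hostname.split(".")
    return ".".join(labels[-2:])


print(naive_root_domain("www.techcrunch.com"))     # techcrunch.com
print(naive_root_domain("www.theregister.co.uk"))  # co.uk (wrong)
print(naive_root_domain("devbox12"))               # devbox12
```

It works for techcrunch.com but returns "co.uk" for The Register, which is a suffix shared by many unrelated sites rather than a domain anyone owns.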

One way to handle this would be to use the Public Suffix List (PSL), which attempts to catalogue both real top-level domains (e.g. “.com”, “.net”, “.org”) and other suffixes which are used like TLDs (e.g. “.co.uk”, or even privately operated ones such as “.github.io”). You can access the PSL from Python using the publicsuffix2 library:

from urllib.parse import urlparse

from publicsuffix2 import PublicSuffixList, fetch

def get_base_domain(url):
    # fetch() downloads the current list over HTTP; if your script is
    # running more than, say, once a day, you'd want to cache it yourself.
    # Make sure you update frequently, though!
    psl = PublicSuffixList(fetch())

    hostname = urlparse(url).hostname

    # Despite its name, get_public_suffix() returns the registrable
    # domain, e.g. "techcrunch.com" for "www.techcrunch.com".
    return psl.get_public_suffix(hostname)
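To illustrate roughly what such a lookup does, here is a self-contained sketch that uses a tiny hand-picked set of suffixes instead of the real list. The names SAMPLE_SUFFIXES and registrable_domain are made up for this example, and the real PSL also has wildcard and exception rules that are ignored here, so treat it as an illustration of the idea rather than a replacement for the library.

```python
# A tiny, hand-picked subset of suffixes for illustration only; the real
# Public Suffix List has thousands of entries plus wildcard/exception rules.
SAMPLE_SUFFIXES = {"com", "net", "org", "uk", "co.uk", "io", "github.io"}


def registrable_domain(hostname, suffixes=SAMPLE_SUFFIXES):
    """Return the longest matching suffix plus one more label, or None."""
    labels = hostname.lower().rstrip(".").split(".")
    # Try the longest candidate suffix first so "co.uk" wins over "uk".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                return None  # the hostname is itself a public suffix
            return ".".join(labels[i - 1:])
    return None  # no known suffix, e.g. a bare name like "devbox12"


print(registrable_domain("www.techcrunch.com"))     # techcrunch.com
print(registrable_domain("www.theregister.co.uk"))  # theregister.co.uk
print(registrable_domain("devbox12"))               # None
```

Returning None for unknown names is a choice made for this sketch; a real implementation may instead fall back to treating the last label as the suffix.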
