Need a way to extract a domain name without the subdomain from a url

Question

Editorial Team

Asked: June 17, 2026

I need a way to extract a domain name without the subdomain from a URL using Python's urlparse.

For example, I would like to extract "google.com" from a full url like "http://www.google.com".

The closest I can seem to come with urlparse is the netloc attribute, but that includes the subdomain, which in this example would be www.google.com.
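A quick check of the behavior described above, using only the standard library. This assumes Python 3, where the old `urlparse` module lives at `urllib.parse`:

```python
from urllib.parse import urlparse

parts = urlparse("http://www.google.com")

# Both attributes still carry the "www" subdomain.
print(parts.netloc)    # www.google.com
print(parts.hostname)  # www.google.com
```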

I know that it is possible to write some custom string manipulation to turn http://www.google.com into google.com, but I want to avoid by-hand string transforms or regex in this task. (The reason for this is that I am not familiar enough with url formation rules to feel confident that I could consider every edge case required in writing a custom parsing function.)

Or, if urlparse can’t do what I need, does anyone know any other Python url-parsing libraries that would?

1 Answer

Editorial Team · Answer 1 · 2026-06-17T17:55:32+00:00

You probably want to check out tldextract, a library designed to do this kind of thing.

It uses the Public Suffix List to split a hostname on known public suffixes (gTLDs, ccTLDs, and multi-label suffixes such as co.uk). Do note that this is just a lookup against a list, nothing special, so an old copy can get out of date (although the list is community-maintained, so it should stay reasonably current).

>>> import tldextract
>>> tldextract.extract('http://forums.news.cnn.com/')
ExtractResult(subdomain='forums.news', domain='cnn', suffix='com')

So in your case:

>>> extracted = tldextract.extract('http://www.google.com')
>>> "{}.{}".format(extracted.domain, extracted.suffix)
'google.com'
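To show why a suffix list is needed at all, here is a minimal sketch of the lookup idea using only the standard library. It is not how tldextract is implemented: `SAMPLE_SUFFIXES` is a hypothetical, hand-picked subset of the Public Suffix List, `registered_domain` is an illustrative helper name, and the real list also has wildcard and exception rules that this ignores.

```python
from urllib.parse import urlparse

# Hypothetical subset of the Public Suffix List, for illustration only.
SAMPLE_SUFFIXES = {"com", "org", "uk", "co.uk"}

def registered_domain(url, suffixes=SAMPLE_SUFFIXES):
    """Return 'domain.suffix' for the longest matching suffix, or None."""
    host = urlparse(url).hostname
    if not host:
        return None
    labels = host.lower().split(".")
    # Try the longest candidate suffix first, so 'co.uk' wins over 'uk'.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in suffixes:
            # Keep exactly one label in front of the matched suffix.
            return ".".join(labels[i - 1:])
    return None

print(registered_domain("http://www.google.com"))        # google.com
print(registered_domain("http://forums.news.cnn.com/"))  # cnn.com
print(registered_domain("http://www.bbc.co.uk"))         # bbc.co.uk
```

Simply taking the last two labels would return co.uk for the third URL, which is why a maintained suffix list (and a library that tracks it) is the safer choice for real use.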
