I need to parse a URL. I’m currently using urlparse.urlparse() and urlparse.urlsplit(). The problem


Asked: May 23, 2026 (2026-05-23T05:34:38+00:00)


I need to parse a URL. I’m currently using urlparse.urlparse() and urlparse.urlsplit().

The problem is that I can’t get the “netloc” (host) from the URL when the scheme is not present.
For example, if I have the following URL:

www.amazon.com/Programming-Python-Mark-Lutz/dp/0596158106/ref=sr_1_1?ie=UTF8&qid=1308060974&sr=8-1

I can’t get the netloc: www.amazon.com

According to the Python docs:

Following the syntax specifications in
RFC 1808, urlparse recognizes a netloc
only if it is properly introduced by
‘//’. Otherwise the input is presumed
to be a relative URL and thus to start
with a path component.

So it’s this way on purpose. But I still don’t know how to get the netloc from that URL.

I think I could check whether the scheme is present and, if it’s not, add it and then parse the URL. But this solution doesn’t seem very good.

Do you have a better idea?

EDIT:
Thanks for all the answers. But I cannot do the “startswith” check proposed by Corey and others, because a URL with another protocol/scheme would get mangled. For example, given this URL:

ftp://something.com

the proposed code would add “http://” to the start and break it.

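The solution I found:

import urlparse  # Python 2 module; it is urllib.parse in Python 3

def parse_url(url):
    # Assume http:// when the URL carries no scheme
    if not urlparse.urlparse(url).scheme:
        url = "http://" + url
    return urlparse.urlparse(url)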

Something to note:

I do some validation first, and if no scheme is given I consider it to be http://


1 Answer

Editorial Team · Answer 1 · 2026-05-23T05:34:39+00:00

The documentation has this exact example, just below the text you pasted. Adding ‘//’ if it’s not there will get you what you want. If you don’t know whether the URL will have the protocol and ‘//’, you can use a regex (or even just check whether it already contains ‘//’) to determine whether you need to add it.
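As a rough sketch of the “add ‘//’ if it’s not there” idea, something like the following should work. It is written for Python 3, where the urlparse module lives in urllib.parse, and the helper name parse_with_netloc is made up for illustration. Note that the simple containment check assumes ‘//’ never appears later in the URL (for example inside a query string).

```python
from urllib.parse import urlparse


def parse_with_netloc(url):
    """Parse url, prepending '//' when absent so the host lands in netloc.

    Hypothetical helper; assumes '//' only ever appears right after a scheme.
    """
    if "//" not in url:
        url = "//" + url
    return urlparse(url)


parts = parse_with_netloc("www.amazon.com/Programming-Python-Mark-Lutz/dp/0596158106")
print(parts.netloc)  # www.amazon.com
print(parts.path)    # /Programming-Python-Mark-Lutz/dp/0596158106

# A URL that already has a scheme is left untouched.
print(parse_with_netloc("ftp://something.com").scheme)  # ftp
```

Because nothing is prepended except ‘//’, the scheme stays empty for scheme-less input instead of being guessed.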

Your other option would be to use split('/') and take the first element of the list it returns, which will ONLY work when the URL has no protocol or ‘//’.
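To illustrate the split('/') option and its limitation, here is a minimal sketch (the function name host_from_schemeless is an assumption, not something from the docs):

```python
def host_from_schemeless(url):
    """Return everything before the first '/'.

    Only meaningful when url has no scheme and no leading '//'.
    """
    return url.split("/")[0]


print(host_from_schemeless("www.amazon.com/Programming-Python-Mark-Lutz/dp/0596158106"))
# www.amazon.com

# With a scheme present, the first element is the scheme plus colon, not the host.
print(host_from_schemeless("http://www.amazon.com/Programming-Python-Mark-Lutz"))
# http:
```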

EDIT (adding for future readers): a regex for detecting the protocol would be something like re.match('(?:http|ftp|https)://', url)
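Putting that regex to use, a sketch along these lines would leave ftp:// URLs alone and only add a default scheme when none of the listed ones is present. It targets Python 3 (urllib.parse); the name parse_default_http and the choice of http:// as the default are assumptions taken from the asker’s own note, and the scheme list is just the one from the regex above.

```python
import re
from urllib.parse import urlparse

# Same pattern as in the answer; extend the alternation for other schemes.
SCHEME_RE = re.compile(r"(?:http|ftp|https)://")


def parse_default_http(url):
    """Parse url, assuming http:// when no known scheme prefix is found."""
    if not SCHEME_RE.match(url):
        url = "http://" + url
    return urlparse(url)


print(parse_default_http("www.amazon.com/Programming-Python-Mark-Lutz").netloc)
# www.amazon.com
print(parse_default_http("ftp://something.com").scheme)
# ftp
```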
