I want to crawl a web page with Python; the problem is with relative paths

Asked: June 3, 2026 (2026-06-03T23:55:06+00:00)

I want to crawl a web page with Python, but I am having trouble with relative paths. I have the following functions, which normalize and derelativize the URLs found in a page, but I cannot implement one part of the derelativizing function. Any ideas?

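def normalizeURL(url):
    # Prepend a scheme if the URL has none.
    if not url.startswith('http'):
        url = "http://" + url
    # Force a "www." prefix on the host (only handles plain http:// URLs).
    if not url.startswith('http://www.'):
        url = url[:7] + "www." + url[7:]
    return url

def deRelativizePath(url, path):
    url = normalizeURL(url)

    # Already an absolute URL: return it unchanged.
    if path.startswith('http'):
        return path
    if not path.startswith('/'):
        # Relative path: append it to the page URL.
        if url.endswith('/'):
            return url + path
        else:
            return url + "/" + path
    else:
        # this part is missing: path starts with '/', so it has to be joined to the site root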

The problem is that I do not know how to get the main (root) URL, because the page URL can come in many formats:

http://www.example.com
http://www.example.com/
http://www.sub.example.com
http://www.sub.example.com/
http://www.example.com/folder1/file1 #from this I should extract http://www.example.com/ then add path
...

1 Answer

Editorial Team · Answer 1 · 2026-06-03T23:55:08+00:00

I recommend that you consider using urlparse.urljoin() for this (in Python 3 the same function lives at urllib.parse.urljoin()). From the documentation:

Construct a full (“absolute”) URL by combining a “base URL” (base) with another URL (url). Informally, this uses components of the base URL, in particular the addressing scheme, the network location and (part of) the path, to provide missing components in the relative URL.
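As a rough illustration, here is a minimal sketch in Python 3 of how the whole derelativizing function could collapse into a single urljoin() call. The names derelativize and site_root are hypothetical helpers made up for this example, not part of any library. site_root is included only to show how the scheme and host could be pulled out of a URL with urlsplit() if the root is needed on its own.

```python
from urllib.parse import urljoin, urlsplit


def derelativize(base_url, path):
    """Resolve path against base_url (hypothetical helper name).

    urljoin() covers all three cases from the question:
    absolute URLs, root-relative paths ('/x') and plain relative paths ('x').
    """
    return urljoin(base_url, path)


def site_root(url):
    """Return scheme and host with a trailing slash (hypothetical helper)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


if __name__ == "__main__":
    page = "http://www.example.com/folder1/file1"
    print(derelativize(page, "/img/logo.png"))
    print(derelativize(page, "file2"))
    print(derelativize(page, "http://other.example.org/x"))
    print(site_root(page))
```

One difference from the code in the question is worth noting: for a plain relative path, urljoin() resolves against the directory of the base URL, the way a browser does, so "file2" on the page http://www.example.com/folder1/file1 becomes http://www.example.com/folder1/file2 rather than being appended after file1.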
