I need to uniquely identify and store some URLs. The problem is that sometimes they come containing “..” like http://somedomain.com/foo/bar/../../some/url which basically is http://somedomain.com/some/url if I’m not wrong.
Is there a Python function or a trick to resolve these URLs?
There’s a simple solution using urllib.parse.urljoin, for example by joining the URL with '.'. However, if there is no trailing slash (the last component is a file, not a directory), the last component will be removed.
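To illustrate both cases, here is a small sketch that joins the URL from the question with '.'. It assumes Python 3.5 or later, where urljoin follows the RFC 3986 rules for removing dot segments; the variable names are only for illustration.

```python
from urllib.parse import urljoin

# Trailing slash: the last component is treated as a directory and kept.
with_slash = urljoin('http://somedomain.com/foo/bar/../../some/url/', '.')

# No trailing slash: the last component is treated as a file and dropped.
without_slash = urljoin('http://somedomain.com/foo/bar/../../some/url', '.')

print(with_slash)     # http://somedomain.com/some/url/
print(without_slash)  # http://somedomain.com/some/
```

So this only gives the expected result when the URL is known to end in a slash.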
This fix uses the urlparse function to extract the path, then uses (the posixpath version of) os.path to normalize the components. It compensates for the trailing slash, which normpath strips, then joins the URL back together. The following is doctestable:
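A minimal sketch of that approach, assuming Python 3. The function name resolve_components and the guard for an empty path are illustrative choices, not part of any library:

```python
import posixpath
from urllib.parse import urlparse


def resolve_components(url):
    """Collapse '.' and '..' segments in the path of a URL.

    >>> resolve_components('http://somedomain.com/foo/bar/../../some/url')
    'http://somedomain.com/some/url'
    >>> resolve_components('http://somedomain.com/foo/bar/../../some/url/')
    'http://somedomain.com/some/url/'
    """
    parsed = urlparse(url)
    if not parsed.path:
        # posixpath.normpath('') returns '.', so leave an empty path alone.
        return url
    new_path = posixpath.normpath(parsed.path)
    # normpath drops a trailing slash; restore it if the input had one.
    if parsed.path.endswith('/') and not new_path.endswith('/'):
        new_path += '/'
    # Swap in the normalized path; query and fragment are left untouched.
    return parsed._replace(path=new_path).geturl()
```

Saved to a file, the docstring examples can be checked with python -m doctest. Unlike the urljoin approach, this keeps the last component whether or not the URL ends in a slash.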