I can be given a string in any of these formats:
- url: e.g. http://www.acme.com:456
- string: e.g. www.acme.com:456, www.acme.com 456, or www.acme.com
I would like to extract the host and, if present, the port. If the port is not present, I would like it to default to 80.
I have tried urlparse, which works fine for the url but not for the other format. For example, when I use urlparse on hostname:port, it puts the hostname in the scheme rather than in netloc.
I would be happy with a solution that uses urlparse and a regex, or a single regex that could handle both formats.
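For reference, the urlparse side of this could be sketched roughly as follows. This assumes Python 3, where the module lives in `urllib.parse`, and `split_host_port` is a hypothetical helper name, not something from the library. The idea is that a bare `host:port` is only treated as a netloc when it is preceded by `//`, so the sketch adds that prefix when no scheme is present.

```python
from urllib.parse import urlsplit


def split_host_port(text, default_port=80):
    """Hypothetical helper: return (host, port) for a URL or a bare host string."""
    # Normalize "host port" to "host:port" so both string forms look alike.
    candidate = ":".join(text.split())
    # Without a scheme, prefix "//" so urlsplit parses the text as a netloc.
    if "://" not in candidate:
        candidate = "//" + candidate
    parts = urlsplit(candidate)
    # parts.port is None when no port was given, so fall back to the default.
    return parts.hostname, parts.port or default_port
```

Note that `hostname` is lowercased by `urlsplit`, and that this sketch does no validation beyond what `urlsplit` itself performs.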
I’m not that familiar with urlparse, but using regex you’d do something like:
Or, without port:
EDIT: fixed regex to also match ‘www.abc.com 123’
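A minimal sketch of a single pattern covering all of the formats in the question, assuming Python's `re` module. The names `HOST_PORT_RE` and `parse_host_port` are made up for illustration, and the pattern is deliberately simple: it accepts an optional scheme, a host, and an optional port separated by either a colon or whitespace, and it rejects anything with a path.

```python
import re

# Optional scheme, then a host, then an optional port after ':' or whitespace.
HOST_PORT_RE = re.compile(
    r"""^\s*
        (?:[a-zA-Z][a-zA-Z0-9+.-]*://)?   # optional scheme, e.g. http://
        (?P<host>[^\s:/]+)                # host: everything up to a colon, slash, or space
        (?:(?::|\s+)(?P<port>\d+))?       # optional port after a colon or whitespace
        \s*$""",
    re.VERBOSE,
)


def parse_host_port(text, default_port=80):
    """Hypothetical helper: return (host, port), using default_port if none is given."""
    match = HOST_PORT_RE.match(text)
    if match is None:
        raise ValueError("unrecognized host string: %r" % (text,))
    port = match.group("port")
    return match.group("host"), int(port) if port else default_port
```

For example, `parse_host_port("www.abc.com 123")` returns `("www.abc.com", 123)`, and `parse_host_port("www.abc.com")` falls back to port 80.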