I’m trying to get the domain of a given URL. For example, http://www.facebook.com/someuser/ should return facebook.com. The given URL can be in any of these formats:
- https://www.facebook.com/someuser (www. is optional, but should be ignored)
- www.facebook.com/someuser (http:// is not required)
- facebook.com/someuser
- http://someuser.tumblr.com -> this has to return tumblr.com only
I wrote this regex:
/(?:\.|\/{2})(?:www\.)?([^\/]*)/i
But it does not work as I expect.
I can do this in parts:
- Remove http:// and https://, if present in the string, with string.delete "/https?:\/\//i".
- Remove www. with string.delete "/www\./i".
- Get the domain with match and
/(\w+\.\w+)+/i
But this won’t work with subdomains.
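For reference, the three steps above can be sketched as follows. This is only an illustration under a few assumptions: it uses String#sub with a Regexp (String#delete takes a set of characters, not a pattern), it anchors both removals at the start of the string, and the helper name naive_domain is made up for this example. It also shows the subdomain problem.

```ruby
# Hypothetical helper illustrating the three steps; the name is not from the question.
def naive_domain(url)
  stripped = url.sub(%r{\Ahttps?://}i, "")  # step 1: drop the scheme, if any
  stripped = stripped.sub(/\Awww\./i, "")   # step 2: drop a leading "www."
  match = stripped.match(/(\w+\.\w+)+/i)    # step 3: the pattern from the question
  match && match[0]
end

naive_domain("https://www.facebook.com/username")  # "facebook.com"
naive_domain("http://sub.tumblr.com/")             # "sub.tumblr", not "tumblr.com"
```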
Strings for testing:
https://www.facebook.com/username
http://last.fm/user/username
www.google.com
facebook.com/username
http://sub.tumblr.com/
sub.tumblr.com
I need this to work with as little memory and processing cost as possible.
Any ideas?
Why don’t you just use the URI class to do this?
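Something along these lines, as a minimal sketch assuming Ruby's standard uri library (the URL is one of the test strings from the question, and note that the host still includes the www. prefix):

```ruby
require "uri"

uri = URI.parse("https://www.facebook.com/username")
host = uri.host  # "www.facebook.com"
```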
And you’re done.
One caveat: if there’s no “http://” or “https://” at the beginning of the URL, you’ll have to add one; otherwise the parse method won’t give you a host (it will be nil).
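A rough sketch of how that could look, handling the missing scheme and then trimming the host. The helper name domain_of is made up for this example, and keeping only the last two labels of the host is an assumption that fits the test strings in the question; it would give the wrong answer for suffixes such as co.uk, where something like a public suffix list would be needed.

```ruby
require "uri"

# Hypothetical helper; the name and the "last two labels" rule are assumptions.
def domain_of(url)
  # URI.parse only fills in #host when a scheme is present, so add one if missing.
  url = "http://#{url}" unless url.match?(%r{\Ahttps?://}i)
  host = URI.parse(url).host
  return nil if host.nil?
  # Keep only the last two dot-separated labels, e.g. "sub.tumblr.com" -> "tumblr.com".
  host.split(".").last(2).join(".")
end

URI.parse("www.google.com").host     # nil, because there is no scheme
domain_of("www.google.com")          # "google.com"
domain_of("http://sub.tumblr.com/")  # "tumblr.com"
```

This drops www. and any other subdomain in one step, so no separate www. handling is needed.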