I’m trying to parse a URI from user input. I’m assuming some users won’t put the scheme in their URIs, and I want to default to “http”.
The following code doesn’t work:
require 'uri'
uri_to_check = URI::parse("www.google.com")
uri_to_check.scheme = "http" unless uri_to_check.scheme
puts uri_to_check.to_s
I expect to see “http://www.google.com” but I get “http:www.google.com”. Is it even possible to do it this way?
If so, what am I missing?
Is there a better way to do this?
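The leading slashes (//) indicate that the URL uses an IP-based protocol to reach a specified host. They are needed to flag the hostname so URI can parse it correctly. Without them, URI::parse("www.google.com") treats the whole string as a relative path with no host, so setting the scheme just prepends “http:” and you get “http:www.google.com”.
Wikipedia has some good overviews and examples of use: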
http://en.wikipedia.org/wiki/Url
http://en.wikipedia.org/wiki/URI_scheme
http://en.wikipedia.org/wiki/URL_normalization
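To see the effect of the leading slashes, compare what URI.parse returns with and without them. The second half is a minimal sketch of one way to default the scheme: prepend “http://” before parsing whenever the input doesn’t already start with “scheme://”. The helper name `with_default_scheme` and its regex are hypothetical, written for illustration, and assume the input is meant to be an http-style address.

```ruby
require 'uri'

# Without "//", the whole string is parsed as a relative path: there is no host.
bare = URI.parse("www.google.com")
puts bare.host.inspect   # nil
puts bare.path           # www.google.com

# With "//", the parser sees an authority component, so the host is filled in
# and setting the scheme afterwards produces the expected string.
slashed = URI.parse("//www.google.com")
puts slashed.host        # www.google.com
slashed.scheme = "http"
puts slashed.to_s        # http://www.google.com

# Hypothetical helper: prepend a default scheme when the input lacks "scheme://".
# Assumption: anything without that prefix is a bare host (optionally with a path).
def with_default_scheme(input, default = "http")
  input = "#{default}://#{input}" unless input.match?(%r{\A[a-z][a-z0-9+\-.]*://}i)
  URI.parse(input)
end

puts with_default_scheme("www.google.com").to_s         # http://www.google.com
puts with_default_scheme("https://example.com/a").to_s  # https://example.com/a
```

Checking for “scheme://” rather than just “scheme:” means input like “localhost:3000” gets “http://” prepended instead of having “localhost” misread as the scheme. The flip side is that schemes without slashes, such as “mailto:”, would also get “http://” prepended, so this sketch only suits input that is expected to be a web address.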
The best information is in the spec itself: http://www.ietf.org/rfc/rfc1738.txt, particularly section 3.1, “Common Internet Scheme Syntax”.
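Section 3.1 gives the common form as //<user>:<password>@<host>:<port>/<url-path>. As a rough illustration, here is how Ruby’s URI splits a URL written in that form into its parts; the example URL and credentials are made up.

```ruby
require 'uri'

# Made-up URL following //<user>:<password>@<host>:<port>/<url-path>
uri = URI.parse("http://user:secret@www.example.com:8080/some/path")

puts uri.scheme    # http
puts uri.userinfo  # user:secret
puts uri.user      # user
puts uri.password  # secret
puts uri.host      # www.example.com
puts uri.port      # 8080
puts uri.path      # /some/path
```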
You might want to consider using the Addressable gem. It’s smarter and is what I use when I need to do a lot of URI parsing or manipulation.
http://addressable.rubyforge.org/ and
http://addressable.rubyforge.org/api/Addressable/URI.html