I'm using Mechanize to scrape image URLs, and I'm looking at http://mechanize.rubyforge.org/Mechanize/Page/Image.html to find out the width and height of the images.
In the console I run:
require 'mechanize'
url = "http://www.bbc.co.uk/"
page = Mechanize.new.get(url)
widths = page.images.map { |img| img.width }.compact
The result is:
["1", "84", "432", "432", "432", "432", "432", "432", "432", "304", "144", "144", "144", "144", "144", "144", "432", "432", "432", "432", "432", "432", "432", "336", "62", "62", "62", "62", "84", "1", "0"]
This works fine: I get each image's width.
However, for other web pages img.width returns nil, so compact removes every entry. For example, try this page:
url = "http://www.glamourum.com" # also try https://www.birchbox.com/
page = Mechanize.new.get(url)
widths = page.images.map { |img| img.width }.compact
The result is:
=> []
an empty array! And for https://www.birchbox.com/ I get:
=> ["1", "1", "1", "1", "1"]
Why does this happen with some websites but not with others?
What is the solution to this problem?
Mechanize doesn't fetch the images. It can only return the size declared in the width and height attributes of the img tag in the HTML, and a lot of sites don't include those attributes.
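So when the attributes are missing, the only reliable way to get the real size is to download the image and read the dimensions from the file itself. Below is a minimal, stdlib-only sketch of that idea. The helper name image_dimensions is hypothetical, and it only understands PNG, GIF and JPEG headers, returning [width, height] or nil for anything else (including truncated data).

```ruby
# Hypothetical helper: read [width, height] from the raw bytes of an image.
# Handles PNG, GIF and JPEG only; returns nil for other or truncated data.
def image_dimensions(bytes)
  data = bytes.b # treat the input as raw binary

  # PNG: 8-byte signature, then the IHDR chunk; width and height are
  # big-endian 32-bit integers at byte offsets 16 and 20.
  if data.start_with?("\x89PNG\r\n\x1A\n".b)
    return data.bytesize >= 24 ? data[16, 8].unpack("NN") : nil
  end

  # GIF: "GIF87a"/"GIF89a", then little-endian 16-bit width and height.
  if data.start_with?("GIF87a", "GIF89a")
    return data.bytesize >= 10 ? data[6, 4].unpack("vv") : nil
  end

  # JPEG: starts with SOI (FF D8). Walk the marker segments until a
  # Start-Of-Frame marker, which stores height before width (big-endian).
  if data.start_with?("\xFF\xD8".b)
    pos = 2
    while pos + 4 <= data.bytesize
      return nil unless data.getbyte(pos) == 0xFF
      marker = data.getbyte(pos + 1)
      if marker == 0xFF # fill byte before a marker: skip it
        pos += 1
        next
      end
      length = data[pos + 2, 2].unpack1("n") # includes the 2 length bytes
      # C0..CF are SOF markers, except C4 (DHT), C8 (reserved), CC (DAC)
      if (0xC0..0xCF).cover?(marker) && ![0xC4, 0xC8, 0xCC].include?(marker)
        return nil if pos + 9 > data.bytesize
        height, width = data[pos + 5, 4].unpack("nn")
        return [width, height]
      end
      pos += 2 + length
    end
  end

  nil
end
```

With Mechanize you could, for example, call this only for images whose width or height attribute is nil, passing it the downloaded bytes (assuming Mechanize::Page::Image#fetch, which requests the image through the same agent, and a body method on the object it returns). Note that this costs one extra HTTP request per image, which is exactly the work Mechanize skips by default.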