I am trying to get the ASIN from an Amazon HTML page using Nokogiri, but I am having no luck with XPath. I have tried it with FirePath and I still get nothing. Would it be better to just get the URL and then run a Ruby regex to extract the ASIN? If so, what would the regex look like?
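#!/usr/bin/env ruby -w
require 'nokogiri'
require 'open-uri'
url = "http://www.amazon.com/gp/new-releases/books/3839/ref=zg_bsnr_nav"
# URI.open is required on Ruby 3.0+, where open-uri no longer patches Kernel#open
doc = Nokogiri::HTML(URI.open(url))
# '//zg_list' selects elements whose tag name is zg_list
doc.xpath('//zg_list').each do |node|
  p node['asin']
end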
This is what I have that prints out the URLs:
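#!/usr/bin/env ruby -w
require 'nokogiri'
require 'open-uri'
url = "http://www.amazon.com/gp/new-releases/books/3839/ref=zg_bsnr_nav"
# URI.open is required on Ruby 3.0+, where open-uri no longer patches Kernel#open
doc = Nokogiri::HTML(URI.open(url))
# Collect the href of every link inside a div with class "zg_image"
hrefs = doc.css('div.zg_image a').map { |link| link['href'] }
puts hrefs # => /Introducing-ZBrush-4-Eric-Keller/dp/0470527641/ref=zg_bsnr_3839_20/183-0702383-0095048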
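If you go the regex route, here is a minimal sketch. It assumes, based only on the sample href printed above, that product links contain /dp/<ASIN> (and, as an extra assumption, /gp/product/<ASIN>), where the ASIN is 10 uppercase letters or digits. ASIN_PATTERN and asin_from_href are hypothetical names, and Amazon may use URL shapes this pattern does not cover.

```ruby
# Sketch: extract an ASIN from an Amazon product path with a regex.
# Assumption: links look like ".../dp/<ASIN>/..." or ".../gp/product/<ASIN>",
# where the ASIN is 10 uppercase letters or digits.
ASIN_PATTERN = %r{/(?:dp|gp/product)/([A-Z0-9]{10})(?=[/?]|\z)}

# Hypothetical helper: returns capture group 1 (the ASIN), or nil if no match.
def asin_from_href(href)
  href[ASIN_PATTERN, 1]
end

hrefs = [
  "/Introducing-ZBrush-4-Eric-Keller/dp/0470527641/ref=zg_bsnr_3839_20/183-0702383-0095048",
  "/some-page-without-a-product-id"
]

hrefs.each { |href| p asin_from_href(href) }
# prints "0470527641" and then nil
```

Applied to the array of hrefs collected with doc.css('div.zg_image a') above, mapping each element through asin_from_href and then calling compact would leave just the ASINs.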
For me, the css method in Nokogiri is much easier to work with than XPath. Given the HTML at the URL you posted, the following should retrieve the “asin” property for each item:
I think the correct XPath would be something like: