doc.xpath('//img') gets some results, but doc.xpath('//img[@class="productImage"]') and doc.xpath('//div[@id="someID"]') get nothing at all


Asked: May 24, 2026


doc.xpath('//img') #this will get some results
doc.xpath('//img[@class="productImage"]') #and this gets nothing at all
doc.xpath('//div[@id="someID"]') # and this won't work either

I don't know what went wrong here. I double-checked the HTML source, and there are plenty of img tags that carry the attribute class="productImage".

It’s like the attribute selector just won’t work.

Here is the URL that the HTML source comes from:

http://www.amazon.cn/s/ref=nb_sb_ss_i_0_1?__mk_zh_CN=%E4%BA%9A%E9%A9%AC%E9%80%8A%E7%BD%91%E7%AB%99&url=search-alias%3Daps&field-keywords=%E4%B8%93%E5%85%AB&x=0&y=0&sprefix=%E4%B8%93

If you have some spare time, please do me a favor: parse the HTML content the way I do and see whether you can solve this one.


1 Answer

Editorial Team · Answer 1 · 2026-05-24T09:40:31+00:00

The odd thing is that fetching that page with open-uri returns a different result than fetching it with something like curl or wget.

Once you change the User-Agent, however, you get what is probably the page you are looking for:

Analysis:

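require 'nokogiri'
require 'open-uri'
require 'pp'

URL = 'http://www.amazon.cn/...'

# Print which classes appear on <img> and <div> elements in the fetched page.
def analyze_html(file)
  doc = Nokogiri::HTML(file)
  # Non-empty class values found on <img> elements
  pp   doc.xpath('//img').map { |i| i[:class] }.compact.reject(&:empty?)
  # Number of <div> elements whose class mentions productImage
  puts doc.xpath('//div').map { |i| i[:class] }.grep(/productImage/).count
  # Number of images nested inside such a <div>, then their src values
  puts doc.xpath('//div[@class="productImage"]//img').count
  pp   doc.xpath('//div[@class="productImage"]//img').map { |i| i[:src] }
end

puts "Attempt 1:"
# URI.open replaces the bare open(url) form, which Ruby 3.0 removed
analyze_html(URI.open(URL))

puts "Attempt 2:"
analyze_html(URI.open(URL, "User-Agent" => "Wget/1.10.2"))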

Output:

Attempt 1:
["default navSprite"]
0
0
[]
Attempt 2:
["default navSprite", "srSprite spr_kindOfSortBtn"]
16
16
["http://ec4.images-amazon.com/images/I/51fOb3ujSjL._AA115_.jpg",
 "http://ec4.images-amazon.com/images/I/513UQ1xiaSL._AA115_.jpg",
 "http://ec4.images-amazon.com/images/I/41zKxWXb8HL._AA115_.jpg",
 "http://ec4.images-amazon.com/images/I/51bj6XXAouL._AA115_.jpg",
 "http://ec4.images-amazon.com/images/I/516GBhDTGCL._AA115_.jpg",
 "http://ec4.images-amazon.com/images/I/51ADd3HSE6L._AA115_.jpg",
 "http://ec4.images-amazon.com/images/I/51CbB-7kotL._AA115_.jpg",
 "http://ec4.images-amazon.com/images/I/51%2Bw40Mk51L._AA115_.jpg",
 "http://ec4.images-amazon.com/images/I/519Gny1LckL._AA115_.jpg",
 "http://ec4.images-amazon.com/images/I/51Dv6DUF-WL._AA115_.jpg",
 "http://ec4.images-amazon.com/images/I/51uuy8yHeoL._AA115_.jpg",
 "http://ec4.images-amazon.com/images/I/51T0KEjznqL._AA115_.jpg",
 "http://ec4.images-amazon.com/images/I/419WTi%2BdjzL._AA115_.jpg",
 "http://ec4.images-amazon.com/images/I/51QTg4ZmMmL._AA115_.jpg",
 "http://ec4.images-amazon.com/images/I/51l--Pxw9TL._AA115_.jpg",
 "http://ec4.images-amazon.com/images/I/51gehW2qUZL._AA115_.jpg"]

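Solution:

Send a User-Agent header such as Wget/1.10.2 so that the server returns the page containing the product markup.
In the output above, the productImage class is on the wrapping div, not on the img, so use xpath('//div[@class="productImage"]//img').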
