doc.xpath('//img')                        # this returns some results
doc.xpath('//img[@class="productImage"]') # this returns nothing at all
doc.xpath('//div[@id="someID"]')          # and this doesn't work either
I don't know what went wrong here. I double-checked the HTML source, and there are plenty of img tags that contain the attribute class="productImage".
It's as if the attribute selector just won't work.
Here is the URL that the HTML source comes from:
http://www.amazon.cn/s/ref=nb_sb_ss_i_0_1?__mk_zh_CN=%E4%BA%9A%E9%A9%AC%E9%80%8A%E7%BD%91%E7%AB%99&url=search-alias%3Daps&field-keywords=%E4%B8%93%E5%85%AB&x=0&y=0&sprefix=%E4%B8%93
If you have some spare time, please do me a favor: parse the HTML content the way I do and see if you can solve this one.
The weird thing is that if you use open-uri on that page, you get a different result than when using something like curl or wget. However, when you change the User-Agent, you probably get the page you are actually looking for.
Analysis:
Output:
Solution:
Use User-Agent: Wget/1.10.2
Use xpath('//div[@class="productImage"]//img')