I am making a small scraper for the finance.yahoo.com website. When I make this request:
require 'nokogiri'
require 'open-uri'

symbol = 'AAPL'
@page = Nokogiri::HTML(open("http://finance.yahoo.com/q?s=#{symbol.upcase}&ql=1"))

def marketCap(symbol)
  @page.xpath("//*[(@id = \"yfs_j10_#{symbol.downcase}\")]").text
end

puts marketCap(symbol)
It prints the same result twice:
“495.74B495.74B”
I looked at the page source, and the tag appears only once:
<span id="yfs_j10_f">51.74B</span>
If I use a CSS selector instead, I get the same problem.
Is it a bug, or did I make a mistake?
Thanks.
isn’t correct.
xpath returns a NodeSet, which is similar to an Array. If it contains two elements, text will contain both of them. Instead, use at_xpath to find the first one.
Now, instead of using XPath, which I feel is usually more complicated and less readable, I’d recommend using CSS for your accessor.
Notice that I used at instead of at_css or at_xpath. The at method senses whether you’re passing an XPath or CSS. It’s generic, and could make a mistake figuring out which to use, but it’s also easier to use. The same is true of search instead of css or xpath. It returns a NodeSet like the other two, but senses which type of accessor it should use.