I’m having trouble figuring out why I can’t get keywords to parse properly through Nokogiri. In the following example, extracting the a href links works properly, but I cannot figure out how to pull the keywords.
This is the code I have thus far:
…..
doc = Nokogiri::HTML(open("http://www.cnn.com"))
doc.xpath('//a/@href').each do |node|
#doc.xpath("//meta[@name='Keywords']").each do |node|
  puts node.text
end
….
This successfully prints all of the a href values in the page, but when I try to use it for keywords it doesn’t show anything. I’ve tried several variations of this with no luck. I assume that the “.text” call after node is wrong, but I’m not sure.
My apologies for how rough this code is, I’m doing my best to learn here.
You’re correct, the problem is “.text”.
“.text” returns the text between the opening tag and the closing tag. Since meta-tags are empty, this gives you the empty string. You want the value of the “content” attribute instead.
Since you know that there will be only one meta-tag with the name “keywords”, you don’t actually need to loop through the results, but can take the first item directly and read its “content” attribute.
Note however, that taking the first item directly will cause an error if there is no meta-tag with the name “keywords”, so looping over the results (the first option) might be preferable.