I’m new to programming, so please excuse my inexperience. I’m using Nokogiri to scrape a police crime log. Here is the code:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = "http://www.sfsu.edu/~upd/crimelog/index.html"
doc = Nokogiri::HTML(open(url))
puts doc.at_css("title").text
doc.css(".brief").each do |brief|
  puts brief.at_css("h3").text
end
I used the SelectorGadget bookmarklet to find the CSS selector for the log (.brief). When I pass “h3” to brief.at_css, I get all of the h3 tags with the content inside.
However, if I add the .text method to remove the tags, I get a NoMethodError.
Is there any reason why this is happening? What am I missing? Thanks!
To clarify: if you look at the structure of the HTML source, you will see that the very first occurrence of <div class="brief"> does not have a child h3 tag (it actually only has a child <p> tag).
The Nokogiri docs state that at_css(*rules) is equivalent to css(rules).first. When there are matches (your .brief element contains an h3), a Nokogiri::XML::Element object is returned, which responds to text. If your .brief does not contain an h3, nil (a NilClass object) is returned, which of course does not respond to text. That is where the NoMethodError comes from.
If we call css(rules) instead (not at_css as you have), we get a Nokogiri::XML::NodeSet object back, which defines text() as an alias of inner_text. Because the class is Enumerable, it iterates over its children, calling inner_text on each and joining the results together, so an empty NodeSet simply gives an empty string.
Therefore you can either perform a nil? check or, as @floatless correctly stated, just use the css method.