I’m getting started with Nokogiri and trying to extract some data from IMDB. However, IMDB’s HTML is a bit odd, and I can’t seem to find CSS selectors for a few things, such as the creator’s name. Here is an example page: http://www.imdb.com/title/tt1439629/. How can I get what I’m after? This is what I have so far:
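require 'nokogiri'
require 'open-uri'
url = "http://www.imdb.com/title/tt1439629/"
# URI.open is needed on Ruby 3.0+, where a bare open(url) no longer fetches URLs
doc = Nokogiri::HTML(URI.open(url))
# Print the page's main heading (the title)
puts doc.at_css("h1").text.strip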
This beginner tutorial for Nokogiri explains how to find the selector you’re looking for very easily, with the help of Firebug and its copy CSS function:
http://ruby.bastardsbook.com/chapters/html-parsing/