I’m trying to parse the Twitter usernames from a bit.ly stats page using Nokogiri:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(open('http://bitly.com/U026ue+/global'))
twitter_accounts = []
shares = doc.xpath('//*[@id="tweets"]/li')
shares.map do |tweet|
  twitter_accounts << tweet.at_css('.conv.tweet.a')
end
puts twitter_accounts
My understanding is that Nokogiri will save shares in some form of tree structure that I can drill down into, but my mileage is varying.
Actually, Eric Walker was onto something. If you look at doc, the section where the tweets are supposed to be looks like:
This is likely because they’re generated by some JavaScript call which Nokogiri isn’t executing. One possible solution is to use watir to traverse to the page, load the JavaScript and then save the HTML.
Here is a script that accomplishes just that. Note that you had some issues with your XPath arguments which I’ve since solved, and that watir will open a new browser every time you run this script:
You can also use headless to prevent a window from opening.