I’m trying to scrape the following page instead, since the site’s XML is malformed and does not contain all of the data I need:
http://www.cafebonappetit.com/menu/your-cafe/pitzer
When I fetch the document with Mechanize, however, I only get:
{meta_refresh}
{title "Collins | Claremont McKenna Cafés | Café Bon Appétit"}
{iframes}
{frames}
{links
#<Mechanize::Page::Link "Welcome" "http://www.cafebonappetit.com/">
#<Mechanize::Page::Link "Our Approach" "javascript://">
#<Mechanize::Page::Link
"Kitchen Principles"
"http://www.cafebonappetit.com/our-approach/kitchen-principles">
.....
}
Unfortunately, what I actually need is the content of the tables (I assume they are iframes). Any thoughts?
Thanks!
Here’s a simple Mechanize + Nokogiri script that scrapes the menu items.
Result (excerpt):