I don’t need to crawl the whole internet, I just need to open a

Asked: May 17, 2026

I don’t need to crawl the whole internet; I just need to open a few URLs, extract other URLs from them, and then save some of the pages in a way that they can be browsed on disk later. What library would be appropriate for programming that?

1 Answer

Editorial Team · Answer 1 · 2026-05-17T19:42:55+00:00

Mechanize is very good for this sort of thing.

http://mechanize.rubyforge.org/mechanize/

In particular, this page will help:

http://mechanize.rubyforge.org/mechanize/GUIDE_rdoc.html

Under the covers, Mechanize uses Nokogiri to parse documents. Here’s a simpler version that uses Open-URI and Nokogiri directly to read a page, extract all of its links, and write the HTML to a file.

Added example:

require 'open-uri'
require 'nokogiri'

# URI.open fetches the page; since Ruby 3.0, plain Kernel#open no longer accepts URLs.
doc = Nokogiri::HTML(URI.open('http://some.web.site'))

Extracting the links is easy. This uses a CSS selector:

hrefs = doc.css('a[href]').map { |a| a['href'] }

This uses XPath to do the same thing:

hrefs = doc.xpath('//a[@href]').map { |a| a['href'] }
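The `href` values collected this way are often relative (`../about.html`, `guide.html#intro`, `#top`), so they need to be resolved against the page's URL before they can be opened in turn. A minimal sketch using only Ruby's standard `uri` library; the helper name `absolute_links` is hypothetical, and it assumes you pass in the URL the page was actually served from (the object returned by `URI.open` reports that post-redirect URL via `base_uri`):

```ruby
require 'uri'
require 'set'

# Hypothetical helper: turn raw href strings into absolute, de-duplicated
# http(s) URLs, resolved against the URL the page was fetched from.
def absolute_links(base_url, hrefs)
  seen = Set.new
  hrefs.each_with_object([]) do |href, out|
    begin
      uri = URI.join(base_url, href.strip)
    rescue URI::Error
      next # skip malformed hrefs such as "http://exa mple.com"
    end
    # Keep only web links; this drops mailto:, javascript:, ftp:, etc.
    # (URI::HTTPS is a subclass of URI::HTTP.)
    next unless uri.is_a?(URI::HTTP)
    uri.fragment = nil # "page.html#top" and "page.html" are the same page
    url = uri.to_s
    out << url if seen.add?(url) # add? returns nil for duplicates
  end
end
```

With the answer's code, one way to use it is to keep the response object, e.g. `io = URI.open('http://some.web.site')`, parse it with `Nokogiri::HTML(io)`, and then call `absolute_links(io.base_uri.to_s, hrefs)`.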

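Saving the content is just as easy: open a file for writing and ask Nokogiri to serialize the document as HTML:

# File.open (unlike File.new) yields the file to the block and closes it afterwards.
File.open('some_web_site.html', 'w') { |fo| fo.puts doc.to_html }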
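For the saved pages to be browsable from disk, each URL needs a stable local filename, and the links inside each saved page need to point at those files rather than at the live site. A minimal sketch using only the standard library; `local_path_for`, `relative_link`, `save_page`, and the `mirror` directory are hypothetical names, and it assumes every saved URL is an HTML page reached over http(s):

```ruby
require 'uri'
require 'pathname'
require 'fileutils'

# Hypothetical helper: map an absolute http(s) URL to a relative file path
# of the form host/path, e.g. http://example.com/docs/ -> example.com/docs/index.html
def local_path_for(url)
  uri = URI.parse(url)
  path = uri.path.to_s
  path = '/' if path.empty?
  # Directory-style URLs get an index.html so the browser has a file to open.
  path += 'index.html' if path.end_with?('/')
  # Assumes every saved URL is an HTML page: give extension-less paths .html.
  path += '.html' if File.extname(path).empty?
  if uri.query
    # Fold the query into the filename so ?page=1 and ?page=2 stay distinct.
    ext = File.extname(path)
    path = path.delete_suffix(ext) + '_' + uri.query.gsub(/[^A-Za-z0-9]/, '_') + ext
  end
  File.join(uri.host, path)
end

# Hypothetical helper: the relative href to write into the saved copy of
# from_url so that it points at the saved copy of to_url.
def relative_link(from_url, to_url)
  from_dir = Pathname.new(local_path_for(from_url)).dirname
  Pathname.new(local_path_for(to_url)).relative_path_from(from_dir).to_s
end

# Hypothetical helper: write html under root_dir at the path derived from url,
# creating intermediate directories, and return the destination path.
def save_page(url, html, root_dir = 'mirror')
  dest = File.join(root_dir, local_path_for(url))
  FileUtils.mkdir_p(File.dirname(dest))
  File.write(dest, html)
  dest
end
```

With the Nokogiri code above, one way to use `relative_link` is to rewrite `a['href']` (Nokogiri allows assignment with `a['href'] = ...`) for each link whose target you also saved, before writing `doc.to_html`; links to pages you did not save can stay absolute.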
