In Perl there is an LWP module:
The libwww-perl collection is a set of Perl modules which provides a simple and consistent application programming interface (API) to the World-Wide Web. The main focus of the library is to provide classes and functions that allow you to write WWW clients. The library also contain modules that are of more general use and even classes that help you implement simple HTTP servers.
Is there a similar module (gem) for Ruby?
Update
Here is an example of a function I wrote that extracts URLs from a specific website.
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTML::TreeBuilder 3;

# Return a hash whose keys are the absolute URLs of every link starting
# with "/" found inside the element with id="thumbnails".
sub get_gallery_urls {
    my $url = shift;

    my $ua = LWP::UserAgent->new;
    $ua->agent("Mozilla/8.0");

    my $req = HTTP::Request->new(GET => $url);
    $req->header('Accept' => 'text/html');

    # Send the request and stop on any HTTP error
    my $response = $ua->request($req);
    die "Error: ", $response->status_line unless $response->is_success;

    my $root = HTML::TreeBuilder->new;
    $root->parse($response->decoded_content);
    $root->eof;    # tell the parser the document is complete

    my %urls;
    foreach my $gallery ($root->find_by_attribute("id", "thumbnails")) {
        foreach my $link ($gallery->find_by_tag_name('a')) {
            my $href = $link->attr("href");
            if (defined $href && $href =~ m{^/}) {
                $urls{"http://example.com$href"} = 1;
            }
        }
    }
    $root->delete;    # free the parse tree
    return %urls;
}
The closest match is probably httpclient, which aims to be the Ruby equivalent of LWP. However, depending on what you plan to do, there may be better options. If you plan to follow links, fill out forms, and so on in order to scrape web content, you can use Mechanize, which is similar to the Perl module of the same name. There are also more Ruby-specific gems, such as the excellent rest-client and HTTParty (my personal favorite). See the HTTP Clients category of Ruby Toolbox for a larger list.
Update: Here’s an example of how to find all links on a page in Mechanize (Ruby, but it would be similar in Perl):
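A minimal sketch of that idea, assuming the mechanize gem is installed. The helper names `root_relative_links` and `gallery_urls` are hypothetical, and `http://example.com` is just the placeholder base URL from the question. The filtering step is kept separate from the fetch so it can be tried on a plain array of hrefs.

```ruby
require 'uri'

# Keep only hrefs that start with "/" and resolve them against a base URL,
# mirroring the filter in the Perl function from the question.
# (Hypothetical helper; works on any array of strings or nils.)
def root_relative_links(hrefs, base)
  hrefs.compact
       .select { |href| href.start_with?('/') }
       .map { |href| URI.join(base, href).to_s }
       .uniq
end

# Fetch a page with Mechanize and return the filtered links found on it.
# The gem is loaded inside the method so the helper above can be used
# without it. (Hypothetical helper; the default base is a placeholder.)
def gallery_urls(url, base = 'http://example.com')
  require 'mechanize'
  agent = Mechanize.new
  page = agent.get(url)                 # a Mechanize::Page for HTML responses
  root_relative_links(page.links.map(&:href), base)
end
```

Calling `gallery_urls('http://example.com/')` would return an array of absolute URLs. To restrict the result to links inside the `thumbnails` element, as the Perl version does, one option is `page.search('#thumbnails a')`, which returns the matching nodes so that each node's `'href'` attribute can be read.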
P.S. As an ex-Perler myself, I used to worry about abandoning the excellent CPAN: would I paint myself into a corner with Ruby? Would I be unable to find an equivalent to a module I relied on? This has turned out not to be a problem at all. Lately it has been quite the opposite: Ruby (along with Python) tends to be the first to get client support for new platforms, web services, and the like.