In this post I learned that Mechanize in Ruby/Perl is easier to use than

Question


Asked: May 27, 2026


In this post I learned that Mechanize in Ruby/Perl is easier to use than HTML::TreeBuilder 3 in that particular example.

Is Mechanize superior to HTML::TokeParser?

Would the code below also have been easier to write in Ruby using Mechanize?

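use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTML::TokeParser;

sub get_img_page_urls {
    my $url = shift;

    my $ua = LWP::UserAgent->new;
    # The agent was originally set twice; only this last value took effect.
    $ua->agent("Mozilla/8.0");

    my $req = HTTP::Request->new(GET => $url);
    $req->header('Accept' => 'text/html');

    my $response = $ua->request($req);  # send request

    die "Error: ", $response->status_line unless $response->is_success;

    # HTML::TokeParser expects a reference to the document string
    my $html = $response->decoded_content;
    my $stream = HTML::TokeParser->new(\$html);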

    my %urls = ();

    my $found_thumbnails = 0;
    my $found_thumb = 0;

    while (my $token = $stream->get_token) {

        # Only start tags matter here: ['S', $tag, \%attr, ...]
        next unless $token->[0] eq 'S';
        my ($tag, $attr) = @{$token}[1, 2];
        my $class = $attr->{class} // '';  # avoid undef warnings

        # <div class="thumb-box" ... >
        $found_thumbnails = 1 if $tag eq 'div' and $class eq 'thumb-box';

        # <div class="thumb" ... >
        $found_thumb = 1 if $tag eq 'div' and $class eq 'thumb';

        # First <a ... > seen after both divs
        if ($found_thumbnails and $found_thumb and $tag eq 'a') {
            $urls{'http://example.com' . ($attr->{href} // '')} = 1;

            # One URL has been found. Now start all over.
            $found_thumb = 0;
            $found_thumbnails = 0;
        }

    }

    return %urls;
}


1 Answer

Editorial Team · Answer 1 · 2026-05-27T10:29:40+00:00

Mechanize is more than a parser. It adds an emulated browser, which lets you navigate a site, fill out forms, and so on. It also includes a parser, which makes web scraping very simple. Here's your method rewritten using Ruby Mechanize:

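require 'mechanize'

def get_img_page_urls(url)
  agent = Mechanize.new
  agent.user_agent_alias = "Windows Mozilla"
  # Select the href attribute of each <a> directly inside div.thumb inside div.thumb-box
  agent.get(url).search("//div[@class='thumb-box']/div[@class='thumb']/a/@href")
end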
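
One difference from the Perl version is worth noting: the Perl sub returns a hash keyed by absolute URLs (prefixed with http://example.com), while the XPath search returns the raw href attribute nodes. A minimal sketch of how that gap could be closed is below. The helper name absolute_urls and the sample paths are hypothetical, and it assumes each element's to_s yields the href value, as it does for plain strings.

```ruby
require 'uri'

# Hypothetical helper (not part of Mechanize): resolves each href against
# a base URL and drops duplicates, similar to the hash keys in the Perl sub.
def absolute_urls(hrefs, base = 'http://example.com')
  hrefs.map { |href| URI.join(base, href.to_s).to_s }.uniq
end

# Sample relative links, made up for illustration only.
sample = ['/img/1.html', '/img/2.html', '/img/1.html']
puts absolute_urls(sample)
```

With the method above, this would be called as absolute_urls(get_img_page_urls(url)). URI.join is used here instead of string concatenation so that hrefs which are already absolute are left unchanged.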
