The Archive Base

Editorial Team
Asked: May 26, 2026


These are the modules I have installed.

use WWW::Mechanize;
use XML::Simple;
use LWP::Simple;
use Data::Dumper;
use Web::Scraper;
#use HTML::Grabber;

I am trying to collect all links that end in '.com' into an array, reading the page only up to this HTML tag: '<div class="nogo_class">Proceed No More</div>'.

I have looked at various examples I found here and in the documentation, but none of them does this.
None that I can wrap my noob mind around, anyhow.

So, using the modules I have installed, how can I get all links that end in '.com', up to the stopping point '<div class="nogo_class">Proceed No More</div>', into an array?

That way, I can get the links out later with a loop or whatever,
e.g. $somearray[$counter];

I am really inexperienced and hope I asked the question properly. Verbose explanations in any examples will help me learn this.

Thanks for your help.

P.S. The 'nogo_class' is used multiple times in the page, but the 'Proceed No More' text appears only once. Also, I am running Perl v5.8.8, and HTML::Grabber needs v5.10.0 minimum.


1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 7:00 am

    Use HTML::TokeParser::Simple to parse the document. Collect the href of each a tag whose host ends in ".com", and stop as soon as you find the text "Proceed No More" in a div with class nogo_class.

    #!/usr/bin/env perl
    
    use warnings; use strict;
    use HTML::TokeParser::Simple;
    use URI;
    
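    # Read the sample document from the __DATA__ section below.
    my $p = HTML::TokeParser::Simple->new(handle => \*DATA);
    
    my @interesting_links;
    
    # get_tag returns only start tags named "a" or "div" and skips everything else.
    while (my $tag = $p->get_tag(qw'a div')) {
        if ($tag->is_start_tag('div')) {
            my $class = $tag->get_attr('class');
            if (defined($class) and $class eq 'nogo_class') {
                # Read the text up to the closing </div>; stop at the marker.
                my $text = $p->get_text('/div');
                last if defined($text) and $text eq 'Proceed No More';
            }
        }
        elsif ($tag->is_start_tag('a')) {
            my $href = $tag->get_attr('href');
            next unless defined $href;
            my $uri = URI->new($href);
            # Fragments, relative links and mailto: URIs have no host method.
            next unless $uri->can('host');
            my $host = $uri->host;
            next unless defined($host) and $host =~ /[.]com\z/;
            push @interesting_links, $href;
        }
    }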
    
    print "$_\n" for @interesting_links;
    
    __DATA__
    <!DOCTYPE HTML>
    <html>
    <head>
    <title>Test</title>
    </head>
    <body>
    
    <p><a href="http://example.com/link1">Link 1</a>, <a
    href="http://example.org/link2">Link 2</a> and <a
    href="http://example.com/link3">Link 3</a></p>
    
    <div class="nogo_class">Keep going man!</div>
    
    <p><a href="http://example.com/link4">Link 4</a>, <a
    href="http://example.org/link5">Link 5</a> and <a
    href="http://example.net/link6">Link
    6</a></p>
    
    <div class="nogo_class">Keep going man!</div>
    
    <div class="nogo_class">Proceed No More</div>
    
    <p><a href="#">Link 7</a>, <a href="#">Link 8</a> and <a href="#">Link
    9</a></p>
    
    
    </body>
    </html>
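
If HTML::TokeParser::Simple is not available, the same idea can probably be expressed with plain HTML::TokeParser. The sketch below is only an illustration under some assumptions: that HTML::TokeParser and URI are already present (they are commonly installed alongside WWW::Mechanize and LWP, but check your own system), and that "ends in .com" refers to the host name, as in the code above. The helper name com_links_before_marker and the sample HTML are made up for this example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::TokeParser;   # assumed installed (part of the HTML-Parser distribution)
use URI;                # assumed installed

# Hypothetical helper: returns the href of every <a> whose host ends in
# ".com", reading $html until a <div class="nogo_class"> whose text is
# exactly $stop_text is reached.
sub com_links_before_marker {
    my ($html, $stop_text) = @_;
    my $p = HTML::TokeParser->new(\$html)
        or die "Cannot parse document";
    my @links;

    # With arguments, get_tag skips ahead to the next <a> or <div> start tag.
    # Each token is an array ref: [ $tagname, \%attributes, ... ].
    while (my $token = $p->get_tag('a', 'div')) {
        my ($name, $attr) = @$token;

        if ($name eq 'div') {
            my $class = $attr->{class};
            next unless defined $class and $class eq 'nogo_class';
            # Text up to the closing </div>, with surrounding whitespace trimmed.
            my $text = $p->get_trimmed_text('/div');
            last if $text eq $stop_text;
        }
        else {
            my $href = $attr->{href};
            next unless defined $href;
            my $uri = URI->new($href);
            next unless $uri->can('host');   # skips "#", relative paths, mailto:
            my $host = $uri->host;
            push @links, $href if defined $host and $host =~ /[.]com\z/;
        }
    }
    return @links;
}

# Made-up sample page for the demonstration.
my $html = <<'HTML';
<p><a href="http://example.com/link1">Link 1</a>
<a href="http://example.org/link2">Link 2</a>
<a href="/relative/path">Relative</a></p>
<div class="nogo_class">Keep going man!</div>
<p><a href="http://example.com/link4">Link 4</a></p>
<div class="nogo_class">Proceed No More</div>
<p><a href="http://example.com/link7">Link 7</a></p>
HTML

my @somearray = com_links_before_marker($html, 'Proceed No More');

# Walk the array by index, as in $somearray[$counter].
for my $counter (0 .. $#somearray) {
    print "$counter: $somearray[$counter]\n";
}
```

To run this against a live page instead of the sample string, the HTML could come from WWW::Mechanize, for example by calling $mech->get($url) and passing $mech->content to the helper. The sketch avoids newer syntax such as say, so it should not need Perl 5.10 features, though the installed module versions may have their own requirements.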
    
