The Archive Base

Editorial Team
Asked: June 5, 2026

I need to pull all the links for a page that resides on an


I need to pull all the links from a site that resides on an intranet, but I am unsure how best to do it. The structure of the site is as follows:

List of topics

  1. Topic 1

  2. Topic 2

  3. Topic 3

etc

The links reside in the individual topic pages. I want to avoid going through more than 500 topic pages manually to extract the URIs.

Each topic page has a URL of the following form:

http://alias/filename.php?cat=6&number=1

The cat parameter refers to the category and the number parameter refers to the topic.

Within a topic page, the URI I need to extract also follows a fixed format:

http://alias/value?id=somevalue

Caveats

  1. I don't have access to the database, so trawling through it is not an option
  2. There is only ever a single URI in each topic page
  3. I need to extract the list to a file that simply lists each URI on a new line

I would like a script that I can run from the terminal via Bash that will walk through the topic URIs and then pick up the URI in each topic page.

In a nutshell

How can I write a script, runnable from Bash, that goes through the whole list of topics, extracts the URI from each topic page, and writes a text file with each extracted URI on a new line?

1 Answer

  1. Editorial Team
    Added an answer on June 5, 2026 at 3:05 pm

    I would implement this in Perl, using the HTML::TokeParser and WWW::Mechanize modules:

    use strict;
    use warnings;
    use HTML::TokeParser;
    use WWW::Mechanize;

    # autocheck => 1 makes the script die on any failed request
    my $site = WWW::Mechanize->new(autocheck => 1);
    my $topicmax = 500;  # Note: adjust this to the number of topic pages you have

    # loop through each topic page
    foreach my $number (1..$topicmax) {
        my $topicurl = "http://alias/filename.php?cat=6&number=$number";

        # get the page and hand its HTML to the parser
        $site->get($topicurl);
        my $html = $site->content;
        my $p = HTML::TokeParser->new(\$html);

        # parse the page and extract the links
        while (my $token = $p->get_tag("a")) {
            my $url = $token->[1]{href};
            next unless defined $url;  # skip anchors without an href
            # use a regex to test for the link format we want
            if ($url =~ m{^http://alias/value\?id=}) {
                print "$url\n";
            }
        }
    }
    

    The script prints to stdout, so you just need to redirect it to a file.
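
Since the question asks for something that runs from Bash, here is a rough shell variant of the same loop, offered as a sketch rather than a tested replacement for the Perl script. It assumes `curl` and a `grep` that supports `-o` are installed, that the topic numbers run from 1 up to a known maximum, and that the target URI appears literally in the page source. The function names `extract_uris` and `crawl_topics` and the output file name `uris.txt` are made up for this illustration.

```shell
#!/usr/bin/env bash
# Sketch only: a curl + grep variant of the Perl loop above.

# Read HTML on stdin and print every URI of the form
# http://alias/value?id=... on its own line.
extract_uris() {
    # -o prints only the matched text, -E enables extended regexes.
    # The match stops at a quote, angle bracket, or space.
    # "|| true" keeps a page with no match from counting as a failure.
    grep -oE "http://alias/value\?id=[^\"'<> ]+" || true
}

# Fetch topic pages 1..max and print the URIs found in each one.
# $1 = base URL including the cat parameter (hypothetical argument)
# $2 = highest topic number (hypothetical argument)
# The body runs in a subshell so its variables stay local.
crawl_topics() (
    base_url=$1
    max=$2
    n=1
    while [ "$n" -le "$max" ]; do
        # -s hides the progress meter, -f turns HTTP errors into a
        # non-zero exit status with no page body on stdout
        curl -sf "${base_url}&number=${n}" | extract_uris
        n=$((n + 1))
    done
)

# Example run (not executed when this file is sourced):
#   crawl_topics 'http://alias/filename.php?cat=6' 500 | sort -u > uris.txt
```

Unlike the Perl version, this matches the URI anywhere in the page text and not only inside `href` attributes, so a link whose visible text repeats its own address is printed twice. Piping through `sort -u`, as in the example run, removes those duplicates at the cost of losing the topic order.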
