The Archive Base

Editorial Team
Asked: May 26, 2026

I am trying to save a couple of web pages by using a web

I am trying to save a couple of web pages using a web crawler. Usually I prefer doing this with Perl's WWW::Mechanize module. However, as far as I can tell, the site I am trying to crawl relies heavily on JavaScript, which seems hard to avoid. I therefore looked into the following Perl modules:

  • WWW::Mechanize::Firefox
  • MozRepl
  • MozRepl::RemoteObject

The Firefox MozRepl extension itself works perfectly. I can use the terminal to navigate the web site just as shown in the developer's tutorial, at least in theory. However, I know very little about JavaScript and am therefore having a hard time using the modules properly.

Here is the source I would like to start from: Morgan Stanley

For a couple of the firms listed beneath 'Companies – as of 10/14/2011' I would like to save their respective pages. For example, clicking on the first listed company ('1-800-Flowers.com, Inc') calls a JavaScript function with two arguments, dtxt('FLWS.O','2011-10-14'), which produces the desired new page. That is the page I would like to save locally.

With Perl's MozRepl module I thought about something like this:

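use strict;
use warnings;
use MozRepl;

# Connect to the MozRepl extension running inside Firefox.
my $repl = MozRepl->new;
$repl->setup;

# Open the coverage page in a new window or tab.
$repl->execute('window.open("http://www.morganstanley.com/eqr/disclosures/webapp/coverage")');

# Switch the REPL context to the page content so that dtxt() is in scope,
# then call it with the ticker and date of the first listed company.
$repl->repl_enter({ source => "content" });
$repl->execute('dtxt("FLWS.O", "2011-10-14")');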

Now I would like to save the resulting HTML page.

To summarise, the code should visit the HTML page of each of a couple of firms and simply save it. (Three example firms: MMM.N, FLWS.O, SSRX.O)

  1. Is it correct that I cannot get around the page's JavaScript functions and therefore cannot use WWW::Mechanize?
  2. Following on from question 1, are the Perl modules mentioned above a plausible approach?
  3. Finally, if the answer to the first two questions is yes, it would be really nice if you could help me with the actual coding. In the code above, the essential missing part is a 'save' command. (Maybe using Firefox's saveDocument function?)
1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 5:21 am

    The web works via HTTP requests and responses.

    If you can discover the proper request to send, then you will get the proper response.

    If the target site uses JS to form the request, then you can either execute the JS
    or analyse what it does, so that you can do the same in the language you are using.

    An even easier approach is to use a tool that captures the resulting request for you, whether or not it is created by JS. You can then craft your scraping code
    to create the request that you want.

    The “Web Scraping Proxy” from AT&T is such a tool.

    You set it up, then navigate the website as normal to get to the page you want to scrape,
    and the WSP will log all requests and responses for you.

    It logs them in the form of Perl code, which you can then modify to suit your needs.
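
Once the request behind dtxt('FLWS.O','2011-10-14') is known, whether from reading the JavaScript or from a captured log, replaying it for several firms could look roughly like the sketch below. It uses only the core HTTP::Tiny module. The endpoint in $BASE_URL and the parameter names ticker and date are placeholders, not the site's real interface; substitute whatever the captured request actually shows. The helper names build_url and save_page are likewise made up for illustration.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# ASSUMPTION: placeholder endpoint; replace it with the URL seen in the
# captured request.
my $BASE_URL = 'http://www.example.com/disclosures/company';

my $http = HTTP::Tiny->new( timeout => 30 );

# Build the request URL from the two values that dtxt() receives.
# ASSUMPTION: the parameter names "ticker" and "date" are hypothetical.
sub build_url {
    my ( $ticker, $date ) = @_;
    my $query = $http->www_form_urlencode( { ticker => $ticker, date => $date } );
    return "$BASE_URL?$query";
}

# Fetch one firm's page and store the response body in "<ticker>.html".
# Returns true on success, false otherwise.
sub save_page {
    my ( $ticker, $date ) = @_;
    my $res = $http->mirror( build_url( $ticker, $date ), "$ticker.html" );
    warn "Failed for $ticker: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $res->{success};
}

# Tickers come from the command line, so nothing is fetched without arguments.
save_page( $_, '2011-10-14' ) for @ARGV;
```

Saved as, say, save_pages.pl, it would be run as `perl save_pages.pl MMM.N FLWS.O SSRX.O`. If the captured request turns out to be a form POST rather than a GET, HTTP::Tiny's post_form method can send the same key/value pairs, with the response content then written to the file by hand.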
