I am trying to port some old web scraping scripts written using older Perl modules to work using only Mojolicious.
I have written a few basic scripts with Mojo, but I am puzzled about how to handle an authenticated login through a form on a secure (HTTPS) login site in a Mojo::UserAgent script. Unfortunately, the only example I can see in the documentation is for basic authentication without forms.
The Perl script I am trying to convert to work with Mojo::UserAgent is as follows:
#!/usr/bin/perl
use LWP;
use LWP::Simple;
use LWP::Debug qw(+);
use LWP::Protocol::https;
use WWW::Mechanize;
use HTTP::Cookies;
# login first before navigating to pages
# Create our automated browser and set up to handle cookies
my $agent = WWW::Mechanize->new();
$agent->cookie_jar(HTTP::Cookies->new());
$agent->agent_alias( 'Windows IE 6' ); #tell the website who we are (old!)
# get login page
$agent->get("https://reg.mysite.com");
$agent->success or die $agent->response->status_line;
# complete the user name and password form
$agent->form_number (1);
$agent->field (username => "user1");
$agent->field (password => "pass1");
$agent->click();
#try to get member's only content page from main site on basis we are now "logged in"
$agent->get("http://www.mysite.com/memberpagesonly1");
$agent->success or die $agent->response->status_line;
my $member_page = $agent->content();
print "$member_page\n";
The above works fine. How do I convert it to do the same job in Mojolicious?
Mojolicious is a web application framework. While Mojo::UserAgent works well as a low-level HTTP user agent, and provides facilities that are unavailable from LWP (in particular native support for asynchronous requests and IPv6), neither is as convenient to use as WWW::Mechanize for web scraping.
WWW::Mechanize subclasses LWP::UserAgent to interface with the internet, and uses HTML::Form to process the forms it finds. Mojo::UserAgent has no facility for processing HTML forms, and so building the corresponding HTTP requests is not at all straightforward. Details such as the HTTP method used (GET or POST), the names of the form fields, and the insertion of default values for hidden fields are all handled automatically by HTML::Form, and are left to the programmer if you restrict yourself to Mojo::UserAgent.
It seems to me that even trying to use Mojo::UserAgent in combination with HTML::Form is problematic, as the former requires a Mojo::Transaction::HTTP object to represent the submission of a filled-in form, whereas the latter generates HTTP::Request objects for use with LWP.
In short, unless you are willing to largely rewrite WWW::Mechanize, I think there is no way to reimplement your software using Mojolicious modules.
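For what it's worth, here is roughly what doing the form handling by hand looks like if you stay within Mojolicious. It is only a rough sketch and not a drop-in replacement for WWW::Mechanize. The sub names form_details and fetch_member_page are made up for illustration. It assumes the login form is the first form on the page (as form_number(1) does in the question), that the fields really are called username and password, and that your Mojolicious is recent enough to have the result method on transactions. It only looks at plain input elements, so select boxes, checkboxes, textareas and named submit buttons are ignored.

```perl
use strict;
use warnings;
use Mojo::UserAgent;
use Mojo::DOM;
use Mojo::URL;

# Work out by hand what HTML::Form would normally give us: the HTTP method,
# the absolute action URL, and the default value of every named text-like
# <input> (including hidden ones). Hypothetical helper, not a Mojolicious API.
sub form_details {
    my ($dom, $page_url) = @_;
    my $form = $dom->at('form') or die "No form found on $page_url\n";

    # Input types this sketch deliberately does not try to handle
    my %skip = map { $_ => 1 } qw(submit button image reset checkbox radio file);

    my %fields;
    for my $input ($form->find('input[name]')->each) {
        next if $skip{ lc($input->attr('type') // 'text') };
        $fields{ $input->attr('name') } = $input->attr('value') // '';
    }

    # A relative or missing action is resolved against the page URL
    my $action = Mojo::URL->new($form->attr('action') // '')
                          ->to_abs(Mojo::URL->new($page_url));

    return {
        method => uc($form->attr('method') // 'GET'),
        action => $action->to_string,
        fields => \%fields,
    };
}

# Log in through the form, then fetch a members-only page.
# Mojo::UserAgent keeps its own cookie jar, so the session cookie is reused.
sub fetch_member_page {
    my ($login_url, $member_url, $user, $pass) = @_;
    my $ua = Mojo::UserAgent->new(max_redirects => 5);

    my $tx  = $ua->get($login_url);
    my $res = $tx->result;
    die 'Login page: ' . $res->code . "\n" unless $res->is_success;

    # Keep the defaults (hidden fields etc.) and fill in the credentials
    my $form   = form_details($res->dom, $tx->req->url->to_string);
    my %fields = (%{ $form->{fields} }, username => $user, password => $pass);

    # The "form" generator encodes the fields as a query string for GET
    # and as a urlencoded body for POST
    $res = $ua->start(
        $ua->build_tx($form->{method} => $form->{action} => form => \%fields)
    )->result;
    die 'Login: ' . $res->code . "\n" unless $res->is_success;

    $res = $ua->get($member_url)->result;
    die 'Member page: ' . $res->code . "\n" unless $res->is_success;
    return $res->body;
}
```

With the URLs and credentials from the question, the call would be something like print fetch_member_page('https://reg.mysite.com', 'http://www.mysite.com/memberpagesonly1', 'user1', 'pass1'). Whether that is enough depends entirely on the site: anything driven by JavaScript, or a form with more complicated controls, would need more of this manual work.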