I am trying to make a web crawler that will log in to a school website using my credentials and then crawl certain parts of the site. I am using the Beautiful Soup Python library found here:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/
I can scrape the HTML for the username and password fields, but I do not know how to fill them in. I have the same problem with submitting the form: I have scraped the HTML for the “Submit” button, but I do not know how to send the login request.
Thanks,
You can either use Mechanize, a library that emulates a browser, or just send the POST/GET request manually.
Mechanize’s homepage has a full example that you can try out.
If you want to go with the manual request, I usually just open Chrome’s JS console, serialize the form, and see which parameters get sent.
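If you would rather find those parameters from Python than from the browser console, here is a minimal sketch that lists a form's fields using only the standard library's `html.parser` (swapped in for Beautiful Soup so the snippet has no dependencies). The `FormFieldParser` class, the sample HTML, and the field names in it are all made up for illustration; your school's login form will have its own.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldParser(HTMLParser):
    """Collect the action, method, and named <input> fields of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "get"
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._in_form and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Inputs without a name are not submitted, so skip them.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True  # only the first form is of interest here


# Hypothetical login form, standing in for the page you scraped.
SAMPLE_HTML = """
<form action="/login" method="post">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="submit" value="Log in">
</form>
"""

parser = FormFieldParser()
parser.feed(SAMPLE_HTML)
serialized = urlencode(parser.fields)

print(parser.method.upper(), parser.action)
print(serialized)
```

Hidden inputs (such as the made-up `csrf_token` above) usually need to be sent back along with the username and password, which is why it helps to list every named field rather than just the two visible ones.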
Then, you just send a POST request to that URL with those parameters.
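As a rough sketch of that manual POST using only the standard library: the URL, the field names, and the helper names `build_login_request` and `make_session` below are placeholders I made up, so substitute whatever your form actually sends. The cookie jar is there because most sites track the login with a session cookie, which later requests need to send back.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener

# Hypothetical values: replace with the form's real action URL and field names.
LOGIN_URL = "https://school.example/login"
CREDENTIALS = {"username": "your_username", "password": "your_password"}


def build_login_request(url, fields):
    """Build a form-encoded POST request without sending it."""
    data = urlencode(fields).encode("utf-8")
    return Request(url, data=data, method="POST")


def make_session():
    """Return an opener that stores cookies, so the login carries over to later requests."""
    return build_opener(HTTPCookieProcessor(CookieJar()))


request = build_login_request(LOGIN_URL, CREDENTIALS)
print(request.get_method(), request.full_url)

# To actually log in and then crawl, reuse the same opener for every request:
# opener = make_session()
# with opener.open(request) as response:
#     html = response.read().decode("utf-8", errors="replace")
# with opener.open("https://school.example/some/page") as response:
#     page = response.read().decode("utf-8", errors="replace")
```

The pages fetched through the same opener can then be handed to Beautiful Soup for parsing as usual.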