Trying to scrape the content off a site with Python, that has a simple

Asked: May 26, 2026

I'm trying to scrape the content off a site with Python. The site uses a simple form authentication with a username and password, but it also has a hidden field called “foil” that contains what looks like a randomly generated string each time the page is loaded. To log in successfully, that value must be included in the POST request. I've tried scraping the random string out after the login page loads, but the site still redirects me back to the login page. I have a valid username and password that work on the site. The site is updated sporadically, and I would like to send myself an email when something changes. Here is the code I've been working with so far…

import urllib, urllib2, cookielib,subprocess

url='https://example.com/login.asp'

username='blah'
password='blah'

request = urllib2.Request(url)
opener = urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1))
preData = opener.open(request).readlines()
for line in preData:
    if("foil" in line):
        foils = line.split('"')
        notFoiled = foils[3]

query_args={'location':'','qstring':'','absr_ID':notFoiled,'id':username,'pin':password,'submit':'Sign In'}
requestWheader = urllib2.Request('https://example.com/login.asp')
requestWheader.add_data(urllib.urlencode(query_args))
print 'Request method after data :', requestWheader.get_method()

print
print 'OUTGOING DATA:'
print requestWheader.get_data()

print
print 'SERVER RESPONSE:'
# open the request only once; a second urlopen() would submit the login a second time
rawRes = urllib2.urlopen(requestWheader).read()
print rawRes

The form looks like this…

<form name="loginform" method="post" action="https://example.com/login.asp?x=x&amp;&amp;pswd=">
<input type=hidden name="location" value="">
<input type=hidden name="qstring" value="">
<input type=hidden name="absr_ID" value="">
<input type=hidden name="foil" value="91fcMO">
<input type="text" name="id" maxlength="80" size="21" value="" mask="" desc="ID" required="true">
<input type="submit" name="submit" value="Sign In" onClick="return checkForm(loginform)">
<input type="password" name="pin" size="6" maxlength="6" desc="Pin" required="true">

1 Answer

Editorial Team · Answer 1 · 2026-05-26T03:27:53+00:00

You import cookielib but it does not seem like you’re using any CookieJars:

jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))

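Then use the same opener both for fetching the login form and for submitting it. I assume it's a cookie-based protection, where the value of the foil field has to match a cookie sent in the response headers when the form is served. Without a CookieJar that cookie is never sent back with the POST, so the server rejects the login and redirects you.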
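A minimal Python 3 sketch of that flow (in Python 3, urllib2 and cookielib became urllib.request and http.cookiejar). The helper fetch_and_post and its make_payload callback are hypothetical names; the sketch assumes, as above, that the server ties the foil value to a session cookie, and that the response is UTF-8. Because the form's action attribute carries an extra query string, the POST target can be passed separately.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def fetch_and_post(url, make_payload, post_url=None):
    """GET the login page, then POST the form back through the SAME opener.

    make_payload(page_html) -> dict of form fields (hypothetical callback).
    post_url defaults to url; pass the form's action URL if it differs.
    Returns (opener, response_body) so the logged-in opener can be reused.
    """
    # One CookieJar shared by both requests: a session cookie set when the
    # form is served is stored here and sent back automatically with the POST.
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

    # Step 1: fetch the form; this is where the random "foil" value is issued.
    # Decoding as UTF-8 is an assumption; check the Content-Type charset.
    with opener.open(url) as resp:
        page = resp.read().decode("utf-8", errors="replace")

    # Step 2: submit the form; passing data= makes urllib send a POST.
    data = urllib.parse.urlencode(make_payload(page)).encode("ascii")
    with opener.open(post_url or url, data=data) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    return opener, body
```

After a successful login, keep using the returned opener (opener.open(...)) for the pages you want to monitor, so the session cookie keeps being sent.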

Another thing I noticed in your code is that you assign notFoiled to absr_ID instead of foil. Was that intentional?
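For comparison, here is a Python 3 sketch of the login body with the token in the foil field. build_login_payload is a hypothetical helper; the field names are taken from the form in the question, and sending absr_ID empty assumes the checkForm JavaScript (not shown) does not fill in or rename fields before submitting. In Python 2 the same call is urllib.urlencode.

```python
import urllib.parse


def build_login_payload(foil, username, password):
    """Return the URL-encoded POST body for the login form (hypothetical helper)."""
    query_args = {
        "location": "",
        "qstring": "",
        "absr_ID": "",        # left empty, as the form renders it (assumption)
        "foil": foil,         # the per-page random token goes here
        "id": username,
        "pin": password,
        "submit": "Sign In",
    }
    # urlencode yields application/x-www-form-urlencoded text;
    # urllib needs bytes for a POST body.
    return urllib.parse.urlencode(query_args).encode("ascii")
```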

Also, please do yourself a favor and use html5lib or BeautifulSoup instead of parsing HTML by hand.
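If you'd rather avoid a third-party dependency, the standard library's html.parser can do the same job. This is a swapped-in stdlib approach, not html5lib or BeautifulSoup, and HiddenInputParser and hidden_fields are hypothetical names. It collects every hidden input, so location, qstring, absr_ID and foil can all be posted back exactly as the server rendered them.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collects name -> value for every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; unquoted values like
        # type=hidden are handled by the parser.
        if tag != "input":
            return
        attrs = dict(attrs)
        if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
            # An attribute with no value comes through as None.
            self.fields[attrs["name"]] = attrs.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden form fields found in html."""
    parser = HiddenInputParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

To build the login body, start from fields = hidden_fields(page) and add the visible fields with fields.update(id=username, pin=password, submit="Sign In"). With BeautifulSoup, the single foil lookup would be soup.find("input", attrs={"name": "foil"})["value"].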
