I assumed that if I accessed a password-protected site using Python's mechanize library, I would get a 401 Unauthorized error asking for authentication data.
So inside my script, I tried to access my Yahoo mailbox, which obviously requires a username and password. I expected a 401, but I didn't get one.
Code:
import mechanize

yahoo_mail = 'http://mail.cn.yahoo.com'
br = mechanize.Browser()
r = br.open(yahoo_mail)
print(r.code)  # status code of the login page: 200, as expected
br.select_form(nr=0)  # select the login form
r = br.submit()  # submit the form without providing a username and password
print(r.code)  # but I didn't get 401, why?
Question:
- Why didn't I get a 401 when I submitted the form without any authentication info?
- If not my mailbox, is there any other website that would give me a 401?
Most websites these days do not use HTTP Authentication, so a failed login does not return 401. Instead, the server returns a normal 200 OK response, and the text of the web page tells you that you are not logged in.
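For contrast, here is a minimal sketch of what a 401 looks like when a server does use HTTP Basic Authentication. It is not tied to any real site: it starts a throwaway server on localhost using only the standard library (Python 3), and the handler class, the `fetch_status` helper, and the `alice:secret` credentials are all made up for the illustration.

```python
import base64
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical credentials, used only by this local demo.
EXPECTED = "Basic " + base64.b64encode(b"alice:secret").decode("ascii")


class BasicAuthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.headers.get("Authorization") == EXPECTED:
            body = b"welcome"
            self.send_response(200)
        else:
            # This is the situation in which a 401 is actually sent.
            body = b"authentication required"
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="demo"')
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_status(url, auth_header=None):
    """Return the HTTP status code for a GET request to url."""
    request = urllib.request.Request(url)
    if auth_header:
        request.add_header("Authorization", auth_header)
    # Bypass any proxy settings so the request reaches the local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


server = HTTPServer(("127.0.0.1", 0), BasicAuthHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

status_without_auth = fetch_status(url)
status_with_auth = fetch_status(url, EXPECTED)
server.shutdown()
server.server_close()
print(status_without_auth, status_with_auth)
```

The request without an `Authorization` header is the only one that gets a 401. A form-based login page never takes this code path, which is why the script in the question never sees that status.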
Instead, sites use cookies. This means that your browser does not actually know which sites it is logged into. When you finally provide a correct password to Yahoo!, the site either changes the cookie stored in your browser, or keeps the cookie the same and just changes the database record on its end that is associated with that cookie.
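The second variant (same cookie, different server-side record) can be sketched as a toy model in plain Python. Nothing here reflects how Yahoo! is actually implemented; `sessions`, `USERS`, `first_visit`, and `submit_login` are hypothetical names chosen for the illustration.

```python
import secrets

# Hypothetical server-side session table: cookie value -> logged-in user (or None).
sessions = {}
USERS = {"alice": "secret"}  # hypothetical credential store


def first_visit():
    """Issue an anonymous session cookie. The response status is 200."""
    cookie = secrets.token_hex(8)
    sessions[cookie] = None
    return 200, cookie


def submit_login(cookie, username, password):
    """Handle a login form post. The status is 200 whether or not it works."""
    if username in USERS and USERS[username] == password:
        sessions[cookie] = username  # only the server-side record changes
        return 200, "Welcome, %s" % username
    return 200, "Please sign in"


status, cookie = first_visit()
failed = submit_login(cookie, "", "")                 # empty form, as in the question
succeeded = submit_login(cookie, "alice", "secret")   # same cookie, now logged in
```

Both submissions return 200, and the cookie value never changes. The only difference is the page text and the record the server keeps for that cookie.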
So HTTP status codes are generally useless during the process of logging in. Instead, you will have to scrape the text of the "200 OK" page that comes back to see whether it congratulates you on logging in or repeats the form. Alternatively, you might just check the URL of the page you get back and see whether it is the login form again or the destination that you wanted to visit.
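Both checks can be combined in a small helper. This is only a sketch: the function name `login_succeeded`, the default `/login` path, and the failure phrases are assumptions that would need to be adapted to the site you are scraping.

```python
from urllib.parse import urlsplit


def login_succeeded(final_url, page_text, login_path="/login",
                    failure_markers=("Sign in", "Invalid password")):
    """Guess whether a form login worked, without relying on the status code.

    final_url is the URL of the page that came back after submitting the form,
    and page_text is its body as a string. The default login_path and
    failure_markers are placeholders, not values taken from any real site.
    """
    # Check 1: landing on the login page again means the form was rejected.
    if urlsplit(final_url).path.rstrip("/") == login_path.rstrip("/"):
        return False
    # Check 2: look for text that only appears when you are not logged in.
    lowered = page_text.lower()
    return not any(marker.lower() in lowered for marker in failure_markers)
```

With mechanize, the two inputs would typically come from `br.geturl()` and the decoded body of the response returned by `br.submit()`.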