I’m trying to access an authenticated site using a cookies.txt file (generated with a Chrome extension) with Python Requests:
import requests, cookielib
cj = cookielib.MozillaCookieJar('cookies.txt')
cj.load()
r = requests.get(url, cookies=cj)
It doesn’t throw any error or exception, but yields the login screen, incorrectly. However, I know that my cookie file is valid, because I can successfully retrieve my content using it with wget. Any idea what I’m doing wrong?
Edit:
I’m tracing cookielib.MozillaCookieJar._really_load and can verify that the cookies are correctly parsed (i.e. they have the correct values for the domain, path, secure, etc. tokens). But as the transaction is still resulting in the login form, it seems that wget must be doing something additional (as the exact same cookies.txt file works for it).
MozillaCookieJar inherits from FileCookieJar, whose constructor docstring states that cookies are not loaded from the named file until the .load() (or .revert()) method is called. You need to call the .load() method then.
Also, as Jermaine Xu noted, the first line of the file needs to contain either the # Netscape HTTP Cookie File or the # HTTP Cookie File string. Files generated by the plugin you use do not contain such a string, so you have to insert it yourself. I raised an appropriate bug at http://code.google.com/p/cookie-txt-export/issues/detail?id=5
EDIT
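Session cookies are saved with 0 in the 5th column. If you don’t pass ignore_expires=True to the load() method, all such cookies are discarded when loading from a file.
For example, loading a file session_cookie.txt that holds a single session cookie with a plain load() call and then printing the number of cookies in the jar gives:
Output:
0
EDIT 2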
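The loading behaviour described above (cookies with 0 in the 5th column are dropped unless ignore_expires=True is passed) can be reproduced with a short sketch. It assumes Python 3, where cookielib is named http.cookiejar; the question itself uses Python 2. The file contents, the example.com domain and the helper name count_loaded are made up for illustration.

```python
import os
import tempfile
from http.cookiejar import MozillaCookieJar  # "cookielib" in Python 2

# Hypothetical cookies.txt: the magic header line plus one session cookie
# (tab-separated fields, 0 in the 5th "expires" column).
COOKIE_FILE = (
    "# Netscape HTTP Cookie File\n"
    ".example.com\tTRUE\t/\tFALSE\t0\tname\tvalue\n"
)


def count_loaded(text, **load_kwargs):
    """Write text to a temporary file, load it, and return the jar size."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
        jar = MozillaCookieJar(path)
        jar.load(**load_kwargs)
        return len(jar)
    finally:
        os.remove(path)


if __name__ == "__main__":
    # Plain load(): the cookie with expires=0 counts as expired and is skipped.
    print(count_loaded(COOKIE_FILE))                       # 0
    # ignore_expires=True keeps it in the jar.
    print(count_loaded(COOKIE_FILE, ignore_expires=True))  # 1
```

Without the header line, load() raises a LoadError instead, which is why the header has to be added to files exported by the plugin.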
Although we managed to get cookies into the jar above, they are subsequently discarded by cookielib because they still have a 0 value in the expires attribute. To prevent this, we have to set the expire time of each loaded cookie to some time in the future.
EDIT 3
I checked both wget and curl, and both use a 0 expiry time to denote session cookies, which means it’s the de facto standard. However, Python’s implementation uses an empty string for the same purpose, hence the problem raised in the question. I think Python’s behavior in this regard should be in line with what wget and curl do, and that’s why I raised the bug at http://bugs.python.org/issue17164
I’ll note that replacing 0s with empty strings in the 5th column of the input file and passing ignore_discard=True to load() is an alternate way of solving the problem (no need to change the expiry time in this case).
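Both workarounds from this answer (pushing the expiry into the future after loading, or blanking the 0s and loading with ignore_discard=True) can be compared in a rough sketch. It assumes Python 3, where cookielib is http.cookiejar. The example.com cookie, the helper names and the 14-day window are arbitrary choices for illustration, not part of any library. No request is actually sent: add_cookie_header only fills in the header on a local Request object.

```python
import os
import tempfile
import time
import urllib.request
from http.cookiejar import MozillaCookieJar  # "cookielib" in Python 2

HEADER = "# Netscape HTTP Cookie File\n"
# Hypothetical session cookie line with 0 in the 5th (expires) column.
SESSION_LINE = ".example.com\tTRUE\t/\tFALSE\t0\tname\tvalue\n"


def load_jar(text, **load_kwargs):
    """Write text to a temporary cookies file and load it into a jar."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
        jar = MozillaCookieJar(path)
        jar.load(**load_kwargs)
        return jar
    finally:
        os.remove(path)


def cookie_header(jar, url="http://www.example.com/"):
    """Return the Cookie header the jar would attach to a request for url."""
    request = urllib.request.Request(url)
    jar.add_cookie_header(request)  # builds the header, no network access
    return request.get_header("Cookie")


def blank_zero_expiry(text):
    """Replace 0 in the 5th column of each cookie line with an empty string."""
    fixed = []
    for line in text.splitlines(keepends=True):
        fields = line.rstrip("\n").split("\t")
        if not line.startswith("#") and len(fields) == 7 and fields[4] == "0":
            fields[4] = ""
            line = "\t".join(fields) + "\n"
        fixed.append(line)
    return "".join(fixed)


def with_future_expiry():
    """Workaround 1: load with ignore_expires=True, then push expiry forward."""
    jar = load_jar(HEADER + SESSION_LINE, ignore_expires=True)
    for cookie in jar:
        cookie.expires = int(time.time()) + 14 * 24 * 3600  # 14 days from now
    return jar


def with_blank_expiry():
    """Workaround 2: blank the 0s, then load with ignore_discard=True."""
    return load_jar(blank_zero_expiry(HEADER + SESSION_LINE), ignore_discard=True)


if __name__ == "__main__":
    untouched = load_jar(HEADER + SESSION_LINE, ignore_expires=True)
    print(cookie_header(untouched))             # None: still treated as expired
    print(cookie_header(with_future_expiry()))  # name=value
    print(cookie_header(with_blank_expiry()))   # name=value
```

A jar prepared either way can then be passed to requests.get(url, cookies=jar) as in the question.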