I am using Python 2.6.5 and I am trying to capture the raw HTTP request sent by urllib2. This works fine except when I add a proxy handler into the mix. The situation is as follows:
- HTTP and HTTPS requests work fine without the proxy handler: raw HTTP request captured
- HTTP requests work fine with proxy handler: proxy ok, raw HTTP request captured
- HTTPS requests fail with proxy handler: proxy ok but the raw HTTP request is not captured!
The following questions are close but do not solve my problem:
- How do you get default headers in a urllib2 Request? <- My solution is heavily based on this
- Python urllib2 > HTTP Proxy > HTTPS request
- This sets the proxy for each request <- Did not work and doing it once at the start via an opener is more elegant and efficient (instead of setting the proxy for each request)
This is what I am doing:

    import httplib
    import urllib2

    class MyHTTPConnection(httplib.HTTPConnection):
        def send(self, s):
            global RawRequest
            RawRequest = s  # Saving to global variable for Requester class to see
            httplib.HTTPConnection.send(self, s)

    class MyHTTPHandler(urllib2.HTTPHandler):
        def http_open(self, req):
            return self.do_open(MyHTTPConnection, req)

    class MyHTTPSConnection(httplib.HTTPSConnection):
        def send(self, s):
            global RawRequest
            RawRequest = s  # Saving to global variable for Requester class to see
            httplib.HTTPSConnection.send(self, s)

    class MyHTTPSHandler(urllib2.HTTPSHandler):
        def https_open(self, req):
            return self.do_open(MyHTTPSConnection, req)

Requester class:

    global RawRequest
    ProxyConf = {'http': 'http://127.0.0.1:8080', 'https': 'http://127.0.0.1:8080'}
    # If ProxyConf = {'http': 'http://127.0.0.1:8080'}, the raw HTTPS request is captured
    # BUT the proxy does not see the HTTPS request!
    # Also tried, with similar results:
    # ProxyConf = {'http': 'http://127.0.0.1:8080', 'https': 'https://127.0.0.1:8080'}
    ProxyHandler = urllib2.ProxyHandler(ProxyConf)
    urllib2.install_opener(urllib2.build_opener(ProxyHandler, MyHTTPHandler, MyHTTPSHandler))
    # A bare urllib2.Request() sends nothing; urlopen() performs the request
    urllib2.urlopen(urllib2.Request('http://www.google.com', None))  # global RawRequest updated
    # This is the problem: global RawRequest NOT updated!?
    urllib2.urlopen(urllib2.Request('https://accounts.google.com', None))

But if I remove the ProxyHandler, it works:

    global RawRequest
    urllib2.install_opener(urllib2.build_opener(MyHTTPHandler, MyHTTPSHandler))
    # A bare urllib2.Request() sends nothing; urlopen() performs the request
    urllib2.urlopen(urllib2.Request('http://www.google.com', None))  # global RawRequest updated
    urllib2.urlopen(urllib2.Request('https://accounts.google.com', None))  # global RawRequest updated

How can I add the ProxyHandler into the mix while keeping access to the RawRequest?
Thank you in advance.
Answering my own question: this seems to be a bug in the underlying libraries, and making RawRequest a list solves the problem. The raw HTTP request is the first item. The custom HTTPS class is called several times, and the last call is blank. The fact that the custom HTTP class is only called once suggests this is a bug in Python, but the list solution gets around it.

In the send methods, the line

    RawRequest = s

just needs to be changed to:

    RawRequest.append(s)

with a previous initialisation of RawRequest = [] and retrieval of the raw request via RawRequest[0] (the first element of the list).
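
For reference, here is a minimal sketch of the list-based workaround applied to the classes from the question. Several parts are assumptions rather than anything from the original post: the import fallback (added only so the sketch also loads on Python 3, where httplib and urllib2 were renamed), the helper name build_capturing_opener, and its proxy_conf parameter. The use of RawRequest.append(s) is a reading of the answer above, not confirmed code.

```python
# Sketch only. The import fallback is an assumption added so the same code
# also loads on Python 3; the question itself targets Python 2.6.5.
try:
    import httplib
    import urllib2
except ImportError:
    import http.client as httplib
    import urllib.request as urllib2

RawRequest = []  # every chunk handed to send() is appended here


class MyHTTPConnection(httplib.HTTPConnection):
    def send(self, s):
        RawRequest.append(s)  # record each chunk instead of overwriting
        httplib.HTTPConnection.send(self, s)


class MyHTTPHandler(urllib2.HTTPHandler):
    def http_open(self, req):
        return self.do_open(MyHTTPConnection, req)


class MyHTTPSConnection(httplib.HTTPSConnection):
    def send(self, s):
        RawRequest.append(s)  # record each chunk instead of overwriting
        httplib.HTTPSConnection.send(self, s)


class MyHTTPSHandler(urllib2.HTTPSHandler):
    def https_open(self, req):
        return self.do_open(MyHTTPSConnection, req)


def build_capturing_opener(proxy_conf=None):
    """Hypothetical helper: build an opener, optionally behind a proxy."""
    handlers = [MyHTTPHandler, MyHTTPSHandler]
    if proxy_conf:
        # Put the proxy handler first, as in the question's build_opener call
        handlers.insert(0, urllib2.ProxyHandler(proxy_conf))
    return urllib2.build_opener(*handlers)
```

To use it, build the opener once with the ProxyConf dictionary from the question, call its open() method with the URL, and then read RawRequest. Because send() can be called more than once per request, it is probably worth clearing the list with del RawRequest[:] before each request and inspecting all entries rather than assuming the request always sits at index 0.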