I have a Python client which pushes a great deal of data through the standard library’s httplib. Users are complaining that the application is slow. I suspect that this may be partly due to the HTTP client I am using.
Could I improve performance by replacing httplib with something else?
I’ve seen that Twisted offers an HTTP client. It seems to be very basic compared to their other protocol offerings.
PyCurl might be a valid alternative, but its use seems to be very un-pythonic. On the other hand, if its performance is really good then I can put up with a bit of un-pythonic code.
So if you have experience of better HTTP client libraries for Python, please tell me about it. I’d like to know what you thought of the performance relative to httplib and what you thought of the quality of implementation.
UPDATE 0: My use of httplib is actually very limited – the replacement needs to do the following:
    conn = httplib.HTTPConnection(host, port)
    conn.request('POST', url, params, headers)
    compressedstream = StringIO.StringIO(conn.getresponse().read())
That’s all: no proxies, redirection, or any fancy stuff. It’s plain old HTTP. I just need to be able to do it as fast as possible.
UPDATE 1: I’m stuck with Python 2.4 and I’m working on 32-bit Windows. Please do not tell me about better ways to use httplib – I want to know about some of the alternatives to httplib.
Often when I’ve had performance problems with httplib, the problem hasn’t been with httplib itself, but with how I’m using it. Here are a few common pitfalls:
(1) Don’t make a new TCP connection for every web request. If you are making lots of requests to the same server, instead of this pattern:
    conn = httplib.HTTPConnection('www.somewhere.com')
    conn.request('GET', '/foo')
    conn = httplib.HTTPConnection('www.somewhere.com')
    conn.request('GET', '/bar')
    conn = httplib.HTTPConnection('www.somewhere.com')
    conn.request('GET', '/baz')

Do this instead:
    conn = httplib.HTTPConnection('www.somewhere.com')
    conn.request('GET', '/foo')
    foo = conn.getresponse().read()  # read each response fully before reusing the connection
    conn.request('GET', '/bar')
    bar = conn.getresponse().read()
    conn.request('GET', '/baz')
    baz = conn.getresponse().read()

(2) Don’t serialize your requests. You can use threads or asyncore or whatever you like, but if you are making multiple requests to different servers, you can improve performance by running them in parallel.
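To make point (2) concrete, here is a minimal sketch using plain threads, one per request. The names fetch and fetch_all are made up for illustration, and the import fallback is only there so the same code also runs on Python 3, where httplib is called http.client. I have tried to stick to constructs that should work on Python 2.4, but treat that as an assumption rather than something tested on that version.

```python
import sys
import threading

try:
    import httplib                  # Python 2 name, as used in the question
except ImportError:
    import http.client as httplib   # same module under its Python 3 name


def fetch(host, port, path, results, index):
    """GET one path; store the body (or the exception) at results[index]."""
    try:
        conn = httplib.HTTPConnection(host, port)
        try:
            conn.request('GET', path)
            results[index] = conn.getresponse().read()
        finally:
            conn.close()
    except Exception:
        # sys.exc_info() avoids the Python 2 vs 3 difference in except syntax
        results[index] = sys.exc_info()[1]


def fetch_all(targets):
    """Fetch every (host, port, path) tuple in parallel, one thread each.

    Returns the bodies in the same order as targets.
    """
    results = [None] * len(targets)
    threads = []
    for index, (host, port, path) in enumerate(targets):
        worker = threading.Thread(target=fetch,
                                  args=(host, port, path, results, index))
        worker.start()
        threads.append(worker)
    for worker in threads:
        worker.join()               # wait until every request has finished
    return results
```

You would call it as fetch_all([('www.somewhere.com', 80, '/foo'), ('www.elsewhere.com', 80, '/bar')]). One thread per request is fine for a handful of servers; with hundreds of targets you would probably want to cap the number of threads running at once.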