I am working on a Python backend web server that grabs realtime data from a paid 3rd party API.
I need to query this API at a high rate (about 150 queries per 10 seconds), so I wrote a small proof of concept that spawns 200 threads and writes URLs to a queue. Each thread reads a URL from the queue and sends the HTTP request. The 3rd party API returns a value called delay, which is how long their server took to process the request.
Here is the POC code, which downloads all the URLs once (not repeatedly):
    import datetime
    import Queue
    import threading
    import time

    import urllib3

    # `urls` and getDelayFromResponse() are defined elsewhere (not shown here).
    _http_pool = urllib3.PoolManager()

    def getPooledResponse(url):
        return _http_pool.request("GET", url, timeout=30)

    class POC:
        _worker_threads = []
        WORKER_THREAD_COUNT = 200
        q = Queue.Queue()

        @staticmethod
        def worker():
            while True:
                url = POC.q.get()
                t0 = datetime.datetime.now()
                r = getPooledResponse(url)
                # the timer starts after q.get(), so time spent waiting on the queue is not counted
                print "thread %s took %d seconds to process the url (service delay %s)" % (threading.currentThread().ident, (datetime.datetime.now() - t0).seconds, getDelayFromResponse(r))
                POC.q.task_done()

        @staticmethod
        def run():
            # start the threads if we have less than the desired amount
            if len(POC._worker_threads) < POC.WORKER_THREAD_COUNT:
                for i in range(POC.WORKER_THREAD_COUNT - len(POC._worker_threads)):
                    t = threading.Thread(target=POC.worker)
                    t.daemon = True
                    t.start()
                    POC._worker_threads.append(t)
            # put the urls in the queue
            for url in urls:
                POC.q.put(url)
                # sleep for just a bit so that the requests don't get sent out together (this is a limitation of the API I am using)
                time.sleep(0.3)

    POC.run()
When I run this, the first few results are returned with a reasonable delay:
thread 140544300453053 took 2 seconds to process the url (service delay 1.782)
However, after about 10-20 seconds I start getting results like this:
thread 140548049958656 took 23 seconds to process the url (service delay 1.754)
In other words, even though the server returns with a small delay, my threads take longer to complete…
How can I find out where the other 21 seconds are spent?
Thanks!
You should use a profiler on the code.
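As a rough starting point, here is a minimal sketch of profiling one unit of work with the standard library's cProfile and pstats modules. The functions fetch, parse, handle and profile_call are hypothetical stand-ins invented for this example (fetch plays the role of getPooledResponse and parse the role of getDelayFromResponse), and the sleep only simulates a slow network call.

```python
import cProfile
import pstats
import time


def fetch(url):
    # Hypothetical stand-in for the real HTTP call (getPooledResponse in the question).
    time.sleep(0.05)
    return "response for %s" % url


def parse(response):
    # Hypothetical stand-in for getDelayFromResponse.
    return len(response)


def handle(url):
    # One unit of work, as done inside the worker loop.
    return parse(fetch(url))


def profile_call(func, *args):
    """Run func under cProfile and return (result, pstats.Stats)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stats = pstats.Stats(profiler)
    return result, stats


result, stats = profile_call(handle, "http://example.invalid/item")
# Sort by cumulative time so the slowest call chains are listed first.
stats.sort_stats("cumulative").print_stats(5)
```

cProfile measures wall-clock time by default, so time spent blocked on the network shows up in the report. A Profile object only records the thread it runs in, so with the threaded POC it is probably simplest to profile a single worker (or a single-threaded run) first and look at which calls dominate the cumulative column.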