I am working on a realtime data grabber. I have a while True loop, and inside it, I spawn threads that do relatively small tasks (I am querying a 3rd party API over HTTP, and to achieve fast speeds I am querying in parallel).
Every thread takes care of updating a specific data series. This might take 2, 3 or even 5 seconds. However, my while True loop might spawn threads faster than how long it takes for the thread to finish. Hence, I need the spawned threads to wait for their previous threads to finish.
In general, its unpredictable how long it takes for the threads to finish because the threads query an HTTP server…
I was thinking of creating a named semaphore for every thread, and then if a thread spawned for a specific series finds a previous thread working on the same series, it will wait.
The only issue that I can see is a possible backlog of threads..
What is the best solution here? Should I look into things like Celery? I am currently using the threading module.
Thanks!
Another option you could use if you are just requerying the API every time one of your queries returns is an asyncronous framework like Twisted (Their tutorial on Threading). I’m a relative Twisted beginner so there may be better ways of twisting Twisted to your task than this –