So I have two main worker processes: one is supposed to fetch the content of a URL, and the other will insert each of the anchor links into a PostgreSQL table. I am running 12 instances of the first process, all of which take their URL from a single URL queue and then place the anchors in a second queue. But how do I have another set of threads push the anchors into the table? When I start those threads they find their queue empty and they die; if I disable that behaviour, they won't die when the work is done. How do I manage this? And by the way, is it better to use processes instead of threads, given the presumably intensive I/O involved?
You need two queues: the URLFetchers pop URLs from one queue and push the anchors into a second one, then the AnchorInserters pop from this second queue to process the data. This organisation should give you a good synchronisation mechanism for your problem.

Edit: to avoid workers exiting:
You need to block until an element is available.
From Python's Queue.get documentation …
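A minimal sketch of the whole pipeline, assuming Python's standard queue and threading modules. The fetch_anchors helper, the STOP sentinel and the list standing in for the PostgreSQL table are hypothetical; the sentinel ("poison pill") shutdown is a common pattern added here for illustration, not something the answer spells out. Every get() call uses the default blocking mode, so a worker that starts on an empty queue simply waits, and it exits only when it receives STOP.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel meaning "no more work"


def fetch_anchors(url):
    # Hypothetical placeholder: a real fetcher would download `url`
    # and extract its <a href="..."> links.
    return [f"{url}#link{i}" for i in range(3)]


def url_fetcher(url_queue, anchor_queue):
    while True:
        url = url_queue.get()          # blocks until a URL (or STOP) arrives
        if url is STOP:
            return
        for anchor in fetch_anchors(url):
            anchor_queue.put(anchor)


def anchor_inserter(anchor_queue, table, lock):
    while True:
        anchor = anchor_queue.get()    # an empty queue means "wait", not "exit"
        if anchor is STOP:
            return
        with lock:
            table.append(anchor)       # stand-in for an INSERT into PostgreSQL


def run(urls, n_fetchers=12, n_inserters=4):
    url_queue, anchor_queue = queue.Queue(), queue.Queue()
    table, lock = [], threading.Lock()

    fetchers = [threading.Thread(target=url_fetcher, args=(url_queue, anchor_queue))
                for _ in range(n_fetchers)]
    inserters = [threading.Thread(target=anchor_inserter, args=(anchor_queue, table, lock))
                 for _ in range(n_inserters)]
    for t in fetchers + inserters:
        t.start()                      # inserters start on an empty queue and just wait

    for url in urls:
        url_queue.put(url)
    for _ in fetchers:
        url_queue.put(STOP)            # one STOP per fetcher, queued after all real URLs
    for t in fetchers:
        t.join()                       # every anchor is now in anchor_queue

    for _ in inserters:
        anchor_queue.put(STOP)         # only now tell the inserters to finish
    for t in inserters:
        t.join()
    return table


table = run([f"http://example.com/page{i}" for i in range(5)])
print(len(table))  # 15

# For contrast: with a timeout, get() raises queue.Empty instead of waiting forever.
try:
    queue.Queue().get(timeout=0.1)
except queue.Empty:
    print("queue.Empty raised after 0.1 s")
```

Because the fetchers are joined before the inserters receive their STOP, no inserter can quit while anchors are still on their way into the second queue, and the program ends on its own once both queues have been drained.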