We had a web services server running in python 3.2 (Fedora Core 14 64b)

Asked: June 11, 2026

We had a web services server running on Python 3.2 (Fedora Core 14, 64-bit) but were forced to back-port to Python 2.6.7 because of a new dependency that did not support 3.2. A section of the code that used concurrent.futures has been rewritten to use multiprocessing.Pool so that a couple of critical sections still run in parallel. The code now looks like this:

import multiprocessing
def _run_threads(callable_obj, args, threads):
    pool = multiprocessing.Pool(processes=threads)
    process_list = [pool.apply_async(callable_obj, a) for a in args]
    pool.close()
    pool.join()
    return [x.get() for x in process_list]

Apologies for the confusing abuse of the name “threads.” These are processes.

Since implementing this function, we find that it sometimes hangs. We get a garbled traceback when we eventually kill the parent (master) process, but there are a couple of lines that seem critical:

[snip]
Process PoolWorker-445:
[snip]
File "/usr/lib64/python2.6/multiprocessing/pool.py", line 59, in worker
task = get()
File "/usr/lib64/python2.6/multiprocessing/queues.py", line 352, in get
return recv()
racquire()
[snip]

From the available evidence, it seems to me that a child process in the pool is failing to receive the “close” signal from the parent process, so it sits waiting for work. The parent sits waiting for the child to shut down, and the server hangs. This happens nondeterministically, but too frequently (about once a day) for such a critical server.

Is there a problem in the coding of the _run_threads() function? Is this a known problem with a known workaround? We are using this for time-critical processing, so we would prefer not to recode for sequential execution unless absolutely necessary. One of the reasons for sticking with multiprocessing.Pool is the easy access to return codes for the operations run in parallel.

Thanks

1 Answer

Editorial Team · Answer 1 · 2026-06-11T14:22:02+00:00

I am not sure where this issue originates. It’s definitely very interesting. However, a little restructuring may solve the problem. You do not need to shut down the pool processes before you have collected your results, right? Sticking to the ‘canonical’ way of using a Pool, as documented, may help:

result = pool.apply_async(time.sleep, (10,))
print result.get(timeout=1)           # raises multiprocessing.TimeoutError

Or, in your case, call x.get() on every item in process_list before closing and joining the pool. If the problem persists and occurs during get(), we at least know it has nothing to do with close().
