The Archive Base


Asked by Editorial Team on June 12, 2026


I have a cluster with 4 nodes and a master server. The master dispatches jobs that may take anywhere from 30 seconds to 15 minutes to finish.

The nodes are listening with a SocketServer.TCPServer and in the master, I open a connection and wait for the job to end.

import multiprocessing

def run(nodes, args):
    # One worker process per node; each worker blocks until its job finishes.
    pool = multiprocessing.Pool(len(nodes))
    return pool.map(load_job, zip(nodes, args))

The load_job function sends the data with socket.sendall and then immediately calls socket.recv (the reply takes a long time to arrive).

The program runs fine until about 200 or 300 of these jobs have run. When it breaks, socket.recv returns an empty string and no more jobs can run until I kill the node processes and start them again.

How should I wait for the data to arrive? Also, error handling in the pool is very poor: it captures the error from another process and shows it without the proper traceback, and this error is not easy to reproduce…


EDIT:
Now I think this problem has nothing to do with sockets:

After some research, it looks like my nodes are opening way too many processes (because they also run their jobs in a multiprocessing.Pool) and somehow these are not being closed!

I found these SO questions (here and here) about zombie processes when using multiprocessing in a daemonized process (exactly my case!).

I’ll need to understand the problem further, but for now I’m killing the nodes and restarting them after some time.
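
As a point of reference for the edit above, here is a minimal Python 3 sketch of shutting a multiprocessing.Pool down explicitly once its work is done, so that the worker processes exit and are reaped. The names run_jobs and square are hypothetical stand-ins for the node's real job code, and this only shows the general close/join pattern; it is not a diagnosis of the daemon-specific zombie problem described in the linked questions.

```python
import multiprocessing


def square(x):
    # Hypothetical stand-in for the real job function run on a node.
    return x * x


def run_jobs(args, workers=2):
    """Run the jobs in a fresh pool and always shut the pool down afterwards."""
    pool = multiprocessing.Pool(workers)
    try:
        return pool.map(square, args)
    finally:
        pool.close()  # no more tasks will be submitted to this pool
        pool.join()   # wait for the workers to exit so they get reaped


if __name__ == "__main__":
    print(run_jobs([1, 2, 3]))
```

If a new pool is created for every batch of jobs and never closed, its workers can stay alive after the batch is finished, which would be one way to accumulate processes over a few hundred jobs.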


1 Answer

Answer by Editorial Team, added on June 12, 2026 at 6:29 pm

    (I’m replying to the question as it was before the edit, because I don’t fully understand what you meant there.)

    socket.recv is not the best way to wait for data on a socket. The best way I know is to use the select module (documentation here). The simplest use when waiting for data on a single socket would be select.select([your_socket], [], []), which blocks until the socket is readable; an optional fourth argument sets a timeout in seconds. It can certainly be used for more complex tasks as well.
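
To illustrate the select suggestion, here is a minimal sketch in Python 3 (the question's code is Python 2, so this is an adaptation rather than a drop-in fix). The helper name wait_for_reply, the timeout values, and the buffer size are made up for illustration, and socket.socketpair() merely stands in for the real master/node connection.

```python
import select
import socket


def wait_for_reply(sock, timeout_s, bufsize=4096):
    """Wait up to timeout_s seconds for data, then read one chunk.

    Returns the bytes read (b"" means the peer closed the connection),
    or None if nothing arrived before the timeout.
    """
    readable, _, _ = select.select([sock], [], [], timeout_s)
    if not readable:
        return None
    return sock.recv(bufsize)


if __name__ == "__main__":
    # socketpair() is only a local stand-in for the master/node link.
    master_side, node_side = socket.socketpair()
    print(wait_for_reply(master_side, 0.1))  # nothing has been sent yet
    node_side.sendall(b"job done")
    print(wait_for_reply(master_side, 1.0))
    master_side.close()
    node_side.close()
```

Passing a timeout lets the master give up on a job that never answers instead of blocking indefinitely, and a None result can be told apart from an empty read.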

    As for socket.recv returning an empty string: on a TCP socket (as in your case), this means the peer has closed the connection.
    The reasons may vary, but the important point is that once this happens you will receive no more data from this socket, so the best thing to do is close it (socket.close). If you don’t expect the connection to close, that is where to look for the problem.
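
A rough Python 3 sketch of treating an empty read as "peer closed" could look like the following. In Python 3, recv returns bytes, so the empty string from the question shows up as b"". The exception class PeerClosed and the function recv_reply are hypothetical names, and how the reply is framed (a single chunk here) is an assumption, since the question does not describe the protocol.

```python
import socket


class PeerClosed(ConnectionError):
    """Hypothetical error: the peer closed the connection before replying."""


def recv_reply(sock, bufsize=4096):
    """Read one chunk; close the socket and raise if the peer has closed."""
    data = sock.recv(bufsize)
    if data == b"":
        # An empty result on a TCP socket means the peer shut the connection
        # down; no more data will arrive, so release the socket.
        sock.close()
        raise PeerClosed("node closed the connection without sending a result")
    return data
```

Raising a distinct error at this point makes the failure visible at the place where the node went away, rather than letting an empty result flow back as if it were a job's output.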

    Good luck!
