I was trying to create a simple parent/child process pair that communicates over Hadoop IPC. The program runs and prints the results, but it never exits. Here is the code for the parent:
```java
import java.io.File;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.ipc.RPC;
import org.apache.hadoop.ipc.Server;
import org.apache.hadoop.ipc.VersionedProtocol;

interface Protocol extends VersionedProtocol {
    public static final long versionID = 1L;
    IntWritable getInput();
}

public final class JavaProcess implements Protocol {
    Server server;

    public JavaProcess() {
        String rpcAddr = "localhost";
        int rpcPort = 8989;
        Configuration conf = new Configuration();
        try {
            // Start the RPC server that the child process connects to
            server = RPC.getServer(this, rpcAddr, rpcPort, conf);
            server.start();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public int exec(Class<?> klass) throws IOException, InterruptedException {
        String javaHome = System.getProperty("java.home");
        String javaBin = javaHome +
                File.separator + "bin" +
                File.separator + "java";
        String classpath = System.getProperty("java.class.path");
        String className = klass.getCanonicalName();
        // Launch the child in a separate JVM with the same classpath
        ProcessBuilder builder = new ProcessBuilder(
                javaBin, "-cp", classpath, className);
        Process process = builder.start();
        int exit_code = process.waitFor();
        if (server != null) { // server is null if getServer() failed
            server.stop();
        }
        System.out.println("completed process");
        return exit_code;
    }

    public static void main(String... args) throws IOException, InterruptedException {
        int status = new JavaProcess().exec(JavaProcessChild.class);
        System.out.println(status);
    }

    @Override
    public IntWritable getInput() {
        return new IntWritable(10);
    }

    @Override
    public long getProtocolVersion(String paramString, long paramLong)
            throws IOException {
        return Protocol.versionID;
    }
}
```
I have since narrowed the problem down to the RPC.getServer() call on the server side. Is this a known Hadoop bug, or am I missing something? Here is the child process class:
```java
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.ipc.RPC;

public class JavaProcessChild {
    public static void main(String... args) {
        Protocol umbilical = null;
        try {
            Configuration defaultConf = new Configuration();
            InetSocketAddress addr = new InetSocketAddress("localhost", 8989);
            // Wait until the parent's RPC server is reachable, then obtain a proxy
            umbilical = (Protocol) RPC.waitForProxy(Protocol.class, Protocol.versionID,
                    addr, defaultConf);
            IntWritable input = umbilical.getInput();
            if (input != null && input.equals(new IntWritable(10))) {
                Thread.sleep(10000);
            } else {
                Thread.sleep(1000);
            }
        } catch (Throwable e) {
            e.printStackTrace();
        } finally {
            // Close the client-side connection to the parent
            if (umbilical != null) {
                RPC.stopProxy(umbilical);
            }
        }
    }
}
```
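Before digging into Hadoop internals, it can help to confirm which thread keeps the parent JVM alive. The helper below is a hypothetical sketch (`NonDaemonThreads` is not part of Hadoop or the JDK) built only on the standard `Thread.getAllStackTraces()`. Calling `NonDaemonThreads.dump()` as the last statement of `JavaProcess.main` prints every live non-daemon thread with its stack, roughly what a `jstack` thread dump shows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: reports live non-daemon threads, which block JVM exit. */
public final class NonDaemonThreads {
    private NonDaemonThreads() {}

    /** Names of all live non-daemon threads except the calling thread. */
    public static List<String> names() {
        List<String> result = new ArrayList<String>();
        Thread self = Thread.currentThread();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t != self && t.isAlive() && !t.isDaemon()) {
                result.add(t.getName());
            }
        }
        return result;
    }

    /** Prints each live non-daemon thread (except the caller) with its stack trace. */
    public static void dump() {
        Thread self = Thread.currentThread();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (t == self || !t.isAlive() || t.isDaemon()) {
                continue;
            }
            System.err.println("non-daemon thread still alive: " + t.getName());
            for (StackTraceElement frame : e.getValue()) {
                System.err.println("    at " + frame);
            }
        }
    }
}
```

Any thread listed after `main` has finished its work is a candidate for the one preventing the process from exiting.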
We sorted that out via mail, but I want to give my two cents here for the public:
The thread that does not die (and therefore keeps the process from exiting after the main thread finishes) is org.apache.hadoop.ipc.Server$Reader. The reason is that its call to readSelector.select() is not interruptible: in a debugger or a thread dump you can see it waiting on that call forever, even after the main thread has already been cleaned up. Two possible fixes:
…
(… won't be cleaned up properly, but the process will end)
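The general JDK-level pattern for ending a thread that blocks in `Selector.select()` can be sketched without Hadoop. `SelectLoop` below is a hypothetical stand-in for a reader loop of the assumed shape `while (running) { selector.select(); ... }`; it is not Hadoop's `Server$Reader` and does not patch Hadoop. It shows two options: marking the thread as a daemon so it cannot keep the JVM alive by itself, and an explicit `stop()` that clears the loop flag, calls `Selector.wakeup()` (which makes a blocked `select()` return immediately), joins the thread, and only then closes the selector.

```java
import java.io.IOException;
import java.nio.channels.Selector;

/**
 * Hypothetical stand-in for an IPC reader thread that blocks in
 * Selector.select(). Not Hadoop code; it only illustrates shutdown.
 */
public final class SelectLoop implements Runnable {
    private final Selector selector;
    private final Thread thread;
    private volatile boolean running = true;

    public SelectLoop(boolean daemon) throws IOException {
        this.selector = Selector.open();
        this.thread = new Thread(this, "select-loop");
        // Option 1: a daemon thread cannot keep the JVM alive by itself,
        // but nothing closes the selector until the process exits.
        this.thread.setDaemon(daemon);
    }

    public void start() {
        thread.start();
    }

    public boolean isAlive() {
        return thread.isAlive();
    }

    @Override
    public void run() {
        while (running) {
            try {
                // With no channels registered (as here), select() only returns
                // after wakeup(), an interrupt, or closing the selector.
                selector.select();
                selector.selectedKeys().clear();
            } catch (IOException e) {
                // Treat an I/O failure as fatal for this loop.
                break;
            }
        }
    }

    /** Option 2: clear the flag, wake the blocked select(), then release the selector. */
    public void stop() throws IOException, InterruptedException {
        running = false;
        // If no select() is in progress yet, the next one returns immediately.
        selector.wakeup();
        thread.join();
        selector.close();
    }
}
```

Constructing it with `false` and never calling `stop()` reproduces the symptom from the question: the JVM does not exit after `main` returns. Closing the selector only after `join()` means the loop never observes a closed selector.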
However, this is a bug in Hadoop, and I have not had time to look through the JIRAs. Maybe it is already fixed; in YARN the old IPC is replaced by protobuf and Thrift anyway.
BTW, this is also platform-dependent, because it depends on the selector implementation: I observed these zombie threads on Debian and Windows systems, but not on Red Hat or Solaris.
If anyone is interested in a patch for Hadoop 1.0, email me. I will look up the JIRA issue in the near future and edit this answer with more information (maybe it has been fixed in the meantime anyway).