I just started using Java, so sorry if the answer to this question is obvious. I can’t really figure out how to share variables in Java. I have been playing around with Python and wanted to port some code over to Java to learn the language a bit better. A lot of my code is ported, but I’m unsure how exactly multiprocessing and sharing of variables work in Java (my process is not disk-bound; it uses a lot of CPU and spends its time searching a list).
In Python, I can do this:
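```python
from multiprocessing import Pool, Manager

manager = Manager()
shared_list = manager.list()  # proxy list that every worker process can append to
pool = Pool(processes=4)
# function_or_class and list_of_data_to_process are defined elsewhere
for variables_to_send in list_of_data_to_process:
    pool.apply_async(function_or_class, (variables_to_send, shared_list))
pool.close()
pool.join()  # wait for all submitted tasks to finish
```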
I’ve been having a bit of trouble figuring out how to do multiprocessing and sharing like this in Java. This question helped me understand a bit (via the code) how implementing Runnable can help, and I’m starting to think Java might automatically spread threads across multiple processors (correct me if I’m wrong; I read somewhere that once threads exceed the capacity of one CPU, they are moved to another). The Oracle docs seem to focus more on threads than on multiprocessing. But that question doesn’t explain how to share lists or other variables between processes (and keep them closely enough in sync).
Any suggestions or resources? I hope I’m just searching for the wrong thing (“multiprocessing java”) and that this is as easy (or similarly straightforward) as it is in my code above.
Thanks!
There is an important difference between a thread and a process, and you are running into it now: with some exceptions, threads share memory, but processes do not.
Note that real operating systems have ways around just about everything I’m about to say, but these features aren’t used in the typical case. To fire up a new process, you must clone the current process in some way with a system call (on *nix, this is fork()), and then replace the code, stack, command-line arguments, etc. of the child process with another system call (on *nix, this is the exec() family of system calls). Windows does both steps in a single call, CreateProcess(), but the end result is the same, so everything I’m saying is cross-platform. Also, the Java Runtime Environment makes all these system calls under the covers, and without JNI or some other interop technology you can’t really execute them yourself.
There are two important things to note about this model: the child process doesn’t share the address space of the parent process, and the entire address space of the child process gets replaced on the exec() call. So, variables in the parent process are unavailable to the child process, and vice versa.
The thread model is quite different. Threads are something like lightweight processes, in that each thread has its own instruction pointer (and its own stack), and (on most systems) threads are scheduled by the operating system scheduler. However, a thread is part of a process. Each process has at least one thread, and all the threads in a process share that process’s memory.
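To make the thread model concrete, here is a rough Java analogue of the Python snippet in the question. It is a minimal sketch, assuming Java 9+ (for `List.of`); `SharedListDemo`, `work`, and `runAll` are hypothetical names standing in for `function_or_class` and the `Pool` boilerplate. Because these are threads rather than processes, every task can append to the same list object, and on a typical JVM the operating system is free to run the pool’s threads on different cores.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedListDemo {

    // Hypothetical stand-in for function_or_class: processes one item and
    // appends its result to the list shared by all worker threads.
    static void work(int item, List<Integer> sharedList) {
        sharedList.add(item * item);
    }

    // Rough analogue of Pool(processes=4) + apply_async + close + join,
    // but with threads, which can all see the same list object.
    static List<Integer> runAll(List<Integer> data, int threads) {
        // synchronizedList makes each individual add() safe to call from many threads
        List<Integer> sharedList = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int item : data) {
            pool.submit(() -> work(item, sharedList));  // like apply_async
        }
        pool.shutdown();  // like pool.close(): no new tasks accepted
        try {
            // like pool.join(): block until every submitted task has finished
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return sharedList;
    }

    public static void main(String[] args) {
        // Result order depends on how the threads happen to be scheduled.
        System.out.println(runAll(List.of(1, 2, 3, 4, 5), 4));
    }
}
```

`Collections.synchronizedList` only protects individual calls; iterating over the list while workers are still adding to it, or doing check-then-act sequences, still needs explicit locking on the list (or a concurrent collection such as `ConcurrentLinkedQueue`).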
Now to your problem:
The Python multiprocessing module spawns processes with very little effort, as your code example shows. In Java, spawning a new process takes a little more work. It involves creating a new Process object using ProcessBuilder.start() or Runtime.exec(). Then you can write strings to the child process’s standard input, read back its output, wait for it to exit, and use a few other communication primitives. I would recommend writing one program to act as the coordinator and fire up each of the child processes, and writing a worker program that roughly corresponds to function_or_class in your example. The coordinator can start multiple copies of the worker program, give each one a task, and wait for all the workers to finish.
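The coordinator/worker layout described above can be sketched with `ProcessBuilder`. This is a minimal sketch, assuming the class is compiled with `javac` and run from its class directory (not launched in single-file source mode), so that the coordinator can relaunch the same class in a new JVM in “worker” mode. `Coordinator`, `doWork`, `workerCommand`, and the sample tasks are hypothetical. Each task is piped to a worker’s standard input, and the result is read back from its standard output.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Coordinator {

    // Hypothetical worker logic, playing the role of function_or_class.
    static String doWork(String task) {
        long n = Long.parseLong(task.trim());
        return Long.toString(n * n);
    }

    // Command that starts this same class in "worker" mode in a separate JVM.
    // Assumes Coordinator was compiled to a .class file on the current classpath.
    static List<String> workerCommand() {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(Coordinator.class.getName());
        cmd.add("worker");
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 1 && args[0].equals("worker")) {
            // Worker mode: read one task from stdin, write the result to stdout.
            BufferedReader stdin = new BufferedReader(
                    new InputStreamReader(System.in, StandardCharsets.UTF_8));
            System.out.println(doWork(stdin.readLine()));
            return;
        }

        // Coordinator mode: start one worker process per task and pipe the task in.
        String[] tasks = {"2", "3", "4"};
        List<Process> workers = new ArrayList<>();
        for (String task : tasks) {
            ProcessBuilder pb = new ProcessBuilder(workerCommand());
            pb.redirectError(ProcessBuilder.Redirect.INHERIT);  // show worker errors here
            Process p = pb.start();
            try (Writer in = new OutputStreamWriter(p.getOutputStream(), StandardCharsets.UTF_8)) {
                in.write(task + System.lineSeparator());
            }
            workers.add(p);
        }

        // Collect each worker's answer and wait for it to exit. This list lives only
        // in the coordinator's memory; the workers never see it.
        List<String> results = new ArrayList<>();
        for (Process p : workers) {
            try (BufferedReader out = new BufferedReader(
                    new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                results.add(out.readLine());
            }
            p.waitFor();
        }
        System.out.println(results);
    }
}
```

Nothing is shared between these processes: each task and each result is copied through a pipe as text, which is what “processes do not share memory” means in practice.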