I can't figure out the problem in the following short script, which should compare the computation time of a single-CPU loop with that of a parallelized loop.

The code is:
n = 700;
ranksSingle = zeros(1,n);
tic
for ind = 1:n
    ranksSingle(ind) = rank(magic(ind));
end
toc
matlabpool local 4
tic
ranks = zeros(1,n);
parfor (ind = 1:n)
    ranks(ind) = rank(magic(ind));
end
toc
isequal(ranksSingle, ranks)
matlabpool close
I also tried it with matlabpool 2. As you can clearly see from the process window, all cores are at 100% load when running the parallel computation (marked red).
When running the single-CPU computation (marked blue), strangely, the 4 cores are also busier than before. I would have expected only ONE core to go up. I searched the internet to see whether the magic() or rank functions have built-in parallelization, but as you can read here: http://www.walkingrandomly.com/?p=1894 that's not the case. So it's okay that those 4 cores are not fully busy, but I'm still wondering why ALL cores go up.
Secondly, I really wonder about the computation time of the parallelized version. I know there's some overhead from distributing the jobs to the individual cores, but it shouldn't be so high that there's no benefit at all in the end 🙁
Perhaps somebody can tell me something about it 🙁 I'm really stuck on this, since I want to speed up some of my for-loops. My second question is whether there's a command to always set the number of workers to the number of physical cores in my computer (and also to use Hyper-Threading, if that's an additional benefit).
Thanks a lot!
When you want to run a parallel job, you should remember that it's bad to have too many fast iterations, and that it's bad to have too few slow iterations. If you do a million iterations that each take a few milliseconds, the overhead from parallelization will destroy any possible gain. If you do nine iterations that take an hour each, and you run them on eight processors in parallel, seven processors will be idling for an hour waiting for iteration #9 to finish.
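A rough way to see this trade-off is to replace the real work with pause, so that the cost per iteration is known. The following is only a sketch: n and delay are made-up values, and it assumes a pool of workers is already open.

```matlab
% Sketch: pause() stands in for an expensive loop body.
n = 40;        % number of iterations (assumed value)
delay = 0.1;   % seconds per iteration (assumed value)

% Serial reference run.
tic
for ind = 1:n
    pause(delay);
end
tSerial = toc;

% Parallel run. Assumes a pool of workers is already open
% (matlabpool in older releases, parpool in newer ones).
tic
parfor ind = 1:n
    pause(delay);
end
tParallel = toc;

fprintf('serial: %.2f s, parallel: %.2f s\n', tSerial, tParallel);
```

Making delay very small and n very large should show the overhead dominating, while a larger delay should show the speed-up more clearly.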
Thus, your example is pretty bad for testing the impact of parallelization, because both magic and rank are way too fast.
Note that I was running a second parallel job at the same time, but roughly, the result should be reproducible: there is a bit of overhead (note that I didn't count the time used by matlabpool!), but the speed-up is there. You should be seeing the same amount of overhead if you increase the pause length. Also, you should be testing with your actual loops (try to parallelize the outermost loop, btw).
To your second question:
Will create as many workers as there are physical cores. Hyperthreading will help you ensure that the computer remains responsive when the parallel job is running.
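If you would rather set the pool size explicitly, one possible sketch is below. It relies on feature('numcores'), which is undocumented and may change between releases, and on the function form of matlabpool; treat both as assumptions and check them against your MATLAB version.

```matlab
% Assumption: feature('numcores') is an undocumented call that reports
% the number of physical cores MATLAB detects.
nCores = feature('numcores');

% Assumption: function form of matlabpool with a numeric pool size.
% In releases where matlabpool has been removed, parpool(nCores) plays this role.
matlabpool('open', nCores);

fprintf('opened a pool with %d workers\n', nCores);
```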
Finally, while magic and rank may not be fully multithreaded by themselves, they may make calls to multithreaded routines.
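One way to check whether implicit multithreading explains the load on all cores is to limit MATLAB to a single computational thread and time the same call again. This is a sketch rather than a definitive test; the matrix size 700 is simply taken from the question.

```matlab
A = magic(700);

% Time rank() with the default number of computational threads.
tic
r1 = rank(A);
tDefault = toc;

% Limit MATLAB to one computational thread; the previous setting is returned.
oldThreads = maxNumCompThreads(1);
tic
r2 = rank(A);
tSingle = toc;

% Restore the previous thread limit.
maxNumCompThreads(oldThreads);

fprintf('default threads: %.3f s, one thread: %.3f s\n', tDefault, tSingle);
```

If only one core goes up during the second timing, the extra load in the plain for-loop most likely comes from multithreaded library routines called underneath.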