So I have some computationally heavy code that can happily and efficiently chug along on 32 threads or more. I also have access to a cluster through my school that has 32 processor cores. “Sweet!” you might say.
But, alas, the code gets very unhappy if you don’t have each thread paired in a dedicated way to a processor. For the numerically inclined, my BLAS implementation takes major hits to efficiency if the thread it’s running on gets swapped out for another. This wouldn’t be an issue, except that the cluster is in a state of anarchy.
There’s no job scheduler or queue, and the cluster won’t stop people from launching jobs, even if all of the resources are already spoken for.
Here’s my question: When I log on and want to run my code, I see four people already on. They might be running some combination of serial and parallel jobs. I’d like to use as many threads as I can efficiently (that is to say all of the remaining resources on the cluster). How can I determine some measure of what I can use, either the number of threads in use by all the other users, the peak computational load they’re generating, or some other quickly available measure?
Currently, my solution is to run a section of my code with 1 thread, then 2 threads, and so on up to 32 threads, and see where I hit a wall in real execution time. I back away from the wall by a couple of threads and then run a large job. This takes a few minutes and I’d like a faster way to find the appropriate number of threads to launch.
Thanks,
–Andrew
EDITS:
sehe’s answer definitely answered my original question about how to get the thread numbers. Turns out this is not as useful to have as I thought it would be. The comment from Phil put me onto the productive path. What I’m doing now is:
top -bn1 | grep load
to get the info. This works well for me because my app is Matlab-based: I can issue a system call from the script every once in a while to get this info and adapt the number of threads I’m using to the available resources.
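A minimal sketch of that adaptation step, written in Python rather than Matlab purely for illustration. It assumes the 1-minute load average is a fair proxy for busy cores and keeps the "back away by a couple of threads" margin from the question. The function names and the default margin are hypothetical. Note that the load average also counts your own running threads, so a job polling this mid-run would need to account for them.

```python
import math
import os


def usable_threads(total_cores, load_avg, margin=2):
    """Estimate how many threads can run without oversubscribing cores."""
    busy = math.ceil(load_avg)          # cores assumed busy, rounded up
    free = total_cores - busy - margin  # keep a small safety margin
    return max(1, free)                 # always allow at least one thread


def current_usable_threads(margin=2):
    """Same estimate from the live 1-minute load average (Unix only)."""
    load_1min, _, _ = os.getloadavg()
    cores = os.cpu_count() or 1         # cpu_count() may return None
    return usable_threads(cores, load_1min, margin)
```

On a 32-core machine with a load average of 5.2, `usable_threads(32, 5.2)` gives 24: six cores are assumed busy and two more are held back.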
Thanks for putting me on the right track everyone.
–Andrew
If you have enough permission, pin each thread to a core (thread affinity). Otherwise, list all threads owned per user (assuming Linux), or limit the listing to running threads.
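One way to get such a per-user thread count, sketched in Python under the assumption of a Linux box with procps `ps`. The `-L` flag makes `ps` print one line per thread, and a state beginning with `R` means running or runnable. The function names are illustrative and not taken from the answer.

```python
import subprocess
from collections import Counter


def count_threads(ps_output, running_only=False):
    """Count threads per user from `ps -eL -o user=,stat=` style output."""
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        user, state = fields[0], fields[1]
        # A state starting with "R" means running or runnable
        if running_only and not state.startswith("R"):
            continue
        counts[user] += 1
    return counts


def threads_per_user(running_only=False):
    """Run ps (Linux procps assumed) and tally one line per thread."""
    out = subprocess.run(
        ["ps", "-eL", "-o", "user=,stat="],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_threads(out, running_only)
```

Calling `threads_per_user(running_only=True)` returns a mapping from user name to the number of threads currently on (or waiting for) a CPU. The `ps` process itself shows up as one running thread.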
You could run this (as a script) using watch(1) and have a graphical display of threads running.
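For the display itself, one possible sketch: a small Python function that turns a per-user count of running threads into a text bar chart. The function name, the bar width and the sample counts in the comment are made up for illustration, and the counts are assumed to be positive integers.

```python
def render_bars(counts, width=40):
    """Return a text bar chart, one line per user, busiest user first."""
    if not counts:
        return "(no running threads)"
    peak = max(max(counts.values()), 1)          # avoid division by zero
    name_width = max(len(user) for user in counts)
    lines = []
    # Sort by descending count, then by name for a stable order
    for user, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        bar = "#" * round(width * n / peak)      # scale bars to the peak
        lines.append(f"{user:<{name_width}} {n:>3} {bar}")
    return "\n".join(lines)


# Example with made-up counts:
#   print(render_bars({"alice": 4, "bob": 2}, width=8))
```

Saved in a script that prints this chart for the live counts, it could be refreshed once a second with something like `watch -n 1 python3 threads.py` (the script name is hypothetical).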
Tidied perl: