I wonder whether more than 8 threads can run concurrently on hardware with 8 cores.
If so, when using OpenMP to parallelize N calculations, could I create chunks of size, say, N/8, then in each thread fork again into chunks of size (N/8)/8, and maybe nest even further?
What happens when I nest parallel regions? Do I still have 8 threads available for the nested region?
8 cores can run at most 8 threads simultaneously at any given instant (or 16 if each core supports 2-way simultaneous multithreading, such as Intel Hyper-Threading); any additional threads are time-sliced by the OS scheduler. How many threads are worth creating depends a lot on what they are doing. If they are doing CPU-intensive work, it is not recommended to spawn many more threads than there are cores (a few extra may be OK); beyond that, excessive context switching and cache misses will start to degrade performance. If there is significant I/O, however, the threads may spend much of their time blocked and not using the CPU, so you can usefully run many more of them than there are cores.
The bottom line is that you need to measure the performance in your particular case, in your particular environment.
See also this related thread.