I want to know why a context switch is slow compared to asynchronous operations on the same thread.
Why is it better to run N threads (with N equal to the number of cores), each one processing M clients asynchronously, instead of running M threads? I’ve been told the reason is the context switch overhead, but I can’t find out how slow context switches actually are.
Just to clarify, I will assume that when you say “instead of running M threads” you mean N*M threads (if you run M threads, each one will need to process N clients to match the same total number of clients, so it is essentially the same case).
So the difference between N threads running on N cores, each one processing M clients, and N*M threads running on the same number of cores is that in the first case you won’t have to create new threads and, as you said, you will have far fewer context switches. This is an advantage because creating an OS thread is expensive: a new stack has to be allocated and the kernel has to set up its own data structures for the thread, among other things. Besides, with more threads than cores the OS scheduler keeps suspending and resuming the running threads, which is also time-consuming, since on every switch it has to save the registers of the outgoing thread and restore those of the incoming one. Worse, every time the scheduler changes the thread assigned to a core, the incoming thread’s data is probably no longer in that core’s caches, which adds a lot of cache misses and consequently more time.
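If you want numbers for your own machine, here is a rough Python sketch that times three things: creating and joining a thread, a ping-pong between two OS threads (every hand-off blocks one thread and wakes the other through the OS), and the same ping-pong between two coroutines on a single asyncio event loop (the hand-off never leaves the thread). The function names and round counts are my own choices, not from any library. In CPython the figures also include interpreter and GIL overhead, so treat them as an order-of-magnitude comparison rather than the raw cost of a kernel context switch.

```python
import asyncio
import queue
import threading
import time


def thread_create_cost(n: int = 1_000) -> float:
    """Average seconds to create, start and join one OS thread that does nothing."""
    def noop() -> None:
        pass

    start = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=noop)
        t.start()
        t.join()
    return (time.perf_counter() - start) / n


def thread_ping_pong(rounds: int = 10_000) -> float:
    """Average seconds per round trip between two OS threads.

    Each hand-off blocks one thread and wakes the other, so the OS
    scheduler is involved on every message.
    """
    to_worker: queue.Queue = queue.Queue()
    to_main: queue.Queue = queue.Queue()

    def worker() -> None:
        for _ in range(rounds):
            to_main.put(to_worker.get())  # echo each message back

    t = threading.Thread(target=worker)
    t.start()
    start = time.perf_counter()
    for i in range(rounds):
        to_worker.put(i)
        to_main.get()  # block until the worker answers
    elapsed = time.perf_counter() - start
    t.join()
    return elapsed / rounds


async def _coroutine_ping_pong(rounds: int) -> float:
    to_worker: asyncio.Queue = asyncio.Queue()
    to_main: asyncio.Queue = asyncio.Queue()

    async def worker() -> None:
        for _ in range(rounds):
            to_main.put_nowait(await to_worker.get())  # echo each message back

    task = asyncio.create_task(worker())
    start = time.perf_counter()
    for i in range(rounds):
        to_worker.put_nowait(i)
        await to_main.get()  # suspend this coroutine; the loop resumes the worker
    elapsed = time.perf_counter() - start
    await task
    return elapsed / rounds


def coroutine_ping_pong(rounds: int = 10_000) -> float:
    """Average seconds per round trip between two coroutines on one thread.

    The hand-off is the event loop resuming another coroutine;
    no OS thread switch is needed.
    """
    return asyncio.run(_coroutine_ping_pong(rounds))


if __name__ == "__main__":
    print(f"thread create+join : {thread_create_cost() * 1e6:8.2f} us")
    print(f"thread ping-pong   : {thread_ping_pong() * 1e6:8.2f} us/round trip")
    print(f"coroutine ping-pong: {coroutine_ping_pong() * 1e6:8.2f} us/round trip")
```

On Linux, running it as `taskset -c 0 python3 script.py` confines both threads to one core, which makes the thread ping-pong closer to a pure same-core context switch instead of a cross-core wake-up.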
On the other hand, if you have a fixed number of threads, equal to the number of cores (sometimes even N-1 is suggested), you can manage the “tasks” or clients with a user-level scheduler. This may add some computation to your program, but it avoids much of the OS scheduling and memory-management work, making the overall execution faster. Some current parallel APIs, such as the .NET Task Parallel Library (TPL), OpenMP, Intel’s Threading Building Blocks, or Cilk, embody this model of parallelism, called dynamic multithreading.
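A minimal Python sketch of the N-threads-by-M-clients structure follows, assuming each client’s work is I/O, simulated here with `asyncio.sleep`; `handle_client`, `serve_clients` and `run_n_by_m` are hypothetical names. Each of the N OS threads runs its own asyncio event loop, and that loop plays the role of the user-level scheduler that multiplexes the thread’s M clients. In CPython the GIL prevents these threads from executing Python code in parallel, so this shows the structure rather than CPU scaling. Libraries such as TBB and Cilk additionally balance tasks between workers with work stealing, which this sketch does not do.

```python
import asyncio
import os
import threading
from typing import List, Optional


async def handle_client(client_id: int) -> int:
    # Hypothetical per-client work: the sleep stands in for awaiting real I/O
    # (e.g. a socket read); while it waits, the loop runs other clients.
    await asyncio.sleep(0.001)
    return client_id * 2


async def serve_clients(client_ids: List[int]) -> List[int]:
    # This thread's event loop interleaves all of its clients
    # without any OS thread switch between them.
    return await asyncio.gather(*(handle_client(c) for c in client_ids))


def run_n_by_m(n_threads: Optional[int] = None, clients_per_thread: int = 100) -> List[int]:
    """Serve n_threads * clients_per_thread clients with only n_threads OS threads."""
    n = n_threads or os.cpu_count() or 1
    results: List[List[int]] = [[] for _ in range(n)]

    def worker(index: int) -> None:
        first = index * clients_per_thread
        ids = list(range(first, first + clients_per_thread))
        # Each OS thread creates and runs its own event loop.
        results[index] = asyncio.run(serve_clients(ids))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Flatten the per-thread results, preserving client order.
    return [r for chunk in results for r in chunk]


if __name__ == "__main__":
    n = os.cpu_count() or 1
    out = run_n_by_m(n, clients_per_thread=1_000)
    print(f"served {len(out)} clients with {n} threads")
```

Splitting the clients into fixed contiguous ranges keeps the sketch simple; if some clients are much busier than others, a shared queue or work stealing would balance the load better.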