I have a native multithreaded Win32 application written in C++ with about 3 relatively busy threads and 4 to 6 threads that don’t do much. When it runs normally, total CPU usage adds up to about 15% on an 8-core machine and the application finishes in about 30 seconds. When I restrict the application to a single core by setting the affinity mask to 0x01, it completes faster, in 23 seconds.
I’m guessing it has something to do with the synchronization being cheaper when restricted to one physical core and/or some concurrent memory access issues.
I’m running Windows 7 x64; the application is 32-bit. The CPU is a Xeon X5570 with 4 cores and HT enabled.
Could anyone explain this behavior in detail? Why does it happen, and how can I predict this kind of behavior ahead of time?
Update: I guess my question wasn’t very clear. I would like to know why it gets faster on one physical core, not why it doesn’t get above 15% on multiple cores.
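For anyone wanting to reproduce the comparison: the question does not say how the mask was set, so the following is only a minimal sketch of one way to do it from code, using the Win32 `SetProcessAffinityMask` call. The helper names `mask_for_logical_cpu` and `restrict_process_to` are made up for illustration, and the non-Windows branch is just a stub so the snippet still compiles elsewhere.

```cpp
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: builds an affinity mask with only one logical CPU set.
// Index 0 gives 0x01, the mask used in the question. Out-of-range indices
// give 0 (an empty mask) instead of shifting past the width of the type.
inline std::uint64_t mask_for_logical_cpu(unsigned index) {
    return index < 64 ? (std::uint64_t{1} << index) : 0;
}

// Hypothetical helper: restricts the current process to the CPUs in `mask`.
// Returns false if the call fails or if this is not a Windows build.
// Note: in a 32-bit process DWORD_PTR is 32 bits wide, so only the low
// 32 bits of `mask` are used.
inline bool restrict_process_to(std::uint64_t mask) {
#ifdef _WIN32
    return SetProcessAffinityMask(GetCurrentProcess(),
                                  static_cast<DWORD_PTR>(mask)) != 0;
#else
    (void)mask;  // stub: no affinity change on other platforms
    return false;
#endif
}
```

Calling `restrict_process_to(mask_for_logical_cpu(0))` early in `main` would correspond to the 0x01 case. With HT enabled, bit 0 selects one logical processor, not a whole physical core to itself.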
Without knowing more about the application, it is difficult to say what is slowing it down; anything here is a guess. For a detailed analysis, consider the following factors:
Inter-Processor Communication: How much do the threads in your application communicate with each other? If they communicate very often, that communication adds overhead when the threads run on different cores.
Processor Cache Architecture: This is another important factor. You should know how the processor’s caches are affected when threads run on different cores, and how much thrashing happens in the shared caches.
Page Faults: Perhaps running on a single processor causes fewer page faults because the program then executes more sequentially?
Locks: Is there lock overhead in your code? On its own this should not cause a slowdown, but combined with the factors above it may add some overhead.
NoC on the processor: If you assign threads to different processor cores and they communicate, you need to know what path that communication takes. Is there a dedicated connection between the cores? Perhaps you should have a look at this link.
Processor Load: Last but not least, I hope no other tasks are running on the other processor cores and causing a lot of context switches. A context switch is typically very expensive.
Temperature: One effect to consider is the processor clock being slowed down when a CPU core heats up. I don’t think you will see this effect, but it also depends largely on the ambient temperature.
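To get a feel for the communication and cache points above, here is a rough sketch of a micro-benchmark in which two threads each update their own counter. In one layout the counters sit next to each other and probably share a cache line; in the other each is aligned to its own line. This is not taken from the application in question: all names are hypothetical, and the 64-byte line size is an assumption that you should check for your CPU.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>

// Assumed cache-line size; not queried from the hardware.
constexpr std::size_t kAssumedCacheLine = 64;

// Both counters are adjacent, so they likely land on the same cache line.
struct PackedCounters {
    std::atomic<std::uint64_t> a{0};
    std::atomic<std::uint64_t> b{0};
};

// Each counter is aligned to its own (assumed) cache line.
struct PaddedCounters {
    alignas(kAssumedCacheLine) std::atomic<std::uint64_t> a{0};
    alignas(kAssumedCacheLine) std::atomic<std::uint64_t> b{0};
};

// Starts two threads; one increments c.a and the other c.b, `iterations`
// times each. Returns the elapsed wall-clock time in milliseconds.
template <typename Counters>
double run_two_writers(Counters& c, std::uint64_t iterations) {
    auto bump = [iterations](std::atomic<std::uint64_t>& counter) {
        for (std::uint64_t i = 0; i < iterations; ++i)
            counter.fetch_add(1, std::memory_order_relaxed);
    };
    const auto start = std::chrono::steady_clock::now();
    std::thread t1(bump, std::ref(c.a));
    std::thread t2(bump, std::ref(c.b));
    t1.join();
    t2.join();
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Running `run_two_writers` on a `PackedCounters` and then on a `PaddedCounters` with a large iteration count, once with the default affinity and once pinned to a single core, should show whether cache-line traffic between cores matters on your machine. Treat the numbers as indicative only, since the scheduler and other processes affect them.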