I was interested in doing a project on face recognition (to make use of SIMD instruction sets). But during the first semester of this year I learned something about threads, and I was wondering whether I could combine the two.
When should I avoid combining multithreading and SIMD instructions? When is it worth doing?
Saving x87/MMX/XMM/YMM registers can take quite some time and cause significant cache thrashing. Traditionally, saving and restoring of FP state is done lazily: upon a context switch, the kernel remembers the current thread as the “owner” of the FP state and sets the TS flag in CR0; this causes a trap to the kernel whenever a thread attempts to execute an FP instruction. At that point, the FP state of the old owner is saved and the FP state of the currently executing thread is restored.
Now, if for extended periods of time (several or many context switches) no thread other than yours uses FP instructions, the lazy policy means no FP state is saved or restored at all, and you take no performance hit.
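To make the lazy policy concrete, here is a toy model in C++. It is only a sketch of the bookkeeping described above, not real kernel code: `LazyFpCpu`, `count_fp_state_swaps`, and the integer thread IDs are all made up for illustration. It counts how many times FP state would actually have to be saved and restored for a given schedule.

```cpp
#include <cstddef>
#include <vector>

// Toy model of lazy FP-state switching on one CPU (illustrative only).
struct LazyFpCpu {
    int current = -1;     // thread currently running on this CPU
    int fp_owner = -1;    // thread whose FP state is live in the registers
    bool ts = false;      // stands in for the TS flag in CR0
    int state_swaps = 0;  // save+restore pairs actually performed

    // A context switch only sets TS; no FP state is touched yet.
    void context_switch(int next) {
        current = next;
        ts = true;
    }

    // Called when the running thread executes an FP instruction.
    void execute_fp() {
        if (!ts) return;  // no trap: registers already belong to us
        ts = false;       // the trap handler clears TS
        if (fp_owner != current) {
            // Save the old owner's state and restore ours,
            // unless nobody owned the registers yet.
            if (fp_owner != -1) ++state_swaps;
            fp_owner = current;
        }
    }
};

// Runs `schedule` (a sequence of thread IDs) and returns the number of
// FP save/restore pairs. uses_fp[id] says whether thread `id` executes
// FP instructions during each of its time slices.
int count_fp_state_swaps(const std::vector<int>& schedule,
                         const std::vector<bool>& uses_fp) {
    LazyFpCpu cpu;
    for (int id : schedule) {
        cpu.context_switch(id);
        if (uses_fp[static_cast<std::size_t>(id)]) cpu.execute_fp();
    }
    return cpu.state_swaps;
}
```

With the schedule `{0, 1, 2, 0, 1, 2, 0}` where only thread 0 uses FP, the model performs no save/restore at all; with two FP-using threads alternating, every switch after the first costs one.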
Since we’re obviously talking about a multiprocessor system, the threads that execute your algorithm in parallel won’t conflict with each other, because each should run on its own CPU/core/HT and have a private set of registers.
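As a rough sketch of what combining the two looks like in practice, the following C++ splits an array across `std::thread` workers, and each worker sums its own chunk with SSE intrinsics. The function names `simd_sum` and `parallel_simd_sum` are mine, and the problem (a plain sum) is just a stand-in for whatever per-pixel work your face-recognition code does. It assumes an x86 compiler that defines `__SSE__`; elsewhere it falls back to a scalar loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Sum one contiguous chunk, four floats at a time where SSE is available.
float simd_sum(const float* data, std::size_t n) {
    std::size_t i = 0;
    float total = 0.0f;
#if defined(__SSE__)
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(data + i));  // unaligned load
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    for (; i < n; ++i) total += data[i];  // scalar tail (or full fallback)
    return total;
}

// Each thread works on its own chunk and uses its own core's registers,
// so the SIMD code in one thread does not interfere with the others.
float parallel_simd_sum(const std::vector<float>& v, unsigned num_threads) {
    num_threads = std::max(1u, num_threads);
    std::vector<float> partial(num_threads, 0.0f);
    std::vector<std::thread> workers;
    const std::size_t chunk = (v.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(v.size(), t * chunk);
        const std::size_t end = std::min(v.size(), begin + chunk);
        workers.emplace_back([&v, &partial, t, begin, end] {
            partial[t] = simd_sum(v.data() + begin, end - begin);
        });
    }
    for (auto& w : workers) w.join();
    float total = 0.0f;
    for (float p : partial) total += p;  // combine per-thread results
    return total;
}
```

You would call it as, say, `parallel_simd_sum(pixels, std::thread::hardware_concurrency())`. Depending on your toolchain you may need to build with `-pthread`.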
tl;dr
You shouldn’t be concerned with the overhead of saving and restoring FP registers.