I’m writing a highly parallel, multithreaded application. I’ve already written an SSE-accelerated thread class. If I were to write an MMX-accelerated thread class as well and run both at the same time (one SSE thread and one MMX thread per core), would performance improve noticeably?
I would think that this setup would help hide memory latency, but I’d like to be sure before I start pouring time into it.
The SSE and MMX instruction sets share the same vector execution units in the CPU. Running an SSE thread alongside an MMX thread therefore gives each thread the same resources as running two SSE threads (or two MMX threads). The only difference is in the instructions that exist in SSE but not in MMX, since SSE extends MMX with wider 128-bit registers and additional instructions. If anything, the MMX thread is probably going to be slower, because it is limited to 64-bit registers and doesn’t have those more advanced instructions available to it.
So the answer is: No, you would not see a performance improvement compared to running two SSE threads.
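To make the comparison concrete, here is a minimal sketch in C. It assumes an integer workload, an x86 compiler such as GCC or Clang, and SSE2 support (integer operations on the 128-bit registers arrived with SSE2, not the original SSE). The helper names `add_bytes_mmx` and `add_bytes_sse2` are made up for this example. Both perform the same byte-wise addition and return how many vector add intrinsics they called, which shows that the MMX version needs twice as many for the same data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <mmintrin.h>   /* MMX intrinsics (64-bit __m64) */
#include <emmintrin.h>  /* SSE2 intrinsics (128-bit __m128i) */

/* Hypothetical helper: dst[i] = a[i] + b[i] using MMX, 8 bytes per add.
 * Returns the number of vector add intrinsics called. */
size_t add_bytes_mmx(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t adds = 0;
    assert(n % 8 == 0);              /* sketch only: no scalar tail handling */
    for (size_t i = 0; i < n; i += 8) {
        __m64 va, vb, vr;
        memcpy(&va, a + i, 8);       /* memcpy avoids alignment assumptions */
        memcpy(&vb, b + i, 8);
        vr = _mm_add_pi8(va, vb);    /* 8 byte lanes, wrapping add */
        memcpy(dst + i, &vr, 8);
        adds++;
    }
    _mm_empty();                     /* leave MMX state so x87 code stays valid */
    return adds;
}

/* Hypothetical helper: same operation using SSE2, 16 bytes per add. */
size_t add_bytes_sse2(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t adds = 0;
    assert(n % 16 == 0);             /* sketch only: no scalar tail handling */
    for (size_t i = 0; i < n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i vr = _mm_add_epi8(va, vb);   /* 16 byte lanes, wrapping add */
        _mm_storeu_si128((__m128i *)(dst + i), vr);
        adds++;
    }
    return adds;
}
```

For a 32-byte buffer, the MMX helper calls the add intrinsic four times and the SSE2 helper twice, with identical results. On a 32-bit x86 target you would likely need to pass `-mmmx -msse2` to the compiler; on x86-64 both are normally enabled by default.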