Is using SSE2 intrinsics in a parallel_for a good idea?
Since the number of SSE2 registers is limited, will it incur a performance penalty?
Does each CPU die have its own SSE2 registers?
Is using SSE2 intrinsics in a parallel_for a good idea?
That depends, but it is definitely not a bad idea. Profile your code and use intrinsics where performance matters most.
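A minimal sketch of what this can look like, assuming arrays of `double` and an x86 target. `std::thread` splits the range as a stand-in for parallel_for, so the chunking here is an assumption and not how TBB or PPL actually schedule work. The names `add_range_sse2` and `parallel_add` are hypothetical. The SSE2 part lives entirely in the per-chunk body, which is the piece you would hand to parallel_for:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics: __m128d, _mm_loadu_pd, _mm_add_pd, ...
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical loop body: out[i] = a[i] + b[i] for i in [begin, end).
// This is the part you would pass to parallel_for.
void add_range_sse2(const double* a, const double* b, double* out,
                    std::size_t begin, std::size_t end) {
    assert(begin <= end);
    std::size_t i = begin;
    // Two doubles per iteration, held in one 128-bit XMM register each.
    for (; i + 2 <= end; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);  // unaligned loads: no alignment assumed
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
    // Scalar tail for an odd leftover element.
    for (; i < end; ++i) out[i] = a[i] + b[i];
}

// Hypothetical stand-in for parallel_for: one contiguous chunk per
// logical CPU, each processed by its own std::thread.
void parallel_add(const double* a, const double* b, double* out, std::size_t n) {
    unsigned hw = std::thread::hardware_concurrency();  // may be 0 if unknown
    std::size_t workers = hw == 0 ? 1 : hw;
    std::size_t chunk = (n + workers - 1) / workers;
    std::vector<std::thread> threads;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        std::size_t end = std::min(n, begin + chunk);
        threads.emplace_back(add_range_sse2, a, b, out, begin, end);
    }
    for (std::thread& t : threads) t.join();
}
```

With oneTBB, the thread loop would be replaced by `tbb::parallel_for(tbb::blocked_range<std::size_t>(0, n), body)`, where `body` calls `add_range_sse2` with `r.begin()` and `r.end()`. Keeping the SIMD code in a plain range function makes it easy to benchmark the scalar and SSE2 versions separately.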
Since the number of SSE2 registers is limited, will it incur a performance penalty?
If your concern is register pressure, you don't have to worry about it: when you use intrinsics, the compiler does the register allocation for you (unlike when you write assembly). Code hand-written with intrinsics is usually more compact than code the compiler generates from plain high-level source. Profile your code after each change to see whether performance has actually improved.
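To illustrate the register-allocation point, here is a hedged sketch of an SSE2 dot product (the name `dot_sse2` is hypothetical). The `__m128d` values are ordinary C++ variables rather than named registers. The compiler chooses which XMM register holds each one and spills to the stack only if it runs out:

```cpp
#include <emmintrin.h>  // SSE2: __m128d, _mm_mul_pd, _mm_add_pd, ...
#include <cassert>
#include <cstddef>

// Hypothetical example: dot product of two double arrays of length n.
double dot_sse2(const double* a, const double* b, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr));
    // Four independent accumulators let consecutive adds overlap instead of
    // waiting on each other. Together with the load temporaries they fit even
    // in the 8 XMM registers of 32-bit mode; the compiler assigns them.
    __m128d acc0 = _mm_setzero_pd();
    __m128d acc1 = _mm_setzero_pd();
    __m128d acc2 = _mm_setzero_pd();
    __m128d acc3 = _mm_setzero_pd();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        acc0 = _mm_add_pd(acc0, _mm_mul_pd(_mm_loadu_pd(a + i),     _mm_loadu_pd(b + i)));
        acc1 = _mm_add_pd(acc1, _mm_mul_pd(_mm_loadu_pd(a + i + 2), _mm_loadu_pd(b + i + 2)));
        acc2 = _mm_add_pd(acc2, _mm_mul_pd(_mm_loadu_pd(a + i + 4), _mm_loadu_pd(b + i + 4)));
        acc3 = _mm_add_pd(acc3, _mm_mul_pd(_mm_loadu_pd(a + i + 6), _mm_loadu_pd(b + i + 6)));
    }
    // Combine the accumulators, then add the two lanes together.
    __m128d sum = _mm_add_pd(_mm_add_pd(acc0, acc1), _mm_add_pd(acc2, acc3));
    double lanes[2];
    _mm_storeu_pd(lanes, sum);
    double result = lanes[0] + lanes[1];
    for (; i < n; ++i) result += a[i] * b[i];  // scalar tail
    return result;
}
```

Whether four accumulators beat two (or eight) depends on the CPU, which is exactly the kind of question profiling answers. Note that the summation order differs from a plain scalar loop, so results on general floating-point data can differ in the last bits.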
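Does each CPU die have its own SSE2 registers?
Each logical CPU has its own set of 8 (in 32-bit mode) or 16 (in 64-bit mode) XMM registers. In modern CPUs, each core is one logical CPU, or two logical CPUs if hyper-threading is enabled. Threads running on different logical CPUs therefore never compete for XMM registers, and the operating system saves and restores them when it switches threads on the same CPU.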
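A small sketch, assuming an x86 target built with GCC, Clang or MSVC, that reports the two quantities mentioned above: the number of XMM registers SSE2 code can address in the mode it is compiled for, and the number of logical CPUs the standard library reports. Both helper names are hypothetical:

```cpp
#include <thread>

// Hypothetical helper: XMM registers addressable by SSE2 instructions in the
// mode this translation unit is compiled for (AVX-512's extra registers are
// not reachable from SSE2 encodings).
constexpr int xmm_register_count() {
#if defined(__x86_64__) || defined(_M_X64)
    return 16;  // 64-bit mode: XMM0-XMM15
#else
    return 8;   // 32-bit mode: XMM0-XMM7
#endif
}

// Hypothetical helper: logical CPUs (cores times hardware threads per core).
// The standard only treats this as a hint and allows it to return 0.
unsigned logical_cpu_count() {
    return std::thread::hardware_concurrency();
}
```

In a parallel_for that runs one worker per logical CPU, each worker sees the full `xmm_register_count()` registers on its own CPU, so adding threads does not shrink the register budget of any single loop body.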