I am optimizing some C++ code.
At one critical step, I want to implement the following function y=f(x):
f(0)=1
f(1)=2
f(2)=3
f(3)=0
Which one is faster: a lookup table, i=(i+1)&3, or i=(i+1)%4? Or is there a better suggestion?
Almost certainly the lookup table is going to be the slowest. In a lot of cases, the compiler will generate the same assembly for (i+1)&3 and (i+1)%4; however, depending on the type and signedness of i, the two may not be strictly equivalent, and then the compiler won't be able to make that optimization. For example, with a signed i on my system, gcc -O2 generates a lot more code for (i+1)%4 than for (i+1)&3, because of the rules about signed modulus results: the result of % takes the sign of the dividend, so the compiler has to handle negative values as well.

Bottom line, you're probably best off using the (i+1)&3 version if that expresses what you want, because there's less chance for the compiler to do something you don't expect.
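
To make the signedness point concrete, here is a minimal sketch of the three candidates. It is not the code from the original answer: the names next_and, next_mod, next_and_u, next_mod_u, next_lut and kNext are made up for this illustration, int is assumed as the signed type, and the negative-input result for & assumes two's complement (guaranteed since C++20, and what mainstream compilers did before that).

```cpp
// Hypothetical helper names, chosen only for this illustration.

// Signed versions: these agree on 0..3 but not on negative inputs.
constexpr int next_and(int i) { return (i + 1) & 3; }
constexpr int next_mod(int i) { return (i + 1) % 4; }

// Unsigned versions: these are equivalent for every input, so the
// compiler is free to turn the % into a single AND.
constexpr unsigned next_and_u(unsigned i) { return (i + 1) & 3u; }
constexpr unsigned next_mod_u(unsigned i) { return (i + 1) % 4u; }

// Lookup-table version, valid only for i in 0..3.
constexpr unsigned char kNext[4] = {1, 2, 3, 0};
constexpr int next_lut(int i) { return kNext[i]; }

// For the inputs in the question, every variant gives f(3) = 0.
static_assert(next_and(3) == 0, "mask wraps 3 to 0");
static_assert(next_mod(3) == 0, "modulus wraps 3 to 0");
static_assert(next_lut(3) == 0, "table wraps 3 to 0");

// For a negative input the signed versions differ:
// i = -3 gives i + 1 = -2; -2 % 4 is -2 (truncation toward zero),
// while -2 & 3 is 2 (assuming two's complement).
static_assert(next_mod(-3) == -2, "signed % keeps the dividend's sign");
static_assert(next_and(-3) == 2, "mask always lands in 0..3");
```

If i is never negative, declaring it unsigned lets the compiler treat both spellings the same way. To check what your own compiler does, something like g++ -std=c++11 -O2 -S on this file and a look at the generated assembly is usually enough.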