The Archive Base

Editorial Team
Asked: May 27, 2026 (2026-05-27T06:25:41+00:00)


I am optimizing some C++ code. At one critical step, I want to implement the following function y = f(x):

f(0)=1

f(1)=2

f(2)=3

f(3)=0

Which is faster: a lookup table, i=(i+1)&3, or i=(i+1)%4? Or is there a better approach?


1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 6:25 am (2026-05-27T06:25:42+00:00)

    The lookup table is almost certainly going to be the slowest option. In many cases the compiler will generate the same assembly for (i+1)&3 and (i+1)%4. However, depending on the type and signedness of i, the two expressions may not be strictly equivalent, and then the compiler cannot make that optimization. For example, for the code

    int foo(int i)
    {
        return (i+1)%4;
    }
    
    unsigned bar(unsigned i)
    {
        return (i+1)%4;
    }
    

    on my system, gcc -O2 generates:

    0000000000000000 <foo>:
       0:   8d 47 01                lea    0x1(%rdi),%eax
       3:   89 c2                   mov    %eax,%edx
       5:   c1 fa 1f                sar    $0x1f,%edx
       8:   c1 ea 1e                shr    $0x1e,%edx
       b:   01 d0                   add    %edx,%eax
       d:   83 e0 03                and    $0x3,%eax
      10:   29 d0                   sub    %edx,%eax
      12:   c3                      retq   
    
    0000000000000020 <bar>:
      20:   8d 47 01                lea    0x1(%rdi),%eax
      23:   83 e0 03                and    $0x3,%eax
      26:   c3                      retq
    

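    As you can see, (i+1)%4 generates considerably more code for the signed version. Signed % must yield a result with the sign of the dividend (for example, -5 % 4 is -1, whereas -5 & 3 is 3), so the compiler has to emit extra instructions to correct for negative inputs.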
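
To make the signed case concrete, here is a minimal sketch of where the two expressions diverge. The function names are hypothetical, chosen only for illustration, and the mask result assumes a two's complement int (guaranteed from C++20, and the case on mainstream compilers before that).

```cpp
#include <cassert>

// Hypothetical helper names, for illustration only.

// Signed remainder: since C++11 division truncates toward zero,
// so the result of % takes the sign of the dividend.
constexpr int next_mod_signed(int i) { return (i + 1) % 4; }

// Bit mask: keeps only the low two bits, so the result is always in 0..3
// (assuming a two's complement representation of int).
constexpr int next_mask_signed(int i) { return (i + 1) & 3; }

// For non-negative input the two expressions agree.
static_assert(next_mod_signed(3) == 0 && next_mask_signed(3) == 0,
              "both wrap 3 back to 0");

// For negative input they differ: -6 + 1 == -5.
static_assert(next_mod_signed(-6) == -1, "-5 % 4 is -1");
static_assert(next_mask_signed(-6) == 3, "-5 & 3 is 3");

// The same checks at run time.
inline void check_signed_difference() {
    assert(next_mod_signed(-6) != next_mask_signed(-6));
    assert(next_mod_signed(2) == next_mask_signed(2));
}
```

Because the compiler cannot assume i is non-negative, it has to preserve this difference, which is what the extra sar/shr/add/sub instructions in foo are for.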

    Bottom line: you're probably best off using the (i+1)&3 version if that expresses what you want, because there's less chance of the compiler doing something you don't expect.
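
For reference, the three candidates from the question could be written roughly as follows. This is only a sketch: the names are made up for illustration, and it assumes the index is unsigned and, for the table version, already in the range 0..3. It checks that the variants agree; it does not measure speed, which you would still need to profile on your own compiler and target.

```cpp
#include <cassert>

// Hypothetical names, for illustration only.

// Lookup table for f(0)=1, f(1)=2, f(2)=3, f(3)=0.
constexpr unsigned kNext[4] = {1, 2, 3, 0};

// Table version: only valid when i is already in 0..3.
inline unsigned next_table(unsigned i) { return kNext[i]; }

// Mask version: keeps the low two bits of i + 1.
inline unsigned next_mask(unsigned i) { return (i + 1) & 3u; }

// Modulo version: with an unsigned operand this is equivalent to the mask.
inline unsigned next_mod(unsigned i) { return (i + 1) % 4u; }

// Confirms that all three variants agree on the domain from the question.
inline void check_variants_agree() {
    for (unsigned i = 0; i < 4; ++i) {
        assert(next_table(i) == next_mask(i));
        assert(next_mask(i) == next_mod(i));
    }
}
```

Using unsigned here is a deliberate choice: it is the case in which the answer's bar example shows the modulo compiling down to the same lea/and pair as the mask.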

© 2021 The Archive Base. All Rights Reserved