The Archive Base

Editorial Team
Asked: May 26, 2026 (2026-05-26T01:19:12+00:00)

The following three pieces of code achieve exactly the same effect. Yet, when compiled with -O3 on GCC 4.5.2, the times for a large number of iterations vary quite markedly.

1 – Normal branching, using multiple conditions, best time 1.0:

// a, b, c, d are set to random values 0-255 before each iteration.
if (a < 16 or b < 32 or c < 64 or d < 128) result += a+b+c+d;

2 – Branching, manually using bitwise or to check conditions, best time 0.92:

if (a < 16 | b < 32 | c < 64 | d < 128) result += a+b+c+d;

3 – Finally, getting the same result without a branch, best time 0.85:

result += (a+b+c+d) * (a < 16 | b < 32 | c < 64 | d < 128);

The above times are the best for each method when run as the inner loop of a benchmark program I made. The random() is seeded the same way before each run.

Before I made this benchmark I assumed GCC would optimize away the differences. The 2nd example especially makes me scratch my head. Can anyone explain why GCC doesn’t turn code like this into equivalent, faster code?

EDIT: Fixed some errors, and also made it clear that the random numbers are created regardless, and used, so as to not be optimized away. They always were in the original benchmark, I just botched the code I put on here.

Here is an example of an actual benchmark function:

#include <stdint.h>
#include <iostream>
#include <boost/chrono.hpp>
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_int_distribution.hpp>

boost::random::mt19937 rng;
boost::random::uniform_int_distribution<> ranchar(0, 255);

// Times `runs` iterations of the short-circuiting (case 1) variant.
double quadruple_or(uint64_t runs) {
  uint64_t result = 0;
  rng.seed(0);  // same seed before each run, so every variant sees the same inputs

  boost::chrono::high_resolution_clock::time_point start = 
    boost::chrono::high_resolution_clock::now();
  for (; runs; runs--) {
    // The random numbers are always generated and used.
    int a = ranchar(rng);
    int b = ranchar(rng);
    int c = ranchar(rng);
    int d = ranchar(rng);
    if (a < 16 or b < 32 or c < 64 or d < 128) result += a;
    if (d > 16 or c > 32 or b > 64 or a > 128) result += b;
    if (a < 96 or b < 53 or c < 199 or d < 177) result += c;
    if (d > 66 or c > 35 or b > 99 or a > 77) result += d;
  }

  // Force gcc to not optimize away result.
  std::cout << "Result check " << result << std::endl;
  boost::chrono::duration<double> sec = 
    boost::chrono::high_resolution_clock::now() - start;
  return sec.count();
}

The full benchmark can be found here.

1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 1:19 am (2026-05-26T01:19:13+00:00)

    The OP has changed a bit since my original answer. Let me try to revisit here.

    In case 1, because the or operator short-circuits, I would expect the compiler to generate four compare-then-branch code sections. Branches can be quite expensive, especially when they don’t go the predicted path.

    In case 2, the compiler can decide to do all four comparisons, convert them to bool 0/1 results, bitwise or all four together, and then do a single branch. This trades possibly more comparisons for possibly fewer branches. Reducing the number of branches does appear to improve performance.

    In case 3, things work pretty much the same as in case 2, except that at the very end one more branch may be eliminated by explicitly telling the compiler “I know the result will be zero or one, so just multiply the thing on the left by that value”. The multiply apparently comes out faster than the corresponding branch on your hardware. This is in contrast to the second example, where the compiler apparently doesn’t know the range of possible outputs from the bitwise or, so it has to assume it could be any integer and must do a compare-and-jump instead.
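
To make the three cases concrete, here is a minimal sketch that isolates each variant in its own function and checks that they agree. The function names (add_short_circuit, add_bitwise_branch, add_branchless, check_variants_agree) are made up for illustration and are not from the original benchmark; the comments describe what a compiler may emit, not what GCC 4.5.2 is known to emit.

```cpp
#include <cassert>
#include <cstdint>

// Case 1: logical or short-circuits, so the compiler may emit up to four
// separate compare-and-branch sequences.
inline std::uint64_t add_short_circuit(int a, int b, int c, int d) {
    std::uint64_t result = 0;
    if (a < 16 || b < 32 || c < 64 || d < 128) result += a + b + c + d;
    return result;
}

// Case 2: bitwise or evaluates all four comparisons (each yields 0 or 1),
// combines them, and leaves a single branch on the combined value.
inline std::uint64_t add_bitwise_branch(int a, int b, int c, int d) {
    std::uint64_t result = 0;
    if ((a < 16) | (b < 32) | (c < 64) | (d < 128)) result += a + b + c + d;
    return result;
}

// Case 3: the combined 0/1 value is used as a multiplier, so no branch is
// required at the source level.
inline std::uint64_t add_branchless(int a, int b, int c, int d) {
    std::uint64_t result = 0;
    result += (a + b + c + d) * ((a < 16) | (b < 32) | (c < 64) | (d < 128));
    return result;
}

// Checks the three variants against each other on a coarse grid of inputs
// in 0..255 (stride 17 gives 16 values per variable). Returns the number of
// combinations checked.
inline int check_variants_agree() {
    int checked = 0;
    for (int a = 0; a <= 255; a += 17)
        for (int b = 0; b <= 255; b += 17)
            for (int c = 0; c <= 255; c += 17)
                for (int d = 0; d <= 255; d += 17) {
                    const std::uint64_t r1 = add_short_circuit(a, b, c, d);
                    const std::uint64_t r2 = add_bitwise_branch(a, b, c, d);
                    const std::uint64_t r3 = add_branchless(a, b, c, d);
                    assert(r1 == r2 && r2 == r3);
                    (void)r1; (void)r2; (void)r3;  // silence warnings under NDEBUG
                    ++checked;
                }
    return checked;
}
```

Compiling each function with -O3 -S and comparing the generated assembly is one way to see how many branches a given compiler actually keeps for each variant.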

    Original answer for history:
    The first case is functionally different from the second and third if random has side effects (which a normal PRNG would), so it stands to reason that the compiler may optimize them differently. Specifically, the first case will only call random as many times as needed to pass the check, while in the other two cases random will always be called four times. This will (assuming random really is stateful) result in the future random numbers being different.
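
The call-count difference described above can be sketched with a hypothetical stand-in for a stateful random source. CountingSource and the helper functions are assumptions made up for this illustration; they are not part of the original benchmark or of any library.

```cpp
#include <cassert>

// Hypothetical stand-in for a stateful PRNG: returns the same value on every
// call and counts how often it was called.
struct CountingSource {
    int value;
    int calls;
    explicit CountingSource(int v) : value(v), calls(0) {}
    int next() { ++calls; return value; }
};

// Logical or stops evaluating at the first comparison that is true.
inline int calls_with_logical_or(int value) {
    CountingSource src(value);
    bool hit = src.next() < 16 || src.next() < 32 ||
               src.next() < 64 || src.next() < 128;
    (void)hit;
    return src.calls;
}

// Bitwise or always evaluates all four operands (in an unspecified order).
inline int calls_with_bitwise_or(int value) {
    CountingSource src(value);
    bool hit = (src.next() < 16) | (src.next() < 32) |
               (src.next() < 64) | (src.next() < 128);
    (void)hit;
    return src.calls;
}

// Usage example: the number of calls depends on where short-circuiting stops.
inline void demo_call_counts() {
    assert(calls_with_logical_or(10) == 1);   // first test already true
    assert(calls_with_logical_or(40) == 3);   // 40 < 64 is the first true test
    assert(calls_with_logical_or(200) == 4);  // no test is true
    assert(calls_with_bitwise_or(10) == 4);   // always four calls
}
```

With a real PRNG, the differing number of calls leaves the generator in a different state, which is why later values would diverge between the variants.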

    The difference between the second and third is probably because the compiler can’t figure out, for some reason, that the result of the bitwise or will always be 0 or 1. When you give it a hint to do the multiplication instead of branching, the multiplication probably comes out faster due to pipelining.
