The Archive Base

Editorial Team
Asked: May 24, 2026 (22:13:34 UTC)

I have a C program that has n multiplications (single multiplication with n iterations)


I have a C program that has n multiplications (a single multiplication with n iterations), and I found another approach that has n/2 iterations of (1 multiplication + 2 additions). I know that both are O(n) in complexity, but in terms of CPU cycles, which is faster?


1 Answer

  1. Editorial Team
    Added an answer on May 24, 2026 at 10:13 pm

    First of all, follow Dietrich Epp's first piece of advice: measuring is (at least for complex optimization problems) the only way to be sure.
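
    The simplest way to follow that advice is a small timing harness. The sketch below is illustrative only: the question does not show the actual program, so fill_mul and fill_mul_add are hypothetical stand-ins with the same shape (n iterations of one multiply versus n/2 iterations of one multiply and two adds) that both compute out[i] = i * k. It uses standard C clock() and keeps the best of several runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical variant A: n iterations, one multiply each (out[i] = i * k). */
static void fill_mul(uint64_t *out, size_t n, uint64_t k) {
    for (size_t i = 0; i < n; i++)
        out[i] = (uint64_t)i * k;
}

/* Hypothetical variant B: n/2 iterations, one multiply and two adds each. */
static void fill_mul_add(uint64_t *out, size_t n, uint64_t k) {
    for (size_t j = 0; j < n / 2; j++) {
        size_t i = j + j;             /* add: index of the even element */
        uint64_t t = (uint64_t)i * k; /* the single multiply */
        out[i] = t;
        out[i + 1] = t + k;           /* add: the next multiple of k */
    }
    if (n % 2 != 0)                   /* odd n leaves one element over */
        out[n - 1] = (uint64_t)(n - 1) * k;
}

/* Best-of-reps CPU time, in seconds, for one call of fill(out, n, k). */
static double time_fill(void (*fill)(uint64_t *, size_t, uint64_t),
                        uint64_t *out, size_t n, uint64_t k, int reps) {
    assert(reps > 0);
    double best = -1.0;
    for (int r = 0; r < reps; r++) {
        clock_t t0 = clock();
        fill(out, n, k);
        double dt = (double)(clock() - t0) / CLOCKS_PER_SEC;
        if (best < 0.0 || dt < best)
            best = dt;
    }
    return best;
}

/* Times both variants. Returns 1 if their outputs are identical,
   0 if they differ, and -1 if allocation fails. */
static int compare_variants(size_t n, uint64_t k, int reps,
                            double *t_a, double *t_b) {
    uint64_t *a = malloc(n * sizeof *a);
    uint64_t *b = malloc(n * sizeof *b);
    int result = -1;
    if (a != NULL && b != NULL) {
        *t_a = time_fill(fill_mul, a, n, k, reps);
        *t_b = time_fill(fill_mul_add, b, n, k, reps);
        result = 1;
        for (size_t i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                result = 0;
                break;
            }
        }
    }
    free(a);
    free(b);
    return result;
}
```

    Call compare_variants(n, k, reps, &t_a, &t_b) with a large n and compile with the optimization level you actually ship with. An optimizing compiler may strength-reduce the multiply in fill_mul into an add by itself, which is one more reason to measure and to look at the generated assembly rather than reason from the C source.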

    Now, if you want to figure out why one is faster than the other, we can try. There are two important performance measures, latency and reciprocal throughput. A short summary of the two:

    Latency: This is the delay that the instruction generates in a
    dependency chain. The numbers are minimum values. Cache misses,
    misalignment, and exceptions may increase the clock counts
    considerably. Where hyperthreading is enabled, the use of the same
    execution units in the other thread leads to inferior performance.
    Denormal numbers, NaNs and infinity do not increase the latency. The
    time unit used is core clock cycles, not the reference clock cycles
    given by the time stamp counter.

    Reciprocal throughput: The average number of core clock cycles per
    instruction for a series of independent instructions of the same kind
    in the same thread.
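
    To see the difference between latency and reciprocal throughput in code, compare a single dependency chain of multiplies with several independent chains. This is a sketch assuming unsigned 64-bit arithmetic (wrap-around makes the product independent of evaluation order, so both functions return the same value); the function names are hypothetical. With the Sandy Bridge imul figures given in this answer (latency 3, reciprocal throughput 1), the one-chain loop is limited by latency to roughly one multiply every 3 cycles, while the four-chain loop can approach one multiply per cycle.

```c
#include <stddef.h>
#include <stdint.h>

/* Latency-bound: every multiply has to wait for the previous product. */
static uint64_t product_one_chain(const uint64_t *x, size_t n) {
    uint64_t p = 1;
    for (size_t i = 0; i < n; i++)
        p *= x[i];
    return p;
}

/* Throughput-bound: four independent chains can overlap in the pipeline. */
static uint64_t product_four_chains(const uint64_t *x, size_t n) {
    uint64_t p0 = 1, p1 = 1, p2 = 1, p3 = 1;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        p0 *= x[i];
        p1 *= x[i + 1];
        p2 *= x[i + 2];
        p3 *= x[i + 3];
    }
    for (; i < n; i++)   /* leftover elements when n is not a multiple of 4 */
        p0 *= x[i];
    return (p0 * p1) * (p2 * p3);
}
```

    At higher optimization levels the compiler may already reassociate the single-chain integer product on its own, so check the generated assembly before attributing a timing difference to the source-level change.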

    On Sandy Bridge, an add r, r/i (notation from here on: r = register, i = immediate, m = memory) has a reciprocal throughput of 0.33 and a latency of 1.

    An imul r, r has a latency of 3 and a reciprocal throughput of 1.

    So, as you can see, it depends entirely on your specific algorithm. If you can replace one imul with two independent adds, that part of your algorithm could get a theoretical speedup of 50% (and ~350% in the best case, where the imul sits in a dependency chain and pays its full 3-cycle latency). On the other hand, if the adds introduce a dependency chain (each add waiting for the result of the previous one), the single imul can be just as fast as the adds, or even faster.
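
    The 50% and ~350% figures follow from the Sandy Bridge numbers above, counting cycles per replaced imul and taking the add's reciprocal throughput of 0.33 as roughly 1/3 cycle:

```latex
\begin{aligned}
\text{imul throughput-bound vs. two independent adds:}\quad & \frac{1}{2 \times \tfrac{1}{3}} = 1.5 \quad (\approx 50\%\ \text{faster})\\
\text{imul latency-bound vs. two independent adds:}\quad & \frac{3}{2 \times \tfrac{1}{3}} = 4.5 \quad (\approx 350\%\ \text{faster})
\end{aligned}
```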

    Also note that we've ignored all the additional complications, like memory and cache behavior (which will generally have a much, MUCH larger influence on execution time), or intricate details like µop fusion. In general, the only people who need to care about this level of detail are compiler writers; it's much simpler to just measure the result of their efforts 😉

    Anyway, if you want a good listing of these numbers, see this document (the description of latency and reciprocal throughput above is also taken from it).

Related Questions

I have a PHP program that has been written keeping in mind a single
I have a C++ program that has a pretty terrible memory leak, about 4MB
I have program that has a variable that should never change. However, somehow, it
This is my problem. I have a program that has to run in a
I have a program in Octave that has a loop - running a function
I have a program that (amongst other things) has a command line interface that
Is tight looping in a program bad? I have an application that has two
I have a C program which has a function call that is defined in
I have an application that seems to throw exceptions only after the program has
I have a program that has this code : #include<stdio.h> main(){ int input; char

© 2021 The Archive Base. All Rights Reserved