I have a C program that has n multiplications (single multiplication with n iterations)

Asked: May 24, 2026

I have a C program that has n multiplications (a single multiplication in each of n iterations), and I found another approach that has n/2 iterations of (1 multiplication + 2 additions). I know both are O(n) in complexity, but in terms of CPU cycles, which one is faster?

1 Answer

Editorial Team · Answer 1 · 2026-05-24T22:13:35+00:00

First of all, follow Dietrich Epp's first piece of advice: measuring is (at least for complex optimization problems) the only way to be sure.
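A minimal timing sketch in C, assuming hypothetical stand-ins version_a and version_b for the two approaches. The question doesn't show the real code, so these only reproduce the operation counts (n multiplications versus n/2 iterations of one multiplication plus two additions). It uses the standard clock() function from <time.h>, which measures processor time:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the first approach: n iterations with one
   multiplication each. Unsigned arithmetic wraps, so overflow is defined. */
uint32_t version_a(size_t n, uint32_t x)
{
    uint32_t acc = 1;
    for (size_t i = 0; i < n; i++)
        acc *= x;                 /* 1 multiplication */
    return acc;
}

/* Hypothetical stand-in for the second approach: n/2 iterations with one
   multiplication and two additions each (loop-counter updates not counted). */
uint32_t version_b(size_t n, uint32_t x)
{
    uint32_t prod = 1, s1 = 0, s2 = 0;
    for (size_t i = 0; i < n / 2; i++) {
        prod *= x;                /* 1 multiplication */
        s1 += x;                  /* addition 1 */
        s2 += prod;               /* addition 2 */
    }
    return prod + s1 + s2;
}

/* Returns the processor time in seconds spent calling f(n, ...) reps times.
   x is varied per call and results go into a volatile sink, so the compiler
   cannot hoist the call out of the loop or drop it entirely. */
double time_it(uint32_t (*f)(size_t, uint32_t), size_t n, uint32_t x, int reps)
{
    assert(f != NULL && reps > 0);
    volatile uint32_t sink = 0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        sink += f(n, x + (uint32_t)r);
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

You would call, for example, time_it(version_a, 100000000, 3, 5) and time_it(version_b, 100000000, 3, 5) from main, compiled with the same optimization level you actually ship (e.g. -O2). Repeat the runs a few times, since single measurements are noisy, and check the generated assembly, because the compiler may transform either loop.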

Now, if you want to understand why one is faster than the other, we can try. There are two important performance measures: latency and reciprocal throughput. A short summary of the two:

Latency: This is the delay that the instruction generates in a dependency chain. The numbers are minimum values. Cache misses, misalignment, and exceptions may increase the clock counts considerably. Where hyperthreading is enabled, the use of the same execution units in the other thread leads to inferior performance. Denormal numbers, NaNs and infinity do not increase the latency. The time unit used is core clock cycles, not the reference clock cycles given by the time stamp counter.

Reciprocal throughput: The average number of core clock cycles per instruction for a series of independent instructions of the same kind in the same thread.
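To see the difference between the two measures in C, here is a minimal sketch (the function names are hypothetical). The first loop is one long dependency chain, so each multiplication has to wait for the previous one and the loop is bound by imul latency. The second splits the same work into two independent chains that the CPU can overlap, moving it toward the throughput limit. Unsigned multiplication wraps modulo 2^64 and is associative and commutative, so both return the same value:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Latency-bound: every multiplication needs the previous product,
   so the multiplications cannot overlap in the pipeline. */
uint64_t product_one_chain(const uint64_t *a, size_t n)
{
    assert(a != NULL || n == 0);
    uint64_t p = 1;
    for (size_t i = 0; i < n; i++)
        p *= a[i];
    return p;
}

/* Closer to throughput-bound: two independent accumulators form two
   dependency chains, so one multiply can start before the other finishes. */
uint64_t product_two_chains(const uint64_t *a, size_t n)
{
    assert(a != NULL || n == 0);
    uint64_t p0 = 1, p1 = 1;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        p0 *= a[i];       /* chain 0 */
        p1 *= a[i + 1];   /* chain 1, independent of chain 0 */
    }
    if (i < n)            /* odd n: one element left over */
        p0 *= a[i];
    return p0 * p1;       /* combine the two chains */
}
```

Note that an optimizing compiler is allowed to do this kind of reassociation itself for integer code (it may even vectorize the first loop), so inspect the assembly before drawing conclusions. For floating-point code it generally may not, unless you permit it with flags such as GCC's -ffast-math.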

On Sandy Bridge, an add r, r/i (here r = register, i = immediate, m = memory) has a reciprocal throughput of 0.33 and a latency of 1.

An imul r, r has a latency of 3 and a reciprocal throughput of 1.

So, as you can see, it depends entirely on your specific algorithm. If you can replace one imul with two independent adds, this particular part of your algorithm could get a theoretical speedup of 50% (and, in the best case, a speedup of ~350%). On the other hand, if your adds introduce a problematic dependency chain, each add has to wait out its full latency, and one imul could be just as fast as one add.
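The two percentages can be checked against the Sandy Bridge numbers above. A worked sketch, assuming the two adds are independent and each costs its reciprocal throughput of 0.33 ≈ 1/3 cycle, while the imul costs either its reciprocal throughput (1 cycle, when the imuls are independent) or its full latency (3 cycles, when it sits in a dependency chain, which is the best case for the replacement):

```latex
\begin{aligned}
t_{2\,\text{add}} &= 2 \times \tfrac{1}{3} = \tfrac{2}{3} \approx 0.67 \text{ cycles} \\
\text{throughput-bound imul:}\quad \frac{t_{\text{imul}}}{t_{2\,\text{add}}} &= \frac{1}{2/3} = 1.5 \;\Rightarrow\; \text{speedup} = 50\% \\
\text{latency-bound imul:}\quad \frac{t_{\text{imul}}}{t_{2\,\text{add}}} &= \frac{3}{2/3} = 4.5 \;\Rightarrow\; \text{speedup} = 350\%
\end{aligned}
```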

Also note that we've ignored all the additional complications, such as memory and cache behavior (which will generally have a much, MUCH larger influence on execution time) or intricate details like µop fusion. In general, the only people who should care about this level of detail are compiler writers; it's much simpler to just measure the result of their efforts 😉

Anyway, if you want a good listing of these numbers, see this document here (the description of latency and reciprocal throughput above is also taken from it).