We know that modern processors execute instructions such as cosine and sine directly, as they have opcodes for them. My question is how many cycles these instructions normally take. Do they take constant time, or does it depend on the input parameters?
Talking about “cycles for an instruction” on modern processors became difficult quite a while ago. Processors these days contain multiple execution units, their operation can overlap, and they can execute instructions out of order.
A good example of the essential consideration is given in the Intel processor manual, volume 4, appendix C. It breaks down instruction timing by Latency and Throughput. Latency is the number of cycles the execution core requires to complete all the micro-ops that form an instruction. Throughput is the number of cycles required before the execution unit can accept the same instruction again. Throughput is generally lower than Latency, and the table even contains fractional values, a side effect of having more than one execution unit of the same type. The type is important: it tells you whether instructions can overlap.
Maybe you got the essential message here: it greatly depends on what other instructions surround the code you are interested in timing. Those other instructions may well execute concurrently with the expensive one, at which point they take, effectively, 0 cycles. Or they may not, stalling the pipeline because the execution unit is busy with a previous instruction. These are the kind of details that programmers who write code optimizers care a lot about.
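To make the latency/throughput distinction concrete, here is a minimal sketch in C. It is only an illustration, not something taken from the Intel manual: the function names `chain_dependent`, `chain_independent` and `time_chain` are made up for this example. Both functions perform the same number of multiplies, but the first forms one long dependency chain (so it is limited by latency), while the second keeps four independent chains in flight (so it can get closer to the throughput limit).

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* One long dependency chain: every multiply needs the previous result,
   so each iteration has to wait out the multiplier's latency. */
double chain_dependent(double x, size_t n) {
    double a = 1.0;
    for (size_t i = 0; i < n; i++)
        a *= x;
    return a;
}

/* Four independent chains: the multiplies within one iteration do not
   depend on each other, so the processor is free to overlap them.
   n must be a multiple of 4 to keep the multiply count identical. */
double chain_independent(double x, size_t n) {
    assert(n % 4 == 0);
    double a = 1.0, b = 1.0, c = 1.0, d = 1.0;
    for (size_t i = 0; i < n; i += 4) {
        a *= x;
        b *= x;
        c *= x;
        d *= x;
    }
    return (a * b) * (c * d);
}

/* Runs f(x, n), stores its result in *result (so the work is not
   discarded as dead code) and returns the CPU time used, in seconds. */
double time_chain(double (*f)(double, size_t), double x, size_t n,
                  double *result) {
    clock_t t0 = clock();
    *result = f(x, n);
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

If you call `time_chain` on both functions with the same `x` (ideally a value only known at run time, such as one parsed from the command line, so the compiler cannot fold the loop away) and an `n` of a few hundred million, the independent version will typically finish sooner. How much sooner depends on the processor, the compiler and its flags, which is exactly the point.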
Some sample data from the manual, picking the most modern core in the tables:
You get a much better bang for the buck from SIMD instructions.
The only meaningful thing to do is measure, not assume.
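As a starting point for such a measurement, here is a rough sketch of a timing helper in C. The names `seconds_per_call` and `baseline` are hypothetical, and `clock()` is used only because it is standard C; a cycle counter or a proper benchmarking library would give finer resolution. The helper averages the cost of calling a function over a range of inputs, which lets you check for yourself whether the time depends on the argument.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical baseline: cheap enough that its timing approximates the
   cost of the loop and the indirect call alone. */
double baseline(double x) {
    return x;
}

/* Average CPU seconds per call of f over the n inputs
   start, start + step, start + 2*step, ...
   The results are summed into *sink so the calls are not
   optimized away as dead code. */
double seconds_per_call(double (*f)(double), double start, double step,
                        size_t n, double *sink) {
    assert(f != NULL && n > 0);
    double acc = 0.0;
    double x = start;
    clock_t t0 = clock();
    for (size_t i = 0; i < n; i++) {
        acc += f(x);
        x += step;
    }
    clock_t t1 = clock();
    *sink = acc;
    return (double)(t1 - t0) / CLOCKS_PER_SEC / (double)n;
}
```

To answer the question for your own machine, pass `sin` from `<math.h>` (on most Unix toolchains that means linking with `-lm`) once with small arguments and once with very large ones, use several million iterations, and subtract the `baseline` figure. Keep in mind that this times the library's `sin`, which may or may not use the processor's own instruction, and that the numbers will shift with the surrounding code for the reasons given above.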