I am learning assembly and writing some inline assembly in my Digital Mars C++ compiler. I read up on how to make a program faster and collected these parameters for tuning programs:
Use a better C++ compiler (thinking of GCC or the Intel compiler).
Use assembly only in the critical parts of the program.
Find a better algorithm.
Cache misses, cache contention.
Loop-carried dependency chains.
Instruction fetching time.
Instruction decoding time.
Instruction retirement.
Register read stalls.
Execution port throughput.
Execution unit throughput.
Suboptimal reordering and scheduling of micro-ops.
Branch misprediction.
Floating-point exceptions.
I understand all of these except “register read stalls”.
Question: Can anybody tell me how this happens in the CPU, and explain the “superscalar” form of “out-of-order execution”?
Plain out-of-order execution seemed logical, but I couldn't find a logical explanation of the superscalar form.
Question 2: Can someone also point me to a good instruction list for SSE, SSE2 and newer CPUs, preferably with micro-op tables, port throughputs, execution units and a way to calculate latencies, so I can find the real bottleneck in a piece of code?
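A small sketch that may make the superscalar part concrete, under the usual reading of the term: a superscalar core has several execution units and can start more than one micro-op per clock, and out-of-order execution is what lets it find independent micro-ops to feed them. With a single accumulator every addition waits for the previous one, so extra units sit idle; with four accumulators there are four independent dependency chains whose additions can be in flight at the same time. The function names are hypothetical, and whether the four-chain version is actually faster depends on the CPU and compiler.

```cpp
#include <cstddef>
#include <vector>

// One accumulator: each add depends on the result of the previous add,
// so the loop is limited by the add latency, not by the number of adders.
double sum_one_chain(const std::vector<double>& v) {
    double s = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i)
        s += v[i];
    return s;
}

// Four accumulators: four independent loop-carried chains. An out-of-order
// superscalar core can overlap adds from different chains.
// Note: for floating point this reorders the additions, which can change
// rounding in general; results are identical only for exactly representable sums.
double sum_four_chains(const std::vector<double>& v) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; ++i)  // leftover elements when n is not a multiple of 4
        s0 += v[i];
    return (s0 + s1) + (s2 + s3);  // combine the chains once, after the loop
}
```

Usage: fill a `std::vector<double>` with small integers and both functions return the same exact sum; time them on larger inputs to see whether your CPU benefits from the extra chains.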
I would be happy with a small example like this:
// breaking a loop-carried dependency chain:
__asm
{
loop_begin:
....
....
sub edx,05h // rather than computing i*5 in each iteration, subtract 5 each iteration
sub ecx,01h // i-- counter
...
...
jnz loop_begin // edit: sub ecx must come after sub edx so that jnz tests the flags set by sub ecx
}
// subtracting 5 from edx not only removes a multiplication, it also makes edx independent of ecx, so the two updates form separate dependency chains
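The same idea in plain C++, as a sketch: the function names and the `out` array are hypothetical stand-ins for the elided loop body, and the loop counts `i` down to 1 like `ecx`. Optimizing compilers commonly apply this transformation (strength reduction of an induction variable) on their own, so it is worth checking the generated code before hand-writing it in assembly.

```cpp
#include <cstddef>
#include <vector>

// Naive version: recompute i * 5 from the counter every iteration.
// This costs an extra multiply per iteration that reads the counter.
std::vector<int> fill_multiply(int n) {
    std::vector<int> out(static_cast<std::size_t>(n) + 1, 0);
    for (int i = n; i != 0; --i)
        out[i] = i * 5;
    return out;
}

// Strength-reduced version: keep a running value that is decreased by 5
// each iteration, like "sub edx,05h". It never reads i, so it forms its own
// cheap dependency chain alongside the i-- chain.
std::vector<int> fill_subtract(int n) {
    std::vector<int> out(static_cast<std::size_t>(n) + 1, 0);
    int v = n * 5;  // the only multiplication, done once before the loop
    for (int i = n; i != 0; --i) {
        out[i] = v;
        v -= 5;
    }
    return out;
}
```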
Thank you.
Computer: Pentium M 2 GHz, Windows XP 32-bit
You should take a look at Agner Fog's optimization manuals: Optimizing software in C++: An optimization guide for Windows, Linux and Mac platforms, or Optimizing subroutines in assembly language: An optimization guide for x86 platforms.
But to really outsmart a modern compiler, you need good background knowledge of the architecture you want to optimize for: The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers
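Once you have per-instruction data (latency, micro-op count and the port each micro-op uses), a common back-of-the-envelope estimate is that a loop cannot run faster than the largest of three limits: the total latency of its longest loop-carried dependency chain, its micro-op count divided by the issue width, and the number of micro-ops that must go through the busiest single port. The sketch below only encodes that rule of thumb. It assumes each port accepts one micro-op per clock and ignores cache misses, branch mispredictions and decoding limits; the struct, the function name and all numbers in the tests are hypothetical, not taken from any real instruction table.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical per-iteration profile of a loop, filled in by hand
// from instruction tables for the target CPU.
struct LoopProfile {
    double chain_latency;           // summed latency of the longest loop-carried chain
    double total_uops;              // micro-ops per iteration
    double issue_width;             // micro-ops issued/retired per clock
    std::vector<double> port_uops;  // micro-ops per iteration on each execution port
};

// Lower bound on clocks per iteration: the largest of the latency bound,
// the issue bound and the busiest-port bound (assuming 1 micro-op/clock/port).
double min_cycles_per_iteration(const LoopProfile& p) {
    double bound = p.chain_latency;
    bound = std::max(bound, p.total_uops / p.issue_width);
    for (double u : p.port_uops)
        bound = std::max(bound, u);
    return bound;
}
```

Whichever term wins tells you where to look: if the latency term dominates, splitting or shortening the dependency chain helps more than removing instructions; if a port term dominates, moving work to other instruction types helps more.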