Some very expencied programmer from another company told me about some low-level code-optimzation tips that targetting specific CPU, including pipeline-optimzation, which means, arrange the code (inlined assembly, obviously) in special orders such that it fit the pipeline better for the targetting hardware.
With the presence of out-of-order and speculative execuation, I just wonder is there any points to do this kind of low-level stuff? We are mostly invovled in high performance computing, so we can really focus on one very specific CPU type to do our optimzation, but I just dont know if there is any point to do this specific optimzation, anyone has any experience here, where to begin? are there any code examples for this kind of optimzation? many thanks!
I’ll start by saying that the compiler will usually optimize code sufficiently (i.e. well enough) that you do not need to worry about this provided your high-level code and algorithms are optimized. In general, manual optimizing should only happen if you have hard evidence that there is an actual performance issue that you can quantify and have tracked down.
Now, with that said, it’s always possible to improve things – sometimes a little, sometimes a lot.
If you are in the high-performance computing game, then this sort of optimization might make sense. There are all sorts of “tricks” that can be done, but they are best left to real experts and not for the faint of heart.
If you really want to know more about this topic, a good place to start is by reading Agner Fog’s website.