I am using Google’s perftools (http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html) for CPU profiling. It’s a wonderful tool that has helped me make many CPU-time improvements to my application.
Unfortunately, the code is still a bit slow, and when it is compiled with g++’s -O3 optimization level, the profile tells me only that a specific function is slow, not which parts of it are slow.
If I remove the -O3 flag, the unoptimized portions of the program dominate the profile instead of this function, so I learn little about which parts of the function are actually slow. If I leave the -O3 flag in, the slow parts of the function are inlined, and I can’t tell them apart.
Any suggestions? Thanks for your help!
If you’re on Linux, use oprofile. If you’re on Windows, use AMD’s CodeAnalyst.
Both will give sample-based profiles down to the level of individual source lines or assembly instructions and you should have no problem identifying ‘hot spots’ within functions.
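To make the line-level idea concrete, here is a minimal sketch of the situation described in the question. All names in it (`scale`, `clamp_nonneg`, `process`) are hypothetical and not taken from the original code. It assumes the program is built with both optimization and debug information (`-O3 -g`), since a sampling profiler needs the debug line table to map sampled instruction addresses back to source lines.

```cpp
// Hypothetical example: every name here is illustrative only.
// Assumed build flags: g++ -O3 -g (optimized code plus debug line info).
#include <cstddef>
#include <vector>

// Small helper that the optimizer will most likely inline into process().
static inline double scale(double x) { return x * 1.5 + 2.0; }

// Second helper, also a likely inlining candidate.
static inline double clamp_nonneg(double x) { return x < 0.0 ? 0.0 : x; }

// The "slow function". A function-level profile reports only process(),
// because the helpers no longer exist as separate functions after inlining.
// A line-level profile can still split the samples by source line, since
// the inlined instructions generally keep their original line information.
double process(const std::vector<double>& v) {
    double sum = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        double s = scale(v[i]);   // work from the first helper
        sum += clamp_nonneg(s);   // work from the second helper
    }
    return sum;
}
```

With oprofile, the annotated listing would typically come from `opannotate --source` after a profiling run. Optimized code can reorder and merge instructions, so per-line counts are best read as approximate.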