I need to profile a program to see whether any performance changes are needed. I suspect they are, but measuring first is the way to go. This is not that program, but it illustrates the problem I’m having:
#include <stdio.h>

int main (int argc, char** argv)
{
    FILE* fp = fopen ("trivial.c", "r");
    if (fp) {
        char line[80];
        while (fgets (line, 80, fp))
            printf ("%s", line);
        fclose (fp);
    }
    return 0;
}
Here’s what I did with it:
% gcc trivial.c -pg -o trivial
% ./trivial
...
% gprof trivial gmon.out
Granted, this is a trivial program, but I would have thought it would make some kind of blip on the profiling radar. It didn’t:
                                  called/total       parents
index  %time    self descendents  called+self    name           index
                                  called/total       children

                0.00        0.00       1/1           __start [1704]
[105]    0.0    0.00        0.00       1         _main [105]
-----------------------------------------------

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
  0.0       0.00     0.00        1     0.00     0.00  _main [105]

Index by function name

   [105] _main
Can anyone guide me here? I would like the output to reflect that it called fgets and printf at least 14 times, and it did hit the disk after all – there should be some measured time, surely.
When I run the same command on the real program, I get more functions listed, but even then it is not a complete list – just a sample.
Perhaps gprof is not the right tool to use. What is?
This is on OS X Leopard.
Edit: I ran the real program and got this:
% time real_program

real    4m24.107s
user    2m34.630s
sys     0m38.716s
There are certain commonly accepted beliefs in this business that I would suggest you examine closely.
One is that the best (if not only) way to find performance problems is to measure the time each subroutine takes and count how many times it is called.
That is top-down. It stems from a belief that the forest is more important than the trees. It is based on myths about ‘speed of code’ and ‘bottlenecks’. It is not very scientific.
A performance problem is more like a bug than a quantitative thing: what the program is doing wrong is wasting time, and that needs to be fixed. The approach I suggest instead rests on a simple observation:
Slowness consists of time being spent for poor reasons.
To find it, sample the program’s state at random slivers of clock time and investigate why each of those slivers is being spent.
If something is causing slowness, then that fact alone exposes it to your samples. So if you take enough samples, you will see it. You will know approximately how much time it is costing you, by the fraction of samples that show it.
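To make that concrete, here is a minimal self-sampling sketch in C. It is only an illustration, and it makes some assumptions: the POSIX setitimer() call and the execinfo backtrace() API (both present on OS X 10.5 and glibc), plus a made-up do_work() function standing in for the real program. It samples at fixed intervals of CPU time rather than truly random moments, but the idea is the same: catch the stack in the act, repeatedly, and see what keeps showing up.

#include <execinfo.h>   /* backtrace, backtrace_symbols_fd */
#include <signal.h>
#include <sys/time.h>
#include <unistd.h>

/* Signal handler: one "sample" = one dump of the current call stack. */
static void take_sample (int sig)
{
    void* frames[64];
    int depth;

    (void) sig;
    depth = backtrace (frames, 64);
    /* backtrace_symbols_fd writes straight to a file descriptor and
       avoids malloc, so it is reasonable to call from a handler. */
    write (STDERR_FILENO, "--- sample ---\n", 15);
    backtrace_symbols_fd (frames, depth, STDERR_FILENO);
}

/* Stand-in for the real program's work (hypothetical). */
static void do_work (void)
{
    volatile double x = 0;
    long i;
    for (i = 0; i < 200000000L; i++)
        x += i * 0.5;
}

int main (void)
{
    /* Fire SIGPROF every 100 ms of consumed CPU time. */
    struct itimerval every_100ms = { { 0, 100000 }, { 0, 100000 } };

    signal (SIGPROF, take_sample);
    setitimer (ITIMER_PROF, &every_100ms, NULL);

    do_work ();
    return 0;
}

Run it and count how often a given function appears in the dumped stacks; that fraction is your estimate of what it is costing you.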
A good way to tell if a sliver of time is being spent for a good reason is to look carefully at the call stack. Every function invocation on the stack has an implicit reason, and if any of those reasons are poor, then the reason for the entire sample is poor.
Some profilers report this at the statement level, telling you what each statement is costing you.
Personally, I just halt the program at random several times under a debugger and look at the call stack. Any invocation that shows up on multiple samples is a likely suspect. It never fails.
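If you would rather let a debugger do the halting, the same thing can be done from the shell. This is just a sketch: the process ID 12345 is a placeholder, and it assumes a gdb (Leopard ships one) that can attach to a running process.

% gdb real_program 12345       <-- attach to the already-running process
(gdb) bt                       <-- the stack at this instant: one sample
(gdb) continue
^C                             <-- interrupt again at another arbitrary moment
(gdb) bt                       <-- another sample
(gdb) quit

A handful of such samples is usually enough; whatever appears on more than one of them is where the time is going.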
You may say, ‘It’s not accurate.’ It is extremely accurate at the thing that matters: it pinpoints the very instructions causing the problem. What it does not give you is three decimal places of timing. In other words, it is lousy for measurement but superb for diagnosis.
You may say, ‘What about recursion?’ Well, what about it? A function is counted once per sample it appears in, however many times it shows up on that stack, so recursion changes nothing about the fraction of samples that contain it.
You may say, ‘I think that could only work on toy programs.’ That would be wishful thinking. In fact, large programs tend to have more performance problems, because their deeper stacks leave more room for invocations with poor reasons, and sampling finds them just fine, thanks.
Sorry to be a curmudgeon. I just hate to see myths in what should be a scientifically-based field.