I have a Perl script that takes about 30 minutes to run, so of course I run Devel::NYTProf. Great profiler. For many of my subs, I’m getting some data that doesn’t make sense to me.
I’m running with perl 5.10.0 on Linux using the default NYTProf settings.
In the HTML output, each of the subs has a summary section stating how much time is spent in the sub and its children and then goes on to give me line information.
The line statistics don’t add up to the total spent in the function. What gives?
For example, I have a function that’s reported to use 233s (57+166). The line-by-line number report has one line that uses 20s, another that uses 4 and one that uses 2. The other lines are <1s and the function is not that long.
What can I do to resolve this mismatch?
I could move to Perl 5.12 but that would take some work to install the dependencies. I’m happy to run it in a slower mode. Is there a way to increase the sampling frequency? Run on a slower machine?
Here is a sample: my NYTProf output. In this case, the sub is reported to use 225 seconds, but adding all of the numbers yields 56 seconds. This run had optimization turned off:
setenv NYTPROF optimize=0:file=nytprof.optout
Update: I’ve rerun with Perl 5.12 using the findcaller=1 option flag as suggested, with more or less the same results. (I ran on a different dataset.)
Update: Tim B is right. I have changed some of my key subs to do caching themselves instead of using Memoize, and the NYTProf results are useful again. Thank you, Tim.
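For anyone hitting the same thing, here is a minimal sketch of what caching inside the sub itself can look like. The sub name, the cache key, and the placeholder computation are made up for illustration and are not taken from my real script; it also assumes the sub takes a single scalar argument.

```perl
use strict;
use warnings;

my %cache;            # hand-rolled cache, keyed on the single argument
my $real_calls = 0;   # counts how often the expensive path actually runs

# expensive_lookup() is a hypothetical stand-in for one of the "key subs".
sub expensive_lookup {
    my ($key) = @_;

    # Cache hit: an ordinary return, with no generated wrapper and no goto.
    return $cache{$key} if exists $cache{$key};

    $real_calls++;
    my $value = length($key) * 2;   # placeholder for the real work
    return $cache{$key} = $value;
}

expensive_lookup('abc') for 1 .. 3;
print "real calls: $real_calls\n";   # the expensive path ran only once
```

Because the cache check lives in the sub's own body, the time spent on it is attributed to lines of that sub rather than to a wrapper generated elsewhere.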
I’ve just added this to the NYTProf documentation:
That probably explains the difference between the sum of the statement time column (31.7s) and the exclusive time reported for the subroutine (57.2s). The difference amounts to approximately 100 microseconds per call (which seems a little high, but not unreasonably so).
The statement profiler keeps track of how much time was spent on overheads, like writing statement profile data to disk. The subroutine profiler subtracts the difference in overheads between entering and leaving the subroutine in order to give a more accurate profile.
The statement profiler is generally very fast because most writes get buffered for zip compression so the profiler overhead per statement tends to be very small, often a single ‘tick’. The result is that the accumulated overhead is quite noisy. This becomes more significant for subroutines that are called frequently and are also fast (in this case 250303 calls at 899µs/call). So I suspect this is another, smaller, contribution to the discrepancy between statement time and exclusive times.
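The arithmetic behind those figures can be checked with a few lines of Perl. This is only a back-of-the-envelope sketch using the numbers quoted above; treating the 899µs/call figure as the inclusive time per call is an assumption on my part.

```perl
use strict;
use warnings;

my $calls          = 250303;   # number of calls reported for the sub
my $exclusive_s    = 57.2;     # exclusive time reported for the sub
my $statement_s    = 31.7;     # sum of the statement time column
my $per_call_us    = 899;      # reported time per call (assumed inclusive)

# Unaccounted time per call, in microseconds.
my $gap_us = ($exclusive_s - $statement_s) / $calls * 1e6;

# Total time implied by the per-call figure, in seconds.
my $total_s = $calls * $per_call_us / 1e6;

printf "gap per call: %.1f us\n", $gap_us;    # roughly 100 us
printf "total time:   %.1f s\n",  $total_s;   # roughly 225 s
```

The second figure lines up with the 225 seconds reported for the sub in the question.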
More importantly, I’ve also added this section:
The Memoize module is the primary cause of the discrepancy in your report. The calls to Memoize::__ANON__[...] execute a sub generated by Memoize that looks like sub { unshift @_, $cref; goto &_memoizer; }. That goto &_memoizer is implemented by perl as a kind of return to the caller followed by a call to the specified sub, and that’s the way NYTProf profiles it.

The confusion is caused by the fact that, although add_bit_to_map is being recorded as the caller of _memoizer, so the time in the call gets added to add_bit_to_map, the file and line number location of the call is recorded as the location of the goto.

It may be possible to improve this in a future release.
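The frame-replacing behaviour of goto &sub can be seen with perl's own caller, independently of NYTProf. The following is a small sketch with made-up names (target, outer_goto, outer_call); it is not Memoize's code, only an illustration of the same mechanism.

```perl
use strict;
use warnings;

# Hypothetical stand-in for _memoizer: reports the name of the sub
# one frame above its own call frame.
sub target {
    my $outer = (caller(1))[3];
    return defined $outer ? $outer : '(top level)';
}

my $via_goto = sub { goto &target };        # this wrapper's frame is replaced
my $via_call = sub { return target(@_) };   # this wrapper's frame is kept

sub outer_goto { return $via_goto->() }
sub outer_call { return $via_call->() }

print outer_goto(), "\n";   # main::outer_goto
print outer_call(), "\n";   # main::__ANON__
```

With goto, the anonymous wrapper vanishes from the call stack, so the target appears to have been called directly by the outer sub. That is the same relationship as add_bit_to_map being recorded as the caller of _memoizer.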
Thank you for prompting me to investigate this and improve the documentation.
Tim Bunce.
p.s. I recommend asking questions about NYTProf on the mailing list.