Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8330673
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 9, 20262026-06-09T02:11:28+00:00 2026-06-09T02:11:28+00:00

I want to improve the performance of a specific method inside a larger application.

  • 0

I want to improve the performance of a specific method inside a larger application.

The goal is improving latency (wall-clock time spent in a specific function), not (neccessarily) system load.

Requirements:

  1. As I expect a lot of the latency to be due to I/O, take into account time spent waiting/blocked (in other words: look at wall clock time instead of CPU time)
  2. As the program does much more than the fragment i’m trying to optimize. There needs to be a way to either start/stop profiling programmatically, or to filter the output to only show the time between entering and exiting the function i’m optimizing.
  3. Profiling on the method level is acceptable (if it can be done on the instruction level, even better. if it only profiles system calls, that’s probably not enough)
  4. This is for a hobby project, so expensive tools are not really an option
  5. Instrumentation (-finstrument-functions) is acceptable
  6. The critical piece of code I’m interested in is hard to interrupt manually (because it is already relatively fast and hard to realistically invoke in a loop), so some kind of automation is necessary.

Tools discarded so far:

  • gprof, oprofile, callgrind (requirement 1)
  • buiding something custom using getrusage (requirement 1)
  • poormansprofiler.org (requirement 2)
  • strace -T,dtrace,http://perf.wiki.kernel.org (requirements 2 and 3)
  • VTune,Zoom (requirement 4)
  • manual call-stack sampling (requirement 6)
  • google-perftools (should be able to measure wall time, but this does not appear to work in my case, presumably because SIGALRM interference.
  • systemtap (my kernel isn’t patched to include utrace)

Other options which I haven’t further evaluated yet:

  • cprof (doesn’t build here out-of-the-box, seems i386-only)
  • manually inserting trace points (e.g. with lttng)

I’d love to hear about:

  • other options
  • perhaps I discarded some tool too soon?
  • whether or not the options I haven’t evaluated yet have a chance of working, and if so, how to best do it.

I finally settled for:

  • building something custom using -finstrument-functions myself, based on http://balau82.wordpress.com/2010/10/06/trace-and-profile-function-calls-with-gcc/

The trace produced by this crude tool is hard to interpret, and I can easily imagine some tools for further processing its output making it infinitely more useful. However, this did the job for me for now, so I’m putting that project off until later ;).

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-09T02:11:29+00:00Added an answer on June 9, 2026 at 2:11 am

    Use this method.

    It is quite simple and effective at pinpointing opportunities for optimization, whether they are in CPU or IO bound code.

    If you are right that the biggest opportunities are in a particular function or module, then it will find them. If they are elsewhere, it will find them.

    Of the tools you mentioned and discarded, it is most similar to poor man’s profiler, but still not very similar.

    EDIT: Since you say it is triggered by a user interaction and blocks further input until it completes, here’s how I would do it.

    First, I assume it does not block a manual interrupt signal to the debugger, because otherwise you’d have no way to stop an infinite loop.
    Second, I would wrap a loop of 10, 100, or 1000 times around the routine in question, so it is doing it long enough to be manually interrupted.

    Now, suppose it is spending some fraction of time doing I/O, like 50%. Then when you interrupt it, you have a 50% chance of catching it in the I/O.
    So if you catch it in the I/O, which the call stack will tell you, you can also see in great detail where the I/O is being requested from and why.

    It will show you what’s going on, which is almost certainly something surprising.
    If you see it doing something on as few as two (2) samples that you could find a way to eliminate, then you will get a considerable speedup.
    In fact, if you eliminate that activity, you don’t know in advance how much time you will save, but on average you can expect to save fraction F = (s+1)/(n+2), where n is the total number of samples you took, and s is the number of samples that showed the activity.
    (Rule of Succession)
    Example, if you took 4 stack samples and saw the activity on 2 of them, on average it would save you F = 3/6 = 1/2, corresponding to a speedup factor of 1/(1-F) or 2.

    Once you’ve done that, you can do it again and find something else to fix.
    The speedup factors multiply together like compound interest.

    Then of course you remove the outer loop and “cash in” all the speedups you got.

    If you are wondering how this differs from profiling, it is that by carefully examining each stack sample, and possibly related data, you can recognize activities that you could remove, where if all you’ve got is measurements, you are left trying to intuit what is going on.
    The actual amount of time you save is what it is, regardless of any measurements.
    The important thing is to find the problem.
    No matter how precisely a profiler might measure it, if you can’t find it, you’re not winning.
    These pages are full of people saying either they don’t understand what their profiler is telling them, or it seems to be saying there is nothing to fix, which they are only too willing to accept.
    That’s a case of rose-tinted glasses.

    More on all that.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

To improve performance of our MVC3 application we want to mark some Controller's to
I am porting an existing application to C# and want to improve performance wherever
I want to improve the performance of my application, and found that it spends
I am working on image processing library in c#. want to improve performance/speed and
I have an website and I want to improve it's performance. I've fixed all
We have a C++ application for which we try to improve performance. We identified
We have a SQL Server 2005 database for which we want to improve performance
I want to improve performance of my app, and started reading http://guides.rubyonrails.org/performance_testing.html . My
I'm trying to improve performance of my script. I want to improve this: el.removeClass(
I want to improve the reading performance, so thinking about pre-load the data block

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.