Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 871415
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 15, 20262026-05-15T10:38:15+00:00 2026-05-15T10:38:15+00:00

On SO, there are quite a few questions about performance profiling, but I don’t

  • 0

On SO, there are quite a few questions about performance profiling, but I don’t seem to find the whole picture. There are quite a few issues involved and most Q & A ignore all but a few at a time, or don’t justify their proposals.

What Im wondering about. If I have two functions that do the same thing, and Im curious about the difference in speed, does it make sense to test this without external tools, with timers, or will this compiled in testing affect the results to much?

I ask this because if it is sensible, as a C++ programmer, I want to know how it should best be done, as they are much simpler than using external tools. If it makes sense, lets proceed with all the possible pitfalls:

Consider this example. The following code shows 2 ways of doing the same thing:

#include <algorithm>
#include <ctime>
#include <iostream>

typedef unsigned char byte;

inline
void
swapBytes( void* in, size_t n )
{
   for( size_t lo=0, hi=n-1; hi>lo; ++lo, --hi )

      in[lo] ^= in[hi]
   ,  in[hi] ^= in[lo]
   ,  in[lo] ^= in[hi] ;
}

int
main()
{
         byte    arr[9]     = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h' };
   const int     iterations = 100000000;
         clock_t begin      = clock();

   for( int i=iterations; i!=0; --i ) 

      swapBytes( arr, 8 );

   clock_t middle = clock();

   for( int i=iterations; i!=0; --i ) 

      std::reverse( arr, arr+8 );

   clock_t end = clock();

   double secSwap = (double) ( middle-begin ) / CLOCKS_PER_SEC;
   double secReve = (double) ( end-middle   ) / CLOCKS_PER_SEC;


   std::cout << "swapBytes,    for:    "   << iterations << " times takes: " << middle-begin
             << " clock ticks, which is: " << secSwap    << "sec."           << std::endl;

   std::cout << "std::reverse, for:    "   << iterations << " times takes: " << end-middle
             << " clock ticks, which is: " << secReve    << "sec."           << std::endl;

   std::cin.get();
   return 0;
}

// Output:

// Release:
//  swapBytes,    for: 100000000 times takes: 3000 clock ticks, which is: 3sec.
//  std::reverse, for: 100000000 times takes: 1437 clock ticks, which is: 1.437sec.

// Debug:
//  swapBytes,    for: 10000000 times takes: 1781  clock ticks, which is: 1.781sec.
//  std::reverse, for: 10000000 times takes: 12781 clock ticks, which is: 12.781sec.

The issues:

  1. Which timers to use and how get the cpu time actually consumed by the code under question?
  2. What are the effects of compiler optimization (since these functions just swap bytes back and forth, the most efficient thing is obviously to do nothing at all)?
  3. Considering the results presented here, do you think they are accurate (I can assure you that multiple runs give very similar results)? If yes, can you explain how std::reverse gets to be so fast, considering the simplicity of the custom function. I don’t have the source code from the vc++ version that I used for this test, but here is the implementation from GNU. It boils down to the function iter_swap, which is completely incomprehensible for me. Would this also be expected to run twice as fast as that custom function, and if so, why?

Contemplations:

  1. It seems two high precision timers are being proposed: clock() and QueryPerformanceCounter (on windows). Obviously we would like to measure the cpu time of our code and not the real time, but as far as I understand, these functions don’t give that functionality, so other processes on the system would interfere with measurements. This page on the gnu c library seems to contradict that, but when I put a breakpoint in vc++, the debugged process gets a lot of clock ticks even though it was suspended (I have not tested under gnu). Am I missing alternative counters for this, or do we need at least special libraries or classes for this? If not, is clock good enough in this example or would there be a reason to use the QueryPerformanceCounter?

  2. What can we know for certain without debugging, dissassembling and profiling tools? Is anything actually happening? Is the function call being inlined or not? When checking in the debugger, the bytes do actually get swapped, but I’d rather know from theory why, than from testing.

Thanks for any directions.

update

Thanks to a hint from tojas the swapBytes function now runs as fast as the std::reverse. I had failed to realize that the temporary copy in case of a byte must be only a register, and thus is very fast. Elegance can blind you.

inline
void
swapBytes( byte* in, size_t n )
{
   byte t;

   for( int i=0; i<7-i; ++i )
    {
        t       = in[i];
        in[i]   = in[7-i];
        in[7-i] = t;
    }
}

Thanks to a tip from ChrisW I have found that on windows you can get the actual cpu time consumed by a (read:your) process trough Windows Management Instrumentation. This definitely looks more interesting than the high precision counter.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-15T10:38:16+00:00Added an answer on May 15, 2026 at 10:38 am

    Obviously we would like to measure the cpu time of our code and not the real time, but as far as I understand, these functions don’t give that functionality, so other processes on the system would interfere with measurements.

    I do two things, to ensure that wall-clock time and CPU time are approximately the same thing:

    • Test for a significant length of time, i.e. several seconds (e.g. by testing a loop of however many thousands of iterations)

    • Test when the machine is more or less relatively idle except for whatever I’m testing.

    Alternatively if you want to measure only/more exactly the CPU time per thread, that’s available as a performance counter (see e.g. perfmon.exe).

    What can we know for certain without debugging, dissassembling and profiling tools?

    Nearly nothing (except that I/O tends to be relatively slow).

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

No related questions found

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.