The Archive Base Latest Questions

Editorial Team
Asked: June 12, 2026

I am working on a (quite large) existing monothreaded C application. In this context

I am working on a (quite large) existing single-threaded C application. I modified the application to do a very small amount of additional work: incrementing a counter each time a particular function is called (this function is called about 80,000 times). The application is compiled with the -O3 option on Ubuntu 12.04 running the 64-bit Linux kernel 3.2.0-31-generic.

Surprisingly, the instrumented version of the code runs faster, and I am investigating why. I measure execution time with clock_gettime(CLOCK_PROCESS_CPUTIME_ID) and, to get representative results, I report the average execution time over 100 runs. To avoid outside interference, I launched the application, as far as possible, on a system with no other applications running. (As a side note, because CLOCK_PROCESS_CPUTIME_ID returns process CPU time rather than wall-clock time, other applications should in theory affect only the cache and not directly the measured execution time.)
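
The measurement described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `demo_workload`, `process_cpu_seconds` and `mean_cpu_seconds` are hypothetical names, and the workload runs inside one process rather than as 100 separate launches of the application. The fastest run is reported next to the mean because a minimum is less sensitive to outliers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for one run of the real workload. */
static volatile unsigned long sink;
static void demo_workload(void)
{
    for (unsigned long i = 0; i < 1000000UL; i++)
        sink += i;
}

/* CPU time consumed by this process so far, in seconds (-1.0 on error). */
static double process_cpu_seconds(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0)
        return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Run work() `runs` times. Returns the mean CPU time per run in seconds
 * and, if min_out is not NULL, stores the fastest run there.
 * Returns -1.0 if the clock could not be read.
 */
static double mean_cpu_seconds(void (*work)(void), int runs, double *min_out)
{
    assert(work != NULL && runs > 0);
    double total = 0.0, fastest = 0.0;
    for (int i = 0; i < runs; i++) {
        double start = process_cpu_seconds();
        work();
        double stop = process_cpu_seconds();
        if (start < 0.0 || stop < 0.0)
            return -1.0;
        double elapsed = stop - start;
        total += elapsed;
        if (i == 0 || elapsed < fastest)
            fastest = elapsed;
    }
    if (min_out != NULL)
        *min_out = fastest;
    return total / runs;
}
```

Calling `mean_cpu_seconds(demo_workload, 100, &fastest)` mirrors the 100-run average. With glibc versions older than 2.17, clock_gettime lives in librt, so the program has to be linked with -lrt.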

I suspected instruction cache effects: maybe the instrumented code, which is slightly larger (by a few bytes), fits differently and better in the cache. Is this hypothesis plausible? I tried to investigate the cache behavior with valgrind --tool=cachegrind, but unfortunately the instrumented version has (as seems logical) more cache misses than the original version.

Any hints on this subject, and any ideas that may help explain why the instrumented code runs faster, are welcome (for example, are some GCC optimizations applied in one case and not in the other, and if so, why?).


1 Answer

  1. Editorial Team
    Added an answer on June 12, 2026 at 7:28 am

    Since the question does not give many details, I can only suggest some factors to consider while investigating the problem.

    Even a very small amount of additional work (such as incrementing a counter) might alter the compiler’s decision on whether to apply some optimizations. The compiler does not always have enough information to make a perfect choice. It may optimize for speed where the bottleneck is code size. It may auto-vectorize computations when there is too little data to process for vectorization to pay off. The compiler may not know what kind of data will be processed or which exact CPU model will execute the code.

    1. Incrementing a counter may increase the size of some loop and prevent loop unrolling. This may decrease code size (and improve code locality, which is good for the instruction cache, the micro-op cache, or the loop buffer, and lets the CPU fetch and decode instructions quickly).
    2. Incrementing a counter may increase the size of some function and prevent inlining. This may also decrease code size.
    3. Incrementing a counter may prevent auto-vectorization, which again may decrease code size.
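
One way to probe the inlining hypothesis from the list above is to pin the inlining decision so that it is the same in both builds, then time both again. The sketch below is only an illustration: `special_function`, `run_hot_loop`, `calls_seen` and the `INSTRUMENTED` build macro are hypothetical names, and `__attribute__((noinline))` is a GCC/Clang extension.

```c
#include <stddef.h>

/* GCC/Clang extension; on other compilers this sketch pins nothing. */
#if defined(__GNUC__)
#define NEVER_INLINE __attribute__((noinline))
#else
#define NEVER_INLINE
#endif

/* Instrumentation counter, as in the question. */
static unsigned long call_count;

/* Hypothetical stand-in for the function that gets instrumented. */
NEVER_INLINE static long special_function(long x)
{
#ifdef INSTRUMENTED
    call_count++;           /* present only in the instrumented build */
#endif
    return 3 * x + 1;
}

/* Hypothetical hot loop that calls the function n times. */
static long run_hot_loop(size_t n)
{
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += special_function((long)i);
    return acc;
}

/* Number of calls recorded so far (always 0 without -DINSTRUMENTED). */
static unsigned long calls_seen(void)
{
    return call_count;
}
```

If the timing difference between the plain build and the -DINSTRUMENTED build shrinks once inlining is pinned, a changed inlining decision is a plausible cause. If the difference stays, one of the other factors is more likely.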

    Even if this change does not affect compiler optimizations, it may alter how the CPU executes the code.

    1. If you insert the counter-incrementing code in a place that is dense with branch targets, this may spread the branch targets out and improve branch prediction.
    2. If you insert the counter-incrementing code in front of a particular branch target, this may give the branch target a better-aligned address and make code fetch faster.
    3. If you place the counter-incrementing code after some data is written but before the same data is loaded again (and store-to-load forwarding did not work for some reason), the load operation may complete earlier.
    4. Inserting the counter-incrementing code may keep two loads from conflicting on the same bank of the L1 data cache.
    5. Inserting the counter-incrementing code may alter some CPU scheduler decision and make an execution port available just in time for a performance-critical instruction.
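
Several of the effects listed above depend on where the extra bytes land rather than on the counter itself. A rough way to separate the two is a third build that inserts padding of similar size instead of the increment. The sketch below assumes GCC-style inline assembly on x86. `VARIANT`, `PAD_NOPS`, `instrumented_site` and `calls_seen` are hypothetical names, and the number of NOPs is a guess that should be tuned against the disassembly of the real increment.

```c
/* VARIANT: 0 = baseline, 1 = counter, 2 = padding only (hypothetical knob). */
#ifndef VARIANT
#define VARIANT 0
#endif

/* Seven one-byte NOPs on x86 with GCC/Clang; a no-op elsewhere. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define PAD_NOPS() \
    __asm__ volatile("nop\n\tnop\n\tnop\n\tnop\n\tnop\n\tnop\n\tnop")
#else
#define PAD_NOPS() ((void)0)
#endif

/* Instrumentation counter, as in the question. */
static unsigned long call_count;

/* Hypothetical stand-in for the instrumented function. */
static long instrumented_site(long x)
{
#if VARIANT == 1
    call_count++;           /* the real instrumentation */
#elif VARIANT == 2
    PAD_NOPS();             /* similar code size, no counter */
#endif
    return 3 * x + 1;
}

/* Number of calls recorded so far (nonzero only when VARIANT == 1). */
static unsigned long calls_seen(void)
{
    return call_count;
}
```

If the build with -DVARIANT=2 runs about as fast as the one with -DVARIANT=1, code placement or alignment is the more likely explanation. NOPs still occupy fetch and decode slots, so this is an approximation, not a clean experiment.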

    To investigate the effects of compiler optimization, compare the generated assembly code before and after adding the counter-incrementing code.

    To investigate CPU effects, use a profiler that can inspect the processor’s performance counters.
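
Besides a ready-made profiler, performance counters can also be read from inside the program on Linux through the perf_event_open system call. The following is a minimal sketch, not a full profiler: `open_hw_counter`, `count_events` and `demo_workload` are hypothetical names, only user-space events of the calling thread are counted, and the call may fail (returning -1 here) in virtual machines or when the system restricts access to the counters.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the code region under investigation. */
static volatile unsigned long sink;
static void demo_workload(void)
{
    for (unsigned long i = 0; i < 1000000UL; i++)
        sink += i;
}

/* Open a hardware counter for the calling thread; returns fd or -1. */
static int open_hw_counter(uint64_t config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof attr;
    attr.config = config;       /* e.g. PERF_COUNT_HW_BRANCH_MISSES */
    attr.disabled = 1;          /* start stopped, enable explicitly */
    attr.exclude_kernel = 1;    /* count user-space events only */
    attr.exclude_hv = 1;
    /* pid = 0, cpu = -1: this thread, on whichever CPU it runs */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/*
 * Count `config` events while work() runs.
 * Returns 0 and stores the count, or -1 if the counter is unavailable.
 */
static int count_events(uint64_t config, void (*work)(void),
                        uint64_t *count_out)
{
    assert(work != NULL && count_out != NULL);
    int fd = open_hw_counter(config);
    if (fd < 0)
        return -1;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    ssize_t got = read(fd, count_out, sizeof *count_out);
    close(fd);
    return got == (ssize_t)sizeof *count_out ? 0 : -1;
}
```

Comparing, for example, PERF_COUNT_HW_BRANCH_MISSES or PERF_COUNT_HW_CACHE_MISSES between the original and the instrumented build may point to which of the CPU effects listed earlier is at work.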

© 2021 The Archive Base. All Rights Reserved