The Archive Base Latest Questions

Editorial Team
Asked: May 20, 2026

It appears that CPUs run significantly faster if their L2 cache is not filled. Would a programmer be better off writing code that ends up smaller in binary form, even if parts of that code are not executed all the time? Say, parts of the code that are only turned on by a config file.

1 Answer

  1. Editorial Team
    Added an answer on May 20, 2026 at 9:39 am

    The truth is somewhat more complex; I’ll try to outline it for you.

    If you look at the memory hierarchy in a modern PC with a multi-core processor, you will find six levels:

    1. The prefetcher, one for every core (no latency)
    2. The L1 cache, one or two for every core (either combined, or separate code and data caches; 2*64K on AMD K10), latency roughly 3 clock cycles
    3. The L2 cache, one for every core (512K on AMD K10), latency roughly 10 clock cycles
    4. The L3 cache, one per processor, shared by all cores (ncores*1 MB on AMD K10), latency roughly 30 clock cycles
    5. System RAM, one per system, shared by all processors, latency roughly 100 clock cycles
    6. Synchronization (or bus lock), one method per system, used by all bus-mastering devices. Latency is at least 300 cycles and can reach 1 us if an old PCI card uses all 32 clocks available to it when bus-mastering at 33 MHz; on a 3 GHz processor that means 3000 clock cycles.

    Don’t treat the cycle counts as exact; they’re meant to give you a feel for the possible penalties incurred when executing code.
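
    To get a feel for how these rough figures combine, here is a minimal sketch in C that computes the expected cost of one memory access as a weighted average. The latencies are the approximate values from the list above; the function name `average_access_cycles` and the hit fractions in the usage note are hypothetical, chosen only for illustration.

    ```c
    #include <stddef.h>

    /* Rough latencies in clock cycles from the list above: L1, L2, L3, RAM.
       These are ballpark figures, not measurements. */
    static const double level_latency[4] = { 3.0, 10.0, 30.0, 100.0 };

    /* Expected cycles per access, given the fraction of accesses that is
       served by each level. The four fractions are assumed to sum to 1. */
    double average_access_cycles(const double served_fraction[4])
    {
        double total = 0.0;
        for (size_t i = 0; i < 4; ++i)
            total += served_fraction[i] * level_latency[i];
        return total;
    }
    ```

    With 90% of accesses served by L1, 6% by L2, 3% by L3 and 1% by RAM, this gives 0.90*3 + 0.06*10 + 0.03*30 + 0.01*100 = 5.2 cycles per access, so even a small share of RAM accesses accounts for a noticeable part of the total.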

    I count synchronization as a memory level because sometimes memory has to be synchronized too, and that costs time.

    The language you use will have a great impact on performance. A program written in C, C++ or Fortran will typically be smaller and execute faster than one written in an interpreted or managed-runtime language such as Basic, C# or Java (C# and Java are compiled to bytecode and usually JIT-compiled rather than purely interpreted, but they still carry runtime overhead). C and Fortran will also give you better control when organizing your data areas and your program’s access to them. Certain features of OO languages (C++, C# and Java), such as encapsulation and the use of standard classes, can result in larger code being generated.

    How code is written also has a great impact on performance – though some uninformed individuals will say that compilers are so good these days that it isn’t necessary to write good source code. Great code will mean great performance and Garbage In will always result in Garbage Out.

    In the context of your question, writing small code is usually better for performance than not caring about size. If you are used to coding efficiently (small/fast code), you’ll do it regardless of whether you’re writing seldom-used or often-used sequences.

    The cache will most likely not hold your entire program (though it might) but rather numerous 32- or 64-byte chunks (“cache lines”) fetched from addresses in your code that are aligned to 32- or 64-byte boundaries. The more often the information in one of these chunks is accessed, the longer it will stay in its cache line before being evicted. If the core wants a chunk that’s not in L1, it will search for it all the way down to RAM if necessary, incurring penalty clock cycles along the way.
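
    The mapping from an address to its cache line can be sketched in a few lines of C. This assumes a 64-byte line (the constant `CACHE_LINE_SIZE` is an assumption; check your target CPU), and the helper names are hypothetical.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    #define CACHE_LINE_SIZE 64u  /* assumed line size in bytes */

    /* Start address of the cache line that contains addr. */
    uintptr_t cache_line_base(uintptr_t addr)
    {
        return addr & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
    }

    /* Number of cache lines touched by an object of `size` bytes at addr. */
    size_t cache_lines_touched(uintptr_t addr, size_t size)
    {
        if (size == 0)
            return 0;
        uintptr_t first = cache_line_base(addr);
        uintptr_t last  = cache_line_base(addr + size - 1);
        return (size_t)((last - first) / CACHE_LINE_SIZE) + 1;
    }
    ```

    For example, an 8-byte value at address 0x123C straddles the boundary at 0x1240, so reading it touches two lines, while the same value at 0x1238 touches only one.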

    So in general, small, tight and inline code sequences will execute faster because they impact the cache(s) less. Code that makes a lot of calls to other code areas will have a greater impact on the cache, as will code with unoptimized jumps. Divisions are extremely detrimental, but only to the execution of the core in question. Apparently AMD is much better at them than Intel (http://gmplib.org/~tege/x86-timing.pdf).

    There is also the issue of data organization. Here too it is better to keep often-used data in a physically small area, so that one cache line fetch brings in several often-used variables instead of just one per fetch (which is the norm).
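
    One way to apply this in C is to group the frequently used fields at the start of a structure and align the structure to a cache line. The sketch below uses a hypothetical `struct connection` and again assumes a 64-byte line; it needs a C11 compiler for `_Alignas` and `_Static_assert`.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    #define CACHE_LINE_SIZE 64  /* assumed line size; check your target CPU */

    /* Hypothetical per-connection record. The fields touched on every
       packet are grouped at the start and aligned to a cache line, so one
       line fetch brings all of them in together. */
    struct connection {
        _Alignas(CACHE_LINE_SIZE) uint32_t state;  /* hot */
        uint32_t flags;                            /* hot */
        uint64_t bytes_sent;                       /* hot */
        uint64_t bytes_received;                   /* hot */
        char     peer_name[256];                   /* cold: rarely read */
    };

    /* Bytes from the start of the struct to the end of the last hot field. */
    size_t hot_span_bytes(void)
    {
        return offsetof(struct connection, bytes_received) + sizeof(uint64_t);
    }

    /* Compile-time check that the hot fields fit in one cache line. */
    _Static_assert(offsetof(struct connection, bytes_received) + sizeof(uint64_t)
                   <= CACHE_LINE_SIZE, "hot fields must fit in one cache line");
    ```

    The static assertion makes the build fail if someone later inserts a large field between the hot ones, which is an easy way to lose the benefit without noticing.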

    When accessing arrays of data or data structures, try to make sure that you access them from lower to higher memory addresses. Again, accessing memory all over the place will have a negative impact on the caches.
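
    A common illustration is summing a matrix stored row by row. The two hypothetical functions below compute the same result; only the order of the memory accesses differs.

    ```c
    #include <stddef.h>

    /* Sum a rows x cols matrix stored row by row in one flat array.
       The inner loop walks consecutive addresses, low to high. */
    long sum_row_major(const int *m, size_t rows, size_t cols)
    {
        long total = 0;
        for (size_t r = 0; r < rows; ++r)
            for (size_t c = 0; c < cols; ++c)
                total += m[r * cols + c];
        return total;
    }

    /* Same result, but the inner loop jumps cols elements at a time, so for
       a large matrix nearly every access lands on a different cache line. */
    long sum_column_major(const int *m, size_t rows, size_t cols)
    {
        long total = 0;
        for (size_t c = 0; c < cols; ++c)
            for (size_t r = 0; r < rows; ++r)
                total += m[r * cols + c];
        return total;
    }
    ```

    On a matrix much larger than the caches the first version is usually the faster one, but how much faster depends on the hardware, so it is worth timing both on your own machine.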

    Finally, there is the technique of giving data prefetch hints to the processor, so that it can direct the caches to begin fetching data as early as possible before the data is actually used.
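
    As a sketch of what such a hint can look like, the code below uses `__builtin_prefetch`, which is a GCC/Clang extension rather than standard C. The function `sum_indexed` and the look-ahead of 8 iterations are hypothetical; a useful distance depends on the workload and has to be found by measuring.

    ```c
    #include <stddef.h>

    /* __builtin_prefetch is a GCC/Clang extension; on other compilers this
       sketch falls back to doing nothing. */
    #if defined(__GNUC__) || defined(__clang__)
    #define PREFETCH_READ(addr) __builtin_prefetch((addr), 0, 3)
    #else
    #define PREFETCH_READ(addr) ((void)(addr))
    #endif

    #define PREFETCH_DISTANCE 8  /* hypothetical look-ahead; needs tuning */

    /* Sum values[index[i]] for i in [0, n). The accesses are data dependent,
       so we hint the element needed PREFETCH_DISTANCE iterations from now. */
    long sum_indexed(const int *values, const size_t *index, size_t n)
    {
        long total = 0;
        for (size_t i = 0; i < n; ++i) {
            if (i + PREFETCH_DISTANCE < n)
                PREFETCH_READ(&values[index[i + PREFETCH_DISTANCE]]);
            total += values[index[i]];
        }
        return total;
    }
    ```

    Indirect accesses like this are where explicit hints have the best chance of helping, because the hardware prefetcher cannot easily predict them. For a plain sequential walk the hint is often unnecessary and can even slow things down.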

    To have a reasonable chance of understanding these things well enough to put them to use at a practical level, you will need to test different constructs and time them, preferably with the rdtsc counter (there is a lot of information about it on Stack Overflow) or by using a profiler.
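
    A minimal timing harness might look like the sketch below. It reads the time stamp counter through the `__rdtsc()` intrinsic from `<x86intrin.h>` on GCC/Clang for x86, and falls back to C11 `timespec_get` elsewhere. The names `read_ticks`, `sample_workload` and `min_ticks` are hypothetical. Note that rdtsc is not a serializing instruction and on modern CPUs counts at a fixed reference rate rather than in actual core cycles, so treat the numbers as relative.

    ```c
    #include <stdint.h>
    #include <time.h>

    #if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    #include <x86intrin.h>
    /* Read the x86 time stamp counter. */
    static uint64_t read_ticks(void) { return __rdtsc(); }
    #else
    /* Fallback for other targets: wall-clock nanoseconds (C11). */
    static uint64_t read_ticks(void)
    {
        struct timespec ts;
        timespec_get(&ts, TIME_UTC);
        return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
    }
    #endif

    /* Hypothetical workload to be timed; the volatile variable keeps the
       compiler from optimizing the loop away. */
    void sample_workload(void)
    {
        volatile unsigned sink = 0;
        for (unsigned i = 0; i < 100000u; ++i)
            sink += i;
    }

    /* Run fn several times and return the smallest tick count observed,
       which filters out runs disturbed by interrupts or cold caches. */
    uint64_t min_ticks(void (*fn)(void), int repetitions)
    {
        uint64_t best = UINT64_MAX;
        for (int i = 0; i < repetitions; ++i) {
            uint64_t start = read_ticks();
            fn();
            uint64_t end = read_ticks();
            if (end - start < best)
                best = end - start;
        }
        return best;
    }
    ```

    Calling `min_ticks(sample_workload, 10)` for each variant of a construct and comparing the results gives a first impression; a profiler is the better tool once the code grows beyond small isolated sequences.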

© 2021 The Archive Base. All Rights Reserved