The Archive Base

Editorial Team
Asked: May 21, 2026

Profiling my code, i see a lot of cache misses and would like to

Profiling my code, I see a lot of cache misses and would like to know whether there is a way to improve the situation. Optimization is not really needed; I'm more curious about whether there exist general approaches to this problem (this is a follow-up question).

// class to compute stuff
class A {
    double compute();
    // ...
    // depends on other objects
    std::vector<A*> dependencies;
};

I have a container class that stores pointers to all created objects of class A. I do not store copies because I want shared access. Previously I used shared_ptr, but since individual A objects are meaningless without the container, raw pointers are fine.

class Container {
    // ...
    void compute_all();
    std::vector<A*> objects;
    // ...
};

The vector objects is insertion-sorted so that the full evaluation can be done by simply iterating over it and calling A::compute() on each element; all dependencies in A are resolved at that point.

With a_i objects of class A, the evaluation might look like this:

a_1 => a_2 => a_3 --> a_2 --> a_1 => a_4 => ....

where => denotes iteration in Container and --> denotes iteration over A::dependencies.

Moreover, the Container class is created only once and compute_all() is called many times, so rearranging the whole structure after creation is an option and wouldn’t harm efficiency much.

Now to the observations/questions:

  1. Obviously, iterating over Container::objects is cache efficient, but accessing the pointees is definitely not.

  2. Moreover, each object of type A has to iterate over A::dependencies, which again can produce cache misses.

Would it help to create a separate vector<A*> from all needed objects in evaluation order, such that dependencies in A are inserted as copies?

Something like this:

a_1 => a_2 => a_3 => a_2_c => a_1_c => a_4 => ....

where a_i_c are copies from a_i.

Thanks for your help and sorry if this question is confusing, but I find it rather difficult to extrapolate from simple examples to large applications.

1 Answer

  1. Editorial Team
    Added an answer on May 21, 2026 at 3:31 pm

    I'm not sure I understand your question correctly, but I'll try to answer.

    A cache miss occurs when the processor needs data that is not currently in the cache. This happens far more often when the data you access is scattered all over memory.

    One very common way of increasing cache hits is just organizing your data so that everything that is accessed sequentially is in the same region of memory. Judging by your explanation, I think this is most likely your problem; your A objects are scattered all over the place.

    If you’re just calling regular new every single time you need to allocate an A, you’ll probably end up with all of your A objects being scattered.

    You can create a custom allocator for objects that will be created many times and accessed sequentially. This custom allocator could allocate a large number of objects up front and hand them out as requested. This may be similar to what you meant by reordering your data.

    It can take a bit of work to implement this, however, because you have to consider cases such as what happens when it runs out of objects, how it knows which objects have been handed out, and so on.

    // This example is very simple. Instead of using new to create an Object,
    // the code can just call Allocate() and use the pointer returned.
    // This ensures that all Object instances reside in the same region of memory.
    struct CustomAllocator {
        CustomAllocator() : nextObject(cache) { }
    
        // Hands out the next unused Object, or nullptr once the pool is exhausted.
        Object* Allocate() {
            if (nextObject == cache + 1024) return nullptr;
            return nextObject++;
        }
    
        Object* nextObject;
        Object cache[1024];
    };
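
One way to handle the "runs out of objects" case mentioned above is to allocate further fixed-size chunks on demand. The following is a minimal sketch under stated assumptions, not a drop-in allocator: `Object`, `ChunkedPool`, `ChunkCount`, and the chunk size of 1024 are hypothetical names and values chosen for illustration, and objects are only released all at once when the pool is destroyed.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical payload type standing in for the Object from the example above.
struct Object {
    double value = 0.0;
};

// Pool that hands out Objects from contiguous chunks and adds a new chunk
// when the current one is exhausted. Pointers stay valid because existing
// chunks are never moved or freed before the pool itself is destroyed.
class ChunkedPool {
public:
    static constexpr std::size_t kChunkSize = 1024;  // assumed chunk size

    Object* Allocate() {
        if (chunks_.empty() || used_ == kChunkSize) {
            chunks_.push_back(std::make_unique<Object[]>(kChunkSize));
            used_ = 0;
        }
        return &chunks_.back()[used_++];
    }

    std::size_t ChunkCount() const { return chunks_.size(); }

private:
    std::vector<std::unique_ptr<Object[]>> chunks_;
    std::size_t used_ = 0;  // objects handed out from the newest chunk
};
```

Objects allocated one after another from the same chunk are adjacent in memory, so iterating over them in allocation order touches consecutive addresses. Objects in different chunks are not guaranteed to be near each other.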
    

    Another method involves caching operations that work on sequential data, but aren’t performed sequentially. I think this is what you meant by having a separate vector.

    However, it’s important to understand that your CPU doesn’t just keep one section of memory in cache at a time. It keeps multiple sections of memory cached.

    If you're jumping back and forth between operations on data in one section and operations on data in another section, this most likely will not cause many cache misses; your CPU can and should keep both sections cached at the same time.

    If you’re jumping between operations on 50 different sets of data, you’ll probably encounter many cache misses. In this scenario, caching operations would be beneficial.
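
One possible reading of "caching operations" is to queue the operations first and then apply them grouped by data set, so that each set is touched in one burst. The sketch below assumes that reading; `Op`, `apply_batched`, and the "add a delta to one element" operation are hypothetical and only serve to make the idea concrete. It also assumes that operations on different data sets are independent of each other, so reordering across sets does not change the result.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical deferred operation: add `delta` to element `index` of data set `set`.
struct Op {
    std::size_t set;
    std::size_t index;
    double delta;
};

// Applies queued operations grouped by data set rather than in arrival order.
// stable_sort keeps the original relative order of operations within a set.
void apply_batched(std::vector<Op> ops, std::vector<std::vector<double>>& sets) {
    std::stable_sort(ops.begin(), ops.end(),
                     [](const Op& a, const Op& b) { return a.set < b.set; });
    for (const Op& op : ops) {
        sets[op.set][op.index] += op.delta;
    }
}
```

Sorting has its own cost, so whether this pays off depends on how many operations are queued and how large the data sets are; profiling is the only reliable way to tell.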

    In your case, I don’t think caching operations will give you much benefit. Ensuring that all of your A objects reside in the same section of memory, however, probably will.
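
As a rough illustration of keeping everything in the same section of memory for the structure in the question, the objects' data can be stored by value in evaluation order, with dependencies kept as indices in one flat array instead of per-object pointer vectors. This is a minimal sketch under stated assumptions: `Graph`, `add`, `result`, and the summing computation are hypothetical, since the question does not show what A::compute() actually does.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Container plus all A objects, stored contiguously.
class Graph {
public:
    // Appends an object. Every dependency must be the index of an object
    // added earlier, which keeps the arrays in a valid evaluation order.
    std::size_t add(double input, const std::vector<std::size_t>& deps) {
        inputs_.push_back(input);
        results_.push_back(0.0);
        deps_.insert(deps_.end(), deps.begin(), deps.end());
        dep_end_.push_back(deps_.size());
        return inputs_.size() - 1;
    }

    // One linear pass over the arrays; dependency results were already
    // computed earlier in the same pass.
    void compute_all() {
        std::size_t begin = 0;
        for (std::size_t i = 0; i < inputs_.size(); ++i) {
            double sum = inputs_[i];  // assumed computation: input plus dependency results
            for (std::size_t k = begin; k < dep_end_[i]; ++k) {
                sum += results_[deps_[k]];
            }
            results_[i] = sum;
            begin = dep_end_[i];
        }
    }

    double result(std::size_t i) const { return results_[i]; }

private:
    std::vector<double> inputs_;         // per-object data
    std::vector<double> results_;        // per-object result of the last pass
    std::vector<std::size_t> deps_;      // all dependency indices, back to back
    std::vector<std::size_t> dep_end_;   // end offset into deps_ for each object
};
```

Because the container is built once and compute_all() runs many times, the one-time cost of building these arrays is small compared with the repeated passes. Reads of results_ through dependency indices can still jump around, but they stay within one compact array rather than chasing pointers to separately allocated objects.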

    Another thing to consider is threading, but this can get pretty complicated. If your thread is context-switched frequently, you may encounter a lot of cache misses, because other threads evict its data from the cache while it is not running.
