The Archive Base

Editorial Team
Asked: May 31, 2026


I am using Cachegrind to benchmark the cache behaviour of two search algorithms that operate on a sorted range of items. I have n items in a vector, and another vector that holds all valid indices. I use std::random_shuffle on the second vector, and then perform n successful lookups on the items in the first vector. The function I am benchmarking looks roughly as follows:

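#include <algorithm>   // std::random_shuffle (deprecated in C++14, removed in C++17)
#include <cstddef>     // std::size_t
#include <cstdlib>     // std::srand
#include <ctime>       // std::time
#include <iterator>    // std::distance
#include <numeric>     // std::iota
#include <vector>

// my_search(begin, end, key) is the search algorithm being benchmarked;
// it is assumed to be declared before this template (not shown here).
template <typename Iterator>
void lookup_in_random_order(Iterator begin, Iterator end)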
{
    const std::size_t N = std::distance(begin, end);
    std::vector<std::size_t> idx(N);
    std::iota(idx.begin(), idx.end(), 0);

    std::srand(std::time(0));
    std::random_shuffle(idx.begin(), idx.end());

    // Warm the cache -- I don't care about measuring this loop.
    for(std::size_t i = 0; i < N; ++i)
        my_search(begin, end, idx[i]);

    std::random_shuffle(idx.begin(), idx.end());

    // This one I would care about!
    for(std::size_t i = 0; i < N; ++i)
    {
        int s = idx[i];
        // Especially this line, of course.
        my_search(begin, end, s);
    }
}
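For anyone who wants to reproduce the setup, here is a self-contained sketch of the same harness. It rests on assumptions the question does not state: my_search is taken to be a plain binary search built on std::lower_bound (hypothetical), the range is assumed to be random-access, and each lookup searches for the item stored at the shuffled index so that every lookup succeeds. std::shuffle with a seeded std::mt19937 stands in for std::random_shuffle, which was deprecated in C++14 and removed in C++17. The search results are also summed, which makes them observable so the compiler has no licence to discard searches whose return values would otherwise be ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical search under test: binary search returning an iterator to an
// element equal to `key`, or `end` if there is none.
template <typename Iterator, typename T>
Iterator my_search(Iterator begin, Iterator end, const T& key)
{
    Iterator it = std::lower_bound(begin, end, key);
    return (it != end && !(key < *it)) ? it : end;
}

// Looks up every item of a sorted random-access range twice, each time in a
// fresh random order, and returns the number of successful lookups.
template <typename Iterator>
std::size_t lookup_in_random_order(Iterator begin, Iterator end, unsigned seed)
{
    const std::size_t N = static_cast<std::size_t>(std::distance(begin, end));
    std::vector<std::size_t> idx(N);
    std::iota(idx.begin(), idx.end(), std::size_t{0});

    std::mt19937 rng(seed);  // explicit engine instead of std::srand/std::rand
    std::shuffle(idx.begin(), idx.end(), rng);

    std::size_t found = 0;

    // Warm-up pass: not the part of interest.
    for (std::size_t i = 0; i < N; ++i)
        found += (my_search(begin, end, begin[idx[i]]) != end);

    std::shuffle(idx.begin(), idx.end(), rng);

    // Measured pass: summing the results keeps every search observable.
    for (std::size_t i = 0; i < N; ++i)
        found += (my_search(begin, end, begin[idx[i]]) != end);

    return found;
}
```

For example, on a sorted std::vector<int> of size N, lookup_in_random_order(v.begin(), v.end(), 42u) returns 2 * N: one hit per lookup in each of the two passes.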

I compile my code with g++ (with -g and -O2). I run Cachegrind and then cg_annotate. I get something like the following:

       Ir I1mr ILmr        Dr    D1mr    DLmr Dw D1mw DLmw
        .    .    .         .       .       .  .    .    .  template <typename Iterator>
       17    2    2         0       0       0  6    0    0  void lookup_in_random_order(Iterator begin, Iterator end)
        .    .    .         .       .       .  .    .    .  {
        .    .    .         .       .       .  .    .    .      const std::size_t N = std::distance(begin, end);
        .    .    .         .       .       .  .    .    .      std::vector<std::size_t> idx(N);
        .    .    .         .       .       .  .    .    .      std::iota(idx.begin(), idx.end(), 0);
        .    .    .         .       .       .  .    .    .      
        4    0    0         0       0       0  2    1    1      std::srand(std::time(0));
        .    .    .         .       .       .  .    .    .      std::random_shuffle(idx.begin(), idx.end());
        .    .    .         .       .       .  .    .    .  
3,145,729    0    0         0       0       0  0    0    0      for(std::size_t i = 0; i < N; ++i)
        .    .    .         .       .       .  .    .    .              my_search(begin, end, idx[i]);
        .    .    .         .       .       .  .    .    .  
        .    .    .         .       .       .  .    .    .      std::random_shuffle(idx.begin(), idx.end());
        .    .    .         .       .       .  .    .    .  
3,145,729    1    1         0       0       0  0    0    0      for(std::size_t i = 0; i < N; ++i)
        .    .    .         .       .       .  .    .    .      {
1,048,575    0    0 1,048,575 132,865 131,065  0    0    0              int s = idx[i];
        .    .    .         .       .       .  .    .    .              my_search(begin, end, s);
        .    .    .         .       .       .  .    .    .      }
        7    0    0         6       1       1  0    0    0  }

For some reason, some lines (especially the most interesting one!) show only dots. The Cachegrind manual says “Events not applicable for a line are represented by a dot. This is useful for distinguishing between an event which cannot happen, and one which can but did not.”

How should this be interpreted? My first idea was that the compiler optimizes my searches away, but I thought this could not be the case, because the program spent quite a bit of time running. Still, I tried compiling without the -O2 flag, and it seemed to work in the sense that every line with a call to my_search now recorded some numbers (no more dots!). However, this doesn't seem like the right approach, since unoptimized code is not what I actually want to measure.

In general, is there a way I can tell Cachegrind that “look at this line especially, I am very interested how many cache misses it causes”?


1 Answer

Editorial Team
Answered: May 31, 2026 at 5:08 pm

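My guess is that with -O2 the compiler automatically inlines the functions on the lines where you see the dots. Cachegrind cannot see the inlined calls because the calls have disappeared: the debug information attributes the inlined instructions to the source lines of my_search itself, so the line containing the call has no instructions of its own left to count. Try compiling with -fno-inline (a GCC compiler option).

Of course, you will probably get different cache performance numbers with and without inlining.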
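A narrower variant of the same idea (a suggestion beyond this answer, assuming GCC or Clang): instead of turning off inlining for the whole program with -fno-inline, mark only the function under test with the [[gnu::noinline]] attribute, so the rest of the code is still optimized as usual. The my_search below is a hypothetical binary search and count_hits is a hypothetical caller. GCC's documentation warns that a call to a side-effect-free function can still be optimized away even when it is not inlined, so the caller uses every result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical function under test. [[gnu::noinline]] (GCC/Clang) keeps only
// this function out of line at -O2; everything else is still optimized.
template <typename Iterator, typename T>
[[gnu::noinline]] Iterator my_search(Iterator begin, Iterator end, const T& key)
{
    Iterator it = std::lower_bound(begin, end, key);
    return (it != end && !(key < *it)) ? it : end;
}

// Each lookup is a real call from this loop. The results are accumulated,
// because an unused result of a side-effect-free call may still be removed
// even when the call is not inlined.
std::size_t count_hits(const std::vector<int>& items, const std::vector<int>& keys)
{
    std::size_t hits = 0;
    for (int k : keys)
        hits += (my_search(items.begin(), items.end(), k) != items.end());
    return hits;
}
```

The cost of the search itself is still reported on the lines of my_search; the call site shows only the cost of making the call.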
