I am benchmarking the cache behaviour of two search algorithms that operate on a

Question

0

Asked: May 31, 20262026-05-31T17:08:02+00:00 2026-05-31T17:08:02+00:00

I am benchmarking the cache behaviour of two search algorithms that operate on a

0

I am benchmarking the cache behaviour of two search algorithms that operate on a sorted range of items with Cachegrind. I have n items in a vector, and another vector that holds all valid indices. I use std::random_shuffle on the second vector, and do then perform n successful lookups on the items in the first vector. The function I am benchmarking looks roughly as follows:

template <typename Iterator>
void lookup_in_random_order(Iterator begin, Iterator end)
{
    const std::size_t N = std::distance(begin, end);
    std::vector<std::size_t> idx(N);
    std::iota(idx.begin(), idx.end(), 0);

    std::srand(std::time(0));
    std::random_shuffle(idx.begin(), idx.end());

    // Warm the cache -- I don't care about measuring this loop.
    for(std::size_t i = 0; i < N; ++i)
        my_search(begin, end, idx[i]);

    std::random_shuffle(idx.begin(), idx.end());

    // This one I would care about!
    for(std::size_t i = 0; i < N; ++i)
    {
        int s = idx[i];
        // Especially this line, of course.
        my_search(begin, end, s);
    }
}

I compile my code with g++ (with -g and -O2). I run Cachegrind and then cg_annotate. I get something like the following:

       Ir I1mr ILmr        Dr    D1mr    DLmr Dw D1mw DLmw
        .    .    .         .       .       .  .    .    .  template <typename Iterator>
       17    2    2         0       0       0  6    0    0  void lookup_in_random_order(Iterator begin, Iterator end)
        .    .    .         .       .       .  .    .    .  {
        .    .    .         .       .       .  .    .    .      const std::size_t N = std::distance(begin, end);
        .    .    .         .       .       .  .    .    .      std::vector<std::size_t> idx(N);
        .    .    .         .       .       .  .    .    .      std::iota(idx.begin(), idx.end(), 0);
        .    .    .         .       .       .  .    .    .      
        4    0    0         0       0       0  2    1    1      std::srand(std::time(0));
        .    .    .         .       .       .  .    .    .      std::random_shuffle(idx.begin(), idx.end());
        .    .    .         .       .       .  .    .    .  
3,145,729    0    0         0       0       0  0    0    0      for(std::size_t i = 0; i < N; ++i)
        .    .    .         .       .       .  .    .    .              my_search(begin, end, idx[i]);
        .    .    .         .       .       .  .    .    .  
        .    .    .         .       .       .  .    .    .      std::random_shuffle(idx.begin(), idx.end());
        .    .    .         .       .       .  .    .    .  
3,145,729    1    1         0       0       0  0    0    0      for(std::size_t i = 0; i < N; ++i)
        .    .    .         .       .       .  .    .    .      {
1,048,575    0    0 1,048,575 132,865 131,065  0    0    0              int s = idx[i];
        .    .    .         .       .       .  .    .    .              my_search(begin, end, s);
        .    .    .         .       .       .  .    .    .      }
        7    0    0         6       1       1  0    0    0  }

For some reason, some lines (especially the most interesting one!) consist of dots. Now, the Cachegrind manual says “Events not applicable for a line are represented by a dot. This is useful for distinguishing between an event which cannot happen, and one which can but did not.”

How should this be interpreted? My first idea was that maybe the compiler optimizes my searches away. I thought this cannot be, because the program did spend quite a bit of time running. Still, I tried compiling without the -O2 flag and it seemed to work in a sense that now every line with a call to my_search recorded some numbers (no dots anymore!). However, this doesn’t seem like the right way to go for obvious reasons.

In general, is there a way I can tell Cachegrind that “look at this line especially, I am very interested how many cache misses it causes”?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-31T17:08:03+00:00

Editorial Team

2026-05-31T17:08:03+00:00Added an answer on May 31, 2026 at 5:08 pm

My guess is that with O2 it allows the compiler to perform automatic inlining of the functions where you see the dots. Cachegrind will not see the inlined function calls as the calls have dissappeared. Try “-fno-inline” (Compiler options)

Of course you will probably have different cache performance numbers with and without inlining.

0

Reply
Share
Share

- Report

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I am benchmarking the cache behaviour of two search algorithms that operate on a

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply