I am writing a program to parse a file. It consists of a main

Asked: June 11, 2026

I am writing a program to parse a file. It consists of a main loop that reads the input character by character and processes each one. Here is the main loop:

char c;
char * ptr;

for( size_t i = 0; i < size ; ++i )
{
    ptr = ( static_cast<char*>(sentenceMap) + i );
    c = *ptr;

    __builtin_prefetch( ptr + i + 1 );

    // some processing on ptr and c
}

As you can see, I added a __builtin_prefetch call, hoping to bring the data for the next iteration of my loop into the cache. I tried different arguments (ptr+i+1, ptr+i+2, ptr+i+10), but nothing seems to change.

To measure performance, I use Valgrind’s cachegrind tool, which reports the number of cache misses. On the line c = *ptr, cachegrind records 632,378 DLmr (last-level cache data read misses, L3 here) when __builtin_prefetch is not used. What’s weird, though, is that this value does not change, regardless of the argument I pass to __builtin_prefetch.

Is there any explanation for that?

1 Answer

Editorial Team · Answer 1 · 2026-06-11T15:03:40+00:00

That’s because the hardware is years ahead of you. 🙂

There are hardware prefetchers designed to recognize simple patterns and do the prefetching for you. In this case, you have a simple sequential access pattern, which is trivial for the hardware prefetcher to detect.

Manual prefetching only comes in handy when you have access patterns that the hardware cannot predict.

Here’s one such example: Prefetching Examples?
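
To make the contrast with the sequential loop in the question concrete, here is a minimal sketch of an access pattern that a hardware prefetcher generally cannot predict: an indirect (gather) read, where the address of each load depends on a value loaded from another array. The function name gather_sum, its parameters, and the default prefetch distance of 16 are assumptions chosen for illustration, not anything taken from the question. __builtin_prefetch is a GCC/Clang builtin, and it is only a hint, so whether it helps at all depends on the CPU, the data size, and the chosen distance.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (illustration only): sums data[index[k]] for every k,
// issuing a prefetch hint for the element needed `distance` iterations later.
// The default distance of 16 is an arbitrary starting point, not a tuned value.
std::uint64_t gather_sum(const std::vector<std::uint32_t>& data,
                         const std::vector<std::size_t>& index,
                         std::size_t distance = 16)
{
    std::uint64_t sum = 0;
    const std::size_t n = index.size();

    for (std::size_t k = 0; k < n; ++k) {
        if (k + distance < n) {
            // index[] is read sequentially, so the hardware prefetcher can
            // follow it. The data[] address depends on a loaded value, which
            // the hardware cannot guess ahead of time, hence the manual hint.
            // Arguments: address, 0 = prefetch for read, 1 = low temporal locality.
            __builtin_prefetch(&data[index[k + distance]], 0, 1);
        }
        sum += data[index[k]];
    }
    return sum;
}
```

The prefetch does not change the result, only (possibly) the timing, so gather_sum(data, index, d) returns the same sum for any d. Every entry of index is assumed to be a valid position in data. To see whether the hint pays off, time the function on real hardware with several distances and with index filled with random positions in a data array much larger than the last-level cache.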
