To determine the cache size of a given CPU, I tried timing memory/cache accesses like this:
lengthMod = sizes[i]/sizeof(int) - 1; // where sizes[i] is something like 1024, 2048 ...
for (unsigned int k = 0; k < REPS; k++) {
data[(k * 16) & lengthMod]++;
}
1, 0.52
4, 0.52
8, 0.52
16, 0.52
32, 0.52
64, 1.11 // << note the jump in timing. L1 cache size is 32K
128, 1.12
256, 1.19
So I think that if the size is not a power of 2, I can't use lengthMod as a mask like this. So I tried doing
lengthMod = sizes[i]/sizeof(int);
for (unsigned int k = 0; k < REPS; k++) {
data[(k * 16) % lengthMod]++;
}
1, 2.67
4, 2.57
8, 2.55
16, 2.51
32, 2.42
64, 2.42 // << no jump anymore ...
128, 2.42
256, 2.42
Now the timing increase that I expected is gone. I expected % to be slower, but shouldn't that apply to all sizes equally? If an access takes x seconds when using &, I'd expect about x + c seconds with % (where c is approximately constant). That's not what happens: the timing difference between sizes disappears entirely. Why is that?
What you’re seeing is a trade-off of bottlenecks.
Before we continue, let’s look at the difference between the two examples:
The first uses &, which is a fast bitwise operation.
The second uses %, which is a very slow division.
Divisions are very slow. Modern compilers will try to optimize them when the divisor/modulus is a compile-time constant.
But that’s not the case here. So you pay the full cost of a hardware division. This is why the times in your second example are much slower than the first.
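As a rough illustration of why the two loops are not interchangeable, here is a minimal sketch in C. The helper names is_pow2, wrap_mask and wrap_mod are made up for this example and do not come from the question. The mask form only gives the same result as % when the length is a power of two, which is why the first loop is limited to sizes like 1024, 2048 and so on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if n is a nonzero power of two. */
static int is_pow2(size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

/* Wrap x into [0, n) with a bit mask.
   Only valid when n is a power of two. */
static size_t wrap_mask(size_t x, size_t n) {
    assert(is_pow2(n));
    return x & (n - 1);
}

/* Wrap x into [0, n) for any nonzero n.
   When n is only known at run time, this typically compiles
   to a hardware division instruction. */
static size_t wrap_mod(size_t x, size_t n) {
    return x % n;
}
```

For a power-of-two length such as 32, wrap_mask(48, 32) and wrap_mod(48, 32) both give 16. For a length such as 24 the mask trick breaks down: 48 & 23 is 16, while 48 % 24 is 0.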
With the &, the code is fast enough to max out the cache bandwidth. However, with %, the code is much slower; it is not fast enough to keep up with the cache. So you see the same times all the way up.
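If you need sizes that are not powers of two, one possible workaround is to keep the division out of the hot loop by stepping the index and wrapping it by hand. The sketch below is only an illustration under some assumptions: touch_wrapped is a made-up name, len must be nonzero, and whether the compare-and-subtract is cheap enough to expose the cache effects again would have to be measured on your machine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: increments data[(k * stride) % len] for
   k = 0 .. reps-1, but without a division inside the loop. */
static void touch_wrapped(int *data, size_t len, size_t stride, size_t reps) {
    assert(len > 0);
    size_t step = stride % len;   /* one division, done once up front */
    size_t idx = 0;
    for (size_t k = 0; k < reps; k++) {
        data[idx]++;
        idx += step;
        if (idx >= len) {
            idx -= len;           /* step < len, so one subtraction is enough */
        }
    }
}
```

Calling touch_wrapped(data, lengthMod, 16, REPS) with the lengthMod from the second loop (the element count, not the mask) should touch the same elements as data[(k * 16) % lengthMod]++, as long as k * 16 does not overflow in the original loop.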