What should I take into consideration when developing a game in terms of fast memory access in C++?
The memory I load is static, so I should put it in a contiguous block of memory, right?
Also, how should I organize the variables inside structs to improve performance?
"Memory performance" is extremely vague.
I think what you are looking for is how to make good use of the CPU cache: an access that hits in the cache is roughly a factor of 10 (often more) faster than one that has to go to main memory.
For a complete reference on the mechanisms behind the cache, you might wish to read this excellent series of articles by Ulrich Drepper on lwn.net, "What Every Programmer Should Know About Memory".
In short:
Aim at Locality
You should not jump around in memory, so try (when possible) to group together items that will be used together.
Aim at Predictability
If your memory accesses are predictable, the CPU will likely prefetch the memory for the next chunk of work, so that it is available immediately, or shortly, after finishing the current chunk.
The typical example is with for loops on arrays: change array[i][j] += 1; to array[j][i] += 1; and the performance varies… at low optimization levels 😉
The compiler may catch those obvious cases, but some are more insidious. For example, using node-based containers (linked lists, binary search trees) instead of array-based containers (vector, some hash tables) may slow down the application, because each node is a separate allocation and traversing them means chasing pointers to unpredictable addresses.
Don’t waste space… beware of false sharing
Try to pack your structures. Compilers insert padding so that each member is correctly aligned, so a poorly ordered structure can contain holes that artificially inflate its size and waste cache space.
A typical rule of thumb is to order the members in the structure by decreasing size (use sizeof). This is dumb, but works well. If you are more knowledgeable about sizes and alignments, just avoid holes 🙂 Note: this is only useful for structures with lots of instances…
However, beware of false sharing. In multithreaded programs, concurrent access to two variables that are close enough to share the same cache line is costly when at least one thread writes, because it involves a lot of cache invalidation and CPUs battling for ownership of the cache line.
Profile
Unfortunately, this is HARD to figure out.
If you happen to be programming on Unix, Callgrind (part of the Valgrind suite) can be run with cache simulation (--cache-sim=yes) to identify the parts of the code triggering cache misses. I guess there are other tools; I just never used them.