I am working on a large scale software. It’s driven by memory/data moving between huge amount of complicated models.
Sometime the cache misses is too high and the performance is not good. But the scenario looks too complicated to me.
I just want to get some general ideas on how to reduce the cache miss and improve memory performance.
Appreicate for any comment.
Thanks!
The most valuable tool when hunting performance bottlenecks is measurement. You need to figure out what code has the problem and then measure it for cache misses, if that indeed proves to be the problem.
As for general ideas, you will need to lower the miss rate. So when you pull data into memory, you need to work as much as possible on it before you leave it again, rather than stream data. Compare as an example,
which traverses the list three times. It may be possible to write this as:
lowering the traverse to only a single time – usually leading to fewer misses.
Another worthy trick is to think about the data structure. The access patterns of a binary tree are much different from those of a hash table. But establish measurement first so you can be sure you got the misses nailed – and that it is the misses that is your problem.
Finally, even with low miss rates, you can look into lowering memory bandwidth in general. If you move lots and lots of data, it tend to be slow – since memory speeds grow at a much lower rate compared to transistor count.