For a program to be cache efficient, the data it uses should be stored contiguously in memory, right?
So instead of doing individual dynamic allocations, I put my data in one blob using a linear allocator. Is this enough to improve performance? What should I do to improve cache efficiency even more?
I know these questions aren’t very specific, but I don’t know how else to explain it…
Which programs can help me profile cache hits/misses?
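To make the setup in the question concrete, here is a minimal sketch of what a linear (bump) allocator over a single blob might look like. The class name `LinearAllocator` and its methods are hypothetical and not taken from the post; it assumes a fixed capacity and that everything is released at once with `reset()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration: hands out consecutive slices of one contiguous blob.
class LinearAllocator {
public:
    explicit LinearAllocator(std::size_t capacity) : buffer_(capacity), offset_(0) {}

    // Returns `size` bytes aligned to `alignment` (must be a power of two),
    // or nullptr if the blob does not have enough room left.
    void* allocate(std::size_t size,
                   std::size_t alignment = alignof(std::max_align_t)) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buffer_.data());
        const std::uintptr_t current = base + offset_;
        // Round the current address up to the next multiple of `alignment`.
        const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
        const std::uintptr_t aligned = (current + mask) & ~mask;
        const std::size_t start = static_cast<std::size_t>(aligned - base);
        if (start > buffer_.size() || size > buffer_.size() - start) {
            return nullptr;  // out of space
        }
        offset_ = start + size;
        return buffer_.data() + start;
    }

    // Frees everything at once; individual frees are not supported.
    void reset() { offset_ = 0; }

    // Number of bytes consumed so far, including alignment padding.
    std::size_t used() const { return offset_; }

private:
    std::vector<unsigned char> buffer_;  // the blob
    std::size_t offset_;                 // first unused byte in the blob
};
```

Objects allocated one after another end up next to each other in memory, which is what makes this layout a candidate for better cache behaviour. Whether it actually helps depends on the order in which the data is later accessed, so it is worth confirming with a profiler.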
If you’re looking for a profiler for Windows, you can try AMD’s CodeAnalyst or Very Sleepy; both are free. AMD’s is the more powerful of the two (it also works on Intel hardware, but IIRC you can’t use the hardware-based profiling features there), and it can monitor things like branch mispredictions and cache utilization. Profiling is great because it tells you what to optimize, but it doesn’t always tell you how. For that, have a look at Agner Fog’s optimization manuals together with Intel’s optimization manual, which contains a lot on locality and cacheability optimizations.
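As a small illustration of what "locality" means in practice, here is a sketch (function names are made up for this example) of two loops that compute the same sum over a matrix stored contiguously in row-major order. They differ only in the order in which they touch memory.

```cpp
#include <cstddef>
#include <vector>

// Walks the matrix in the order it is laid out in memory: the inner loop
// touches adjacent elements, so each fetched cache line is fully used.
long long sum_row_major(const std::vector<int>& m,
                        std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            total += m[r * cols + c];
        }
    }
    return total;
}

// Same result, but the inner loop jumps `cols` elements on every step,
// so consecutive accesses usually land on different cache lines.
long long sum_column_major(const std::vector<int>& m,
                           std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            total += m[r * cols + c];
        }
    }
    return total;
}
```

On matrices much larger than the cache, the first version is typically faster, but by how much depends on the hardware and the compiler. This is the kind of difference a profiler’s cache-miss counters should make visible, so contiguous storage alone is not the whole story: the access order matters too.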