I remember hearing somewhere that “large functions might have higher execution times” because of code size, and CPU cache or something like that.
How can I tell if function size is imposing a performance hit for my application? How can I optimize against this? I have a CPU-intensive computation that I have split across as many threads as there are CPU cores. The main thread waits until all of the worker threads are finished before continuing.
I happen to be using C++ on Visual Studio 2010, but I’m not sure that’s really important.
Edit:
I’m running a ray tracer that shoots about 5,000 rays per pixel. I create (cores-1) threads (1 per extra core), split the screen into rows, and give each row to a CPU thread. I run the trace function on each thread about 5,000 times per pixel.
I’m actually looking for ways to speed this up. It is possible for me to reduce the size of the main tracing function by refactoring, and I want to know if I should expect to see a performance gain.
A lot of people seem to be answering the wrong question here. I’m looking for an answer to this specific question: even if you think I can probably do better by optimizing the contents of the function, I want to know if there is a function size/performance relationship.
It’s not really the size of the function, it’s the total size of the code that gets cached when it runs. You aren’t going to speed things up by splitting code into a greater number of smaller functions, unless some of those functions aren’t called at all in your critical code path, and hence don’t need to occupy any cache. Besides, any attempt you make to split code into multiple functions might get reversed by the compiler, if it decides to inline them.
So it’s not really possible to say whether your current code is “imposing a performance hit”. A hit compared with which of the many, many ways that you could have structured your code differently? And you can’t reasonably expect changes of that kind to make any particular difference to performance.
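If you do want to compare two specific structurings, about all you can do is time both on the same workload. Here is a minimal sketch of that, assuming a C++11 compiler (as far as I know VS2010 has no <chrono>, so there you would need something like QueryPerformanceCounter instead). The names trace_monolithic, trace_refactored and time_seconds are hypothetical stand-ins, not anything from your code:

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical stand-ins for two structurings of the same work.
// In a real test these would be the original and the refactored trace function.
double trace_monolithic(std::size_t i) { return static_cast<double>(i) * 0.5; }
double trace_refactored(std::size_t i) { return static_cast<double>(i) * 0.5; }

// Calls fn(i) for i in [0, iterations) and returns the elapsed wall time in seconds.
// The accumulated result is written to *sink so that it is observable, which makes
// it less likely that the optimizer throws the loop away.
template <typename Fn>
double time_seconds(Fn fn, std::size_t iterations, double* sink) {
    const std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    double total = 0.0;
    for (std::size_t i = 0; i < iterations; ++i) {
        total += fn(i);
    }
    const std::chrono::steady_clock::time_point stop =
        std::chrono::steady_clock::now();
    *sink = total;
    return std::chrono::duration<double>(stop - start).count();
}
```

You would call time_seconds(trace_monolithic, n, &sink) and time_seconds(trace_refactored, n, &sink) several times each with a realistic n and compare the spread, not a single run. A toy loop like this says nothing about your ray tracer; it only shows the shape of the measurement.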
I suppose that what you’re looking for is instructions that are rarely executed (your profiler will tell you which they are), but are located in the close vicinity of instructions that are executed a lot (and hence will need to be in cache a lot, and will pull in the cache line around them). If you can cluster the commonly-executed code together, you’ll get more out of your instruction cache.
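To make the clustering idea concrete, here is a rough sketch of moving a rarely-executed branch out of a hot loop into its own non-inlined function, so that the loop body stays compact. Everything in it is an assumption for illustration: the NOINLINE macro and the functions report_bad_sample and sum_valid_samples are made-up names, and whether the compiler and linker actually place the cold function far from the loop is up to them.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical portability macro: ask the compiler not to inline the cold path.
#if defined(_MSC_VER)
#  define NOINLINE __declspec(noinline)
#else
#  define NOINLINE __attribute__((noinline))
#endif

// Cold path: expected to run rarely, so it lives in its own function
// instead of sitting in the middle of the loop body.
NOINLINE void report_bad_sample(std::size_t index, double value) {
    std::fprintf(stderr, "sample %lu is negative: %f\n",
                 static_cast<unsigned long>(index), value);
}

// Hot path: a tight loop. The only trace of the cold code here is a
// compare, a branch and a call.
double sum_valid_samples(const double* samples, std::size_t count) {
    double total = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        if (samples[i] < 0.0) {  // assumed to be the rare case
            report_bad_sample(i, samples[i]);
            continue;
        }
        total += samples[i];
    }
    return total;
}
```

If the formatting and I/O code were written inline in the loop, its instructions would sit next to the hot ones and could share cache lines with them. Whether pulling it out is measurable in practice is a separate matter.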
Practically speaking though, this is not a very fruitful line of optimization. It’s unlikely you’ll make much difference. If nothing else, your commonly-executed code is probably quite small and adjacent already: it’ll be some small number of tight loops somewhere (your profiler will tell you where). And cache lines at the lowest levels are typically small (of the order of 32 or 64 bytes), so you’d need some very fine re-arrangement of code. C++ puts a lot between you and the object code, which obstructs careful placement of instructions in memory.
Tools like perf can give you information on cache misses. Most of those won’t be for executable code, but on most systems it really doesn’t matter which cache misses you’re avoiding: if you can avoid some then you’ll speed your code up. Perhaps not by a lot, unless it’s a lot of misses, but some.
Anyway, in what context did you hear this? The most common one I’ve heard it come up in is the idea that function inlining is sometimes counter-productive, because sometimes the overhead of the code bloat is greater than the function call overhead avoided. I’m not sure, but profile-guided optimization might help with that, if your compiler supports it. A fairly plausible profile-guided optimization is to preferentially inline at call sites that are executed a larger number of times, leaving colder code smaller, with less overhead to load and fix up in the first place, and (hopefully) less disruptive to the instruction cache when it is pulled in. Somebody with far more knowledge of compilers than me will have thought hard about whether that’s a good profile-guided optimization, and therefore decided whether or not to implement it.