In my project I have a class where execution time is the top priority. For that class I did not care much about maintainability, tidiness and so on. At least I did not care until yesterday… Now I am in a situation where I have to worry about those as well.
I have a class, say A, which performs multiple scans on images coming from a camera, i.e. a variable-width window scans them in real time.
class A {
    // methods and attributes of A:
    ...
    void runiterator() {
        ...
        for (...) {          // change the window's dimension
            for (...) {      // rows
                for (...) {  // columns
                    // many lines of code of operations to be executed for each window at each position
                    ...
                }
            }
        }
    }
};
Performance already shows a small delay, but I could solve that by skipping a limited area of the image. Furthermore, I have a second class, say B, which has exactly the same scheme as A but executes different operations on each scan (and luckily is much faster than A).
Well, now it is time to combine all the operations, which should significantly improve the overall result. The problem is that the code would become messy and huge, mixing things that are really different. I thought of defining a class X that does the iteration and, at each scan position, calls one function in A_new and one in B_new. But I'm worried that about 200000×2 function calls per image would cost performance.
What is your advice?
EDIT
With a class X that only calls A_new (so it can be compared directly to the current A), I get on average, over many repetitions:
Time for executing X on a series of 56 images = 6.15 s
Time for executing A on the same series of 56 images = 5.98 s
It seems that my suspicions were not so naive.
The difference is about 3%, which is not much, but it is still a loss I would rather avoid.
With __forceinline the time is 5.98 s for X as well, but I would prefer not to rely on it.
I think the code is already optimized and the margin for further improvement is very small.
Indeed it does a lot of stuff on images in a relatively short time.
Processing data sequentially is not possible in class A because it is based on the values coming from the images, which are unpredictable. This is the reason why class B (which does manage to do it) is much faster.
You really have to measure whether it causes performance problems before worrying about it.
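A measurement along these lines could be a minimal timing helper like the sketch below. The name average_ms and its parameters are hypothetical, not something from the question. It assumes the work being timed has an observable effect, so the optimizer cannot remove it.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs `work` `repetitions` times and returns the
// average wall-clock time per run in milliseconds.
template <typename F>
double average_ms(F&& work, int repetitions) {
    assert(repetitions > 0);
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    const auto start = clock::now();
    for (int i = 0; i < repetitions; ++i) {
        work();
    }
    const auto stop = clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / repetitions;
}
```

Timing the direct version and the refactored version over the same series of images, in an optimized build, gives numbers that can be compared like the ones in the EDIT above.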
If there is a problem, try doing it with templates. Write the two variants of the function as functors, then pass them to a function template that does the iteration. You will instantiate both versions and call the appropriate one. The compiler should inline the calls (but better verify this).
I used this on medical image manipulation and it worked like a charm.
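As a rough sketch of that idea: the Image type, the two functors and scan_windows below are all hypothetical stand-ins, since the real operations of A and B are not shown in the question. The iteration is written once as a function template, and each operation is a functor whose call operator is visible to the compiler at the call site.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical image type: a row-major grayscale buffer.
struct Image {
    std::size_t width;
    std::size_t height;
    std::vector<unsigned char> pixels;  // size == width * height
};

// Hypothetical stand-in for A's per-window work:
// accumulates the top-left pixel of every window visited.
struct SumTopLeft {
    unsigned long long total = 0;
    void operator()(const Image& img, std::size_t row, std::size_t col,
                    std::size_t /*size*/) {
        total += img.pixels[row * img.width + col];
    }
};

// Hypothetical stand-in for B's per-window work: counts the windows.
struct CountWindows {
    std::size_t count = 0;
    void operator()(const Image&, std::size_t, std::size_t, std::size_t) {
        ++count;
    }
};

// The shared scan scheme, written once. Op is a template parameter, so the
// call is resolved at compile time and the compiler is free to inline it.
template <typename Op>
void scan_windows(const Image& img, std::size_t min_size,
                  std::size_t max_size, Op& op) {
    assert(min_size > 0 && min_size <= max_size);
    for (std::size_t size = min_size; size <= max_size; ++size) {  // window dimension
        if (size > img.width || size > img.height) break;
        for (std::size_t row = 0; row + size <= img.height; ++row) {    // rows
            for (std::size_t col = 0; col + size <= img.width; ++col) { // columns
                op(img, row, col, size);
            }
        }
    }
}
```

Calling scan_windows with a SumTopLeft and then with a CountWindows instantiates the template twice, once per operation. To run both operations in a single pass, one option is a small functor that holds both and calls them in turn. Whether the calls really get inlined is worth checking in the generated assembly or by timing.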