The training materials from the class I took seem to be making two conflicting statements.
On one hand:
“Use of inline functions usually results in faster execution”
On the other hand:
“Use of inline functions may decrease performance due to more frequent swapping”
Question 1: Are both statements true?
Question 2: What is meant by “swapping” here?
Please glance at this snippet:
int powA(int a, int b) {
    return (a + b)*(a + b);
}

inline int powB(int a, int b) {
    return (a + b)*(a + b);
}

int main () {
    Timer *t = new Timer;
    for(int a = 0; a < 9000; ++a) {
        for(int b = 0; b < 9000; ++b) {
            int i = (a + b)*(a + b);  // 322 ms <-----
            // int i = powA(a, b);    // not inline : 450 ms
            // int i = powB(a, b);    // inline : 469 ms
        }
    }
    double d = t->ms();
    cout << "--> " << d << endl;
    return 0;
}
Question 3: Why is performance so similar between powA and powB? I would have expected powB to take around 322 ms, since it is, after all, inline.
Question 1
Yes, both statements can be true, in particular circumstances. Obviously they won’t both be true at the same time.
Question 2
“Swapping” is likely a reference to OS paging behaviour, where pages are swapped out to disk when the memory pressure becomes high.
In practice, if your inline functions are small then you will usually notice a performance improvement, because inlining eliminates the overhead of a function call and return. In very rare circumstances, though, inlining can make the code grow so much that it no longer fits entirely inside the CPU cache during a performance-critical tight loop, and you may experience decreased performance. If you’re coding at that level then you probably should be coding directly in assembly language anyway.
Question 3
The inline modifier is a hint to the compiler that it might want to consider compiling the given function inline. It doesn’t have to follow your directions, and the result may also depend on the given compiler options. You can always look at the generated assembly code to find out what it did.
Your benchmark may not even be doing what you want because your compiler might be smart enough to see that you’re not even using the result of the function call that you assign into i, so it might not even bother to call your function. Again, look at the generated assembly code.
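One way to make the benchmark harder for the optimizer to throw away is to actually use every result, for example by summing the values and printing the sum. The following is only a sketch under some assumptions: the Timer class from the question isn't shown, so std::chrono::steady_clock stands in for it, and the names sum_direct, sum_powA, sum_powB, time_ms and run_benchmark are made up for illustration. The timings still depend on your compiler and optimization flags, so checking the generated assembly remains the reliable way to see whether powA or powB was inlined.

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>

// Ordinary function; an optimizing compiler may still choose to inline it.
int powA(int a, int b) { return (a + b) * (a + b); }

// Marked inline; this is only a hint and the compiler may ignore it.
inline int powB(int a, int b) { return (a + b) * (a + b); }

// Each variant accumulates its results, so the values are actually used.
// Assumes n is small enough that (a + b)*(a + b) fits in an int
// (n = 9000, as in the question, is fine).
std::uint64_t sum_direct(int n) {
    std::uint64_t sum = 0;
    for (int a = 0; a < n; ++a)
        for (int b = 0; b < n; ++b)
            sum += static_cast<std::uint64_t>((a + b) * (a + b));
    return sum;
}

std::uint64_t sum_powA(int n) {
    std::uint64_t sum = 0;
    for (int a = 0; a < n; ++a)
        for (int b = 0; b < n; ++b)
            sum += static_cast<std::uint64_t>(powA(a, b));
    return sum;
}

std::uint64_t sum_powB(int n) {
    std::uint64_t sum = 0;
    for (int a = 0; a < n; ++a)
        for (int b = 0; b < n; ++b)
            sum += static_cast<std::uint64_t>(powB(a, b));
    return sum;
}

// Hypothetical helper: runs one variant, stores its sum, returns elapsed ms.
double time_ms(std::uint64_t (*variant)(int), int n, std::uint64_t& sum_out) {
    const auto start = std::chrono::steady_clock::now();
    sum_out = variant(n);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Times all three variants, prints the timings together with the sums,
// and returns true if all three variants computed the same sum.
bool run_benchmark(int n) {
    std::uint64_t direct = 0, viaA = 0, viaB = 0;
    const double tDirect = time_ms(sum_direct, n, direct);
    const double tA = time_ms(sum_powA, n, viaA);
    const double tB = time_ms(sum_powB, n, viaB);
    std::cout << "direct: " << tDirect << " ms (sum " << direct << ")\n"
              << "powA:   " << tA << " ms (sum " << viaA << ")\n"
              << "powB:   " << tB << " ms (sum " << viaB << ")\n";
    return direct == viaA && viaA == viaB;
}
```

Calling run_benchmark(9000) from main reproduces the loop sizes from the question. Try it both with and without optimization enabled; with optimization on, it would not be surprising for all three timings to come out nearly identical, whether or not the functions are marked inline.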