If I have two structs:
struct A
{
    float x, y;
    inline A operator*(A b)
    {
        A out;
        out.x = x * b.x;
        out.y = y * b.y;
        return out;
    }
};
And an equivalent struct
struct B
{
    float x, y;
};
inline B operator*(B a, B b)
{
    B out;
    out.x = a.x * b.x;
    out.y = a.y * b.y;
    return out;
}
Would you know of any reason for B’s operator* to compile any differently, or run any slower or faster than A’s operator* (the actual actions that go on inside the functions should be irrelevant)?
What I mean is… would declaring the inline operator as a member, vs not as a member, have any generic effect on the speed of the actual function, whatsoever?
I’ve got a number of different structs that currently follow the inline member operator style… But I was wanting to modify it to be valid C code, instead; so before I do that I wanted to know if there would be any changes to performance/compilation.
The way you have it written, I’d expect B::operator* to run slightly slower. This is because the “under the hood” implementation of A::operator* is like a plain function that takes a hidden pointer to the object (this) as an extra first parameter, followed by b by value.
So A passes a pointer to its left-hand-side argument to the function, while B has to make a copy of that parameter before calling the function. Both have to make copies of their right-hand-side parameters.
Your code would be much better off, and would probably be implemented the same way for A and B, if you wrote it using references and made it const correct: take the operands by const reference, and mark A’s member operator const. You still want to return objects, not references, since the results are effectively temporaries (you’re not returning a modified existing object).
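As a rough sketch of the two points above, here is what the const-correct versions might look like, together with a spelled-out form of the member operator. The free function A_multiply and its parameter name self are hypothetical, used only to make the hidden this pointer visible; a real compiler does not generate a function by that name.

```cpp
#include <cassert>  // only needed if you check the results with assert

// Member version, const correct: the left-hand side arrives through the
// implicit `this` pointer, the right-hand side by const reference.
struct A
{
    float x, y;
    A operator*(const A& b) const
    {
        A out;
        out.x = x * b.x;
        out.y = y * b.y;
        return out;  // returned by value: the product is a new temporary
    }
};

// Hypothetical spelled-out equivalent of A::operator*, with the hidden
// `this` parameter written explicitly as `self` (illustrative name only).
inline A A_multiply(const A* self, const A& b)
{
    A out;
    out.x = self->x * b.x;
    out.y = self->y * b.y;
    return out;
}

// Non-member version: both operands by const reference, so neither
// side is copied at the call.
struct B
{
    float x, y;
};

inline B operator*(const B& a, const B& b)
{
    B out;
    out.x = a.x * b.x;
    out.y = a.y * b.y;
    return out;
}
```

With these signatures both a1 * a2 and b1 * b2 receive their operands by address, so there is no longer a by-value copy that one form makes and the other avoids.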
Addendum
First off, both involve the same dereferencing when you spell out all the code. (Remember, accessing members of this implies a pointer dereference.)
But even then, it depends on how smart your compiler is. In this case, let’s say it looks at your structure and decides it can’t stuff it in a register because it’s two floats, so it will use pointers to access them. So the dereferenced pointer case (which is what references get implemented as) is the best you’ll get. In pseudo-assembly terms, the generated code is going to look something like this: load x from each operand through its pointer, multiply, and store the product into the result’s x; then do the same for y.
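The same load, multiply, store sequence can be sketched in C++ instead of assembly. This is only an illustration under the assumption made above (the struct is accessed through pointers rather than held in registers); multiply_lowered and the temporaries r0 to r3 are hypothetical names standing in for registers, not actual compiler output.

```cpp
#include <cassert>  // only needed if you check the results with assert

struct B
{
    float x, y;
};

// Hypothetical hand-lowered multiply: every member access is a load or
// store through a pointer at a fixed offset (the offset of x or of y).
inline void multiply_lowered(const B* a, const B* b, B* result)
{
    float r0 = a->x;      // load from a, at the offset of x
    float r1 = b->x;      // load from b, at the offset of x
    result->x = r0 * r1;  // multiply, then store to result at the offset of x

    float r2 = a->y;      // load from a, at the offset of y
    float r3 = b->y;      // load from b, at the offset of y
    result->y = r2 * r3;  // multiply, then store to result at the offset of y
}
```

Each line maps to roughly one or two instructions on a load/store machine, which is the level of detail the discussion here is about.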
This is assuming a RISC-like architecture (say, ARM). x86 probably uses fewer steps, but the instruction decoder expands them to about this level of detail anyway. The point being that it’s all fixed-offset dereferences of pointers in registers, which is about as fast as it will get. The optimizer can try to be smarter and implement the objects across several registers, but that kind of optimizer is a lot harder to write. (Though I have a sneaking suspicion that an LLVM-type compiler/optimizer could do that optimization easily if result were merely a temporary object that is not preserved.)
So, since you’re using this, you have an implicit pointer dereference. But what if the object were on the stack? Doesn’t help; stack variables turn into fixed-offset dereferences of the stack pointer (or frame pointer, if used). So you’re dereferencing a pointer somewhere in the end, unless your compiler is bright enough to take your object and spread it across multiple registers.
Feel free to pass the -S option to gcc to get an assembly listing of the final code to see what’s really happening in your case.