I’ve asked a few questions which have touched around this issue, but I’ve been getting differing responses, so I thought best to ask it directly.
Let's say we have the following code:
#include <utility> // std::move
// Silly examples of A and B, don't take them too seriously,
// just keep in mind they're big and not dynamically allocated.
struct A { int x[1000]; A() { for (int i = 0; i != 1000; ++i) { x[i] = i * 2; } } };
struct B { int y[1000]; B() { for (int i = 0; i != 1000; ++i) { y[i] = i * 3; } } };
struct C
{
    A a;
    B b;
};
A create_a() { return A(); }
B create_b() { return B(); }
C create_c(A&& a, B&& b)
{
    C c;
    c.a = std::move(a);
    c.b = std::move(b);
    return c;
}
int main()
{
    C x = create_c(create_a(), create_b());
}
Now ideally create_c(A&&, B&&) should be a no-op. Instead of the calling convention being for A and B to be created and references to them passed on the stack, A and B should be created and passed by value directly in the place of the return value, c. With NRVO, this would mean creating them directly inside x, with no further work for create_c to do.
This would avoid the need to create copies of A and B.
Is there any way to allow/encourage/force this behavior from a compiler, or do optimizing compilers generally do this anyway? And will this only work when the compiler inlines the functions, or will it also work across compilation units?
(How I think this could work across compilation units…)
If create_a() and create_b() took a hidden parameter of where to place the return value, they could place the results into x directly, which is then passed by reference to create_c() which needs to do nothing and immediately returns.
There are different ways of optimizing the code that you have, but rvalue references are not one of them. The problem is that neither A nor B can be moved at no cost, since you cannot steal the contents of the object.
Consider instead a type that holds its resources through pointers. There is then a simple way of moving the object, i.e. stealing the contents of the old object into the new one and leaving the old object in a destroyable but useless state: simply copy the pointers and reset them to null in the old object, so that the original object's destructor will not free the memory.
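A minimal sketch of such a pointer-holding type, to make the "steal the contents" idea concrete. HeapA and move_steals_buffer are hypothetical names, not part of the question's code, and copying is disabled only to keep the sketch short:

```cpp
#include <cassert>
#include <utility>

// Hypothetical type: the 1000 ints live on the heap and the object
// itself only stores a pointer to them.
struct HeapA {
    int* x;

    HeapA() : x(new int[1000]) {
        for (int i = 0; i != 1000; ++i) { x[i] = i * 2; }
    }

    // Move constructor: take over the pointer and null out the source,
    // so the source's destructor will not free the buffer.
    HeapA(HeapA&& other) noexcept : x(other.x) { other.x = nullptr; }

    // Copying and assignment are disabled to keep the sketch short.
    HeapA(const HeapA&) = delete;
    HeapA& operator=(const HeapA&) = delete;
    HeapA& operator=(HeapA&&) = delete;

    ~HeapA() { delete[] x; }  // delete[] on a null pointer does nothing
};

// Returns true if the move handed over the very same buffer,
// i.e. no element was copied.
bool move_steals_buffer() {
    HeapA source;
    int* original = source.x;
    HeapA target(std::move(source));
    assert(source.x == nullptr);  // source is destroyable but useless
    return target.x == original && target.x[10] == 20;
}
```

Moving here costs one pointer copy regardless of the buffer size. With int x[1000] stored inline, as in the question's A and B, there is no pointer to hand over, so a move has to copy all 1000 elements.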
The problem with both A and B is that the actual memory is held in the object through an array, and that array cannot be moved to a different memory location for the new C object.
Of course, since you are using stack-allocated objects in the code, the old (N)RVO can be used by the compiler, and when you do:
C c = { create_a(), create_b() };
the compiler can perform that optimization. Basically, it places the attribute c.a at the address of the object returned from create_a, while when compiling create_a it creates the returned temporary directly at that same address. Effectively c.a, the object returned from create_a, and the temporary constructed inside create_a (the implicit this of the constructor) are the same object, avoiding two copies. The same can be done with c.b, avoiding the copying cost. If the compiler does inline your code, it will remove create_c and replace it with a construct similar to C c = {create_a(), create_b()}; so it can potentially optimize all copies away.
Note, on the other hand, that this optimization cannot be fully applied to a C object allocated dynamically, as in C* p = new C; p->a = create_a();. Since the destination is not on the stack, the compiler can only merge the temporary inside create_a with its return value, but it cannot make that coincide with p->a, so a copy will need to be made. This is the advantage of rvalue references over (N)RVO, but as mentioned before you cannot use rvalue references effectively in your code example.
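The difference between the two cases can be observed by counting copies. The sketch below uses hypothetical stand-ins (Big, Pair, create_big and a global copies counter are not from the question) and assumes a compiler that elides the copies in the aggregate initialization, which C++17 guarantees when initializing from prvalues:

```cpp
#include <cassert>

int copies = 0;  // incremented by every copy construction or assignment

// Hypothetical instrumented stand-in for A and B: same inline array.
struct Big {
    int x[1000];
    Big() { for (int i = 0; i != 1000; ++i) { x[i] = i * 2; } }
    Big(const Big& other) {
        ++copies;
        for (int i = 0; i != 1000; ++i) { x[i] = other.x[i]; }
    }
    Big& operator=(const Big& other) {
        ++copies;
        for (int i = 0; i != 1000; ++i) { x[i] = other.x[i]; }
        return *this;
    }
};

struct Pair { Big a; Big b; };  // plays the role of C

Big create_big() { return Big(); }

// Members are initialized directly from the returned temporaries.
int copies_for_aggregate_init() {
    copies = 0;
    Pair c = { create_big(), create_big() };
    assert(c.a.x[1] == 2);
    return copies;
}

// The heap object already exists, so each member must be assigned to.
int copies_for_heap_assignment() {
    copies = 0;
    Pair* p = new Pair;
    p->a = create_big();
    p->b = create_big();
    assert(p->a.x[1] == 2);
    delete p;
    return copies;
}
```

Under those assumptions the first function should report no copies, while the second reports one assignment per member, since p->a and p->b were already constructed by new Pair before the return values exist.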