In this MSDN Magazine article, the author states (emphasis mine):
Note that boxing always creates a new object and copies the unboxed value’s bits to the object. On the other hand, unboxing simply returns a pointer to the data within a boxed object: no memory copy occurs. However, it is commonly the case that your code will cause the data pointed to by the unboxed reference to be copied anyway.
I’m confused by the sentence I’ve bolded (“unboxing simply returns a pointer to the data within a boxed object: no memory copy occurs”) and the sentence that follows it. In everything else I’ve read, including this MSDN page, I’ve never heard that unboxing just returns a pointer to the value on the heap. I was under the impression that unboxing gives you a variable holding a copy of the value on the stack, just as you started with. After all, if my variable contains “a pointer to the value on the heap”, then I don’t have a value type; I have a pointer.
Can someone explain what this means? Was the author on crack? (There is at least one other glaring error in the article.) And if this is true, in what cases will “your code cause the data pointed to by the unboxed reference to be copied anyway”?
I just noticed that the article is nearly 10 years old, so maybe this is something that changed very early in the life of .NET.
The article is accurate. However, it describes what really happens at run time, not what the IL generated by the compiler looks like. After all, a .NET program never executes IL directly; it executes the machine code that the JIT compiler generates from the IL.
The unbox opcode does indeed produce code that yields a pointer to the bits on the heap that represent the value-type value. The JIT emits a call to a small CLR helper function named “JIT_Unbox”; if you have the SSCLI20 source code, it is in clr\src\vm\jithelpers.cpp. The Object::GetData() function returns the pointer.
From there, the value is most commonly copied into a CPU register first, and from there it may be stored somewhere. That doesn’t have to be the stack: it could be a field of a reference-type object (the GC heap), a static variable (the loader heap), or an argument pushed on the stack for a method call. Or the CPU register could be used as-is when the value appears in an expression.
While debugging, right-click the editor window and choose “Go To Disassembly” to see the machine code.