Today I found the IL Disassembler among the tools that ship with VS2008. I disassembled a program and took a look at the result. The opcodes weren’t hard to understand, but one thing surprised me: .NET is stack-based?! From reading “Write Great Code, Volume 2” I didn’t come away with a good impression of stack-based machines, because they’re quite slow. They’re also easy to implement, but I don’t think Microsoft’s developers chose this approach for its simplicity: after all, that code still has to be translated into real machine code, so they would just be moving the problem elsewhere.
Can any of you explain this strange choice?
PS: Here is what I read on this topic:
13.1.1 Stack-Based Machines
Stack-based machines use memory for most calculations, employing a stack in memory to hold all operands and results. Computer systems employing a stack architecture offer some important advantages over other architectures:
- The instructions are often smaller (each consuming fewer bytes) than those found in other architectures because the instructions generally don’t have to specify any operands.
- It is generally easier to write compilers for stack architectures than for other machines because converting arithmetic expressions to a sequence of stack operations is very easy.
- Temporary variables are rarely needed in a stack architecture, because the stack itself serves that purpose.
Unfortunately, stack machines also suffer from some serious disadvantages:
- Almost every instruction references memory (which is slow on modern machines). Though caches can help mitigate this problem, memory performance is still a major problem on stack machines.
- Even though conversion from HLLs to a stack machine is very easy, there is less opportunity for optimization than there is with other architectures.
- Because stack machines are constantly accessing the same data elements (that is, data on the top of the stack), pipelining and instruction parallelism is difficult to achieve (see Write Great Code, Volume 1 for details on pipelining and instruction parallelism).
A stack is a data structure that allows operations only on a few limited elements of the stack (often called the top of stack and next on stack). With a stack you generally do one of three things: push new data onto the stack, pop data from the stack, or operate on the data that is currently sitting on the top of the stack (and possibly the data immediately below it).
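To make the push/pop/operate description concrete, here is a minimal sketch of a stack-machine interpreter. The opcode names (`push`, `add`, `sub`, `mul`) are hypothetical, only loosely inspired by IL opcodes; the point is that the arithmetic instructions carry no operands because they implicitly work on the top of the stack.

```python
import operator

# Hypothetical opcode names; binary operators take no explicit operands.
BINARY_OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run(program):
    """Execute a list of (opcode, *args) tuples on an operand stack."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "push":
            # The only instruction that carries an operand: the value to push.
            stack.append(instr[1])
        elif op in BINARY_OPS:
            # Pop the two topmost values, apply the operator, push the result.
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY_OPS[op](left, right))
        else:
            raise ValueError(f"unknown opcode: {op}")
    # A well-formed expression leaves exactly one value on the stack.
    return stack.pop()


# (2 + 3) * 4 as stack code: no temporary variable is ever named.
program = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]
print(run(program))  # prints 20
```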
and
13.1.1.5 Real-World Stack Machines
A big advantage of the stack architecture is that it is easy to write a compiler for such a machine. It’s also very easy to write an emulator for a stack-based machine. For these reasons, stack architectures are popular in virtual machines (VMs) such as the Java Virtual Machine and the Microsoft Visual Basic p-code interpreter. A few real-world stack-based CPUs do exist, such as a hardware implementation of the Java VM; however, they are not very popular because of the performance limitations of memory access. Nonetheless, understanding the basics of a stack architecture is important because many compilers translate HLL source code into a stack-based form prior to translating to actual machine code. Indeed, in the worst case (though rare), compilers are forced to emit code that emulates a stack-based machine when compiling complex arithmetic expressions.
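The “converting arithmetic expressions to a sequence of stack operations is very easy” claim boils down to a post-order walk of the expression tree: emit code for the operands, then the operator. This is a minimal sketch; the tuple-based tree and the opcode names are assumptions for illustration, not how the C# compiler or any other real compiler represents its syntax trees.

```python
from typing import Optional, Union

# Hypothetical AST: an int literal, or a tuple (operator, left, right).
Expr = Union[int, tuple]

OPCODES = {"+": "add", "-": "sub", "*": "mul"}


def compile_expr(expr: Expr, code: Optional[list] = None) -> list:
    """Emit stack code for expr using a post-order traversal."""
    if code is None:
        code = []
    if isinstance(expr, int):
        code.append(("push", expr))
    else:
        op, left, right = expr
        # Operands first (their values end up on the stack), then the operator.
        compile_expr(left, code)
        compile_expr(right, code)
        code.append((OPCODES[op],))
    return code


# (2 + 3) * (7 - 1)
tree = ("*", ("+", 2, 3), ("-", 7, 1))
print(compile_expr(tree))
# [('push', 2), ('push', 3), ('add',), ('push', 7), ('push', 1), ('sub',), ('mul',)]
```

Note that the compiler never has to decide where the intermediate results (2 + 3 and 7 - 1) live; the stack takes care of that, which is the “temporary variables are rarely needed” point from the book.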
EDIT: I’ve just found an article on @EricLippert’s blog that answers the question and confirms @Aaron’s answer.
Keep in mind that just because the intermediate representation is stack-based doesn’t mean the generated machine code is. When the code is converted from the intermediate form to machine code (by the JIT compiler, for example), it is effectively recompiled, which allows for local optimizations.
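To illustrate why stack-based IL doesn’t imply stack-based machine code, here is a minimal sketch of a textbook technique for removing the stack at compile time: the translator simulates the stack symbolically, holding the *names* of virtual registers instead of values, and emits three-address instructions. This is not a description of how the .NET JIT actually works; the opcode names and the `t0`, `t1`, … register names are hypothetical.

```python
def stack_to_registers(program):
    """Translate stack code into three-address code over virtual registers.

    The operand stack is simulated at translation time: each entry is the
    name of the virtual register (or constant) that will hold that value at
    run time, so the emitted code never touches a stack.
    """
    symbolic_stack = []
    output = []
    next_reg = 0
    for instr in program:
        op = instr[0]
        if op == "push":
            # Constants are tracked by name; no instruction is emitted for them.
            symbolic_stack.append(str(instr[1]))
        else:
            # Binary operator: consume two symbolic entries, define a new register.
            right = symbolic_stack.pop()
            left = symbolic_stack.pop()
            dest = f"t{next_reg}"
            next_reg += 1
            output.append(f"{dest} = {op} {left}, {right}")
            symbolic_stack.append(dest)
    # The remaining entry names the register that holds the final result.
    return output, symbolic_stack.pop()


code = [("push", 2), ("push", 3), ("add",), ("push", 7), ("push", 1), ("sub",), ("mul",)]
instructions, result = stack_to_registers(code)
for line in instructions:
    print(line)
print("result in", result)
```

This prints `t0 = add 2, 3`, `t1 = sub 7, 1`, `t2 = mul t0, t1` and `result in t2`. A real code generator would go further, for example folding these constants and then mapping the virtual registers onto the target’s physical registers.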
One nice thing about using a stack-based intermediate representation is that you’re not tied to any specific architecture.
Imagine if they had decided to use a theoretical register-based system as their intermediate form. How many registers should they pick? 8? 16? 64? If your target processor has more actual registers than the intermediate form, you’ve lost out on possible optimizations. If your target has fewer actual registers than the intermediate form, your optimizations are counterproductive, because the values that don’t fit end up spilled to memory anyway.
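A stack form doesn’t commit to any register count, so the back end can measure what the code needs and decide per target. As a crude sketch (ignoring locals, calling conventions, and real register-allocation algorithms), the peak stack depth, which is what ILAsm records in a method’s `.maxstack` directive, counts how many intermediate values are live at the same time; a hypothetical target with `num_registers` registers would have to spill anything beyond that. The function names here are made up for illustration.

```python
def max_stack_depth(program):
    """Peak number of values simultaneously on the operand stack."""
    depth = peak = 0
    for instr in program:
        if instr[0] == "push":
            depth += 1
        else:
            # Binary operator: pops two values and pushes one result.
            depth -= 1
        peak = max(peak, depth)
    return peak


def values_spilled(program, num_registers):
    """Crude estimate: live values at the peak that don't fit in registers."""
    return max(0, max_stack_depth(program) - num_registers)


# ((1 + 2) * (3 + 4)) - ((5 + 6) * (7 + 8)) as stack code
code = [
    ("push", 1), ("push", 2), ("add",), ("push", 3), ("push", 4), ("add",), ("mul",),
    ("push", 5), ("push", 6), ("add",), ("push", 7), ("push", 8), ("add",), ("mul",),
    ("sub",),
]
for k in (2, 4, 16):
    print(f"{k} registers -> {values_spilled(code, k)} value(s) spilled at peak")
```

The same stack code needs a peak of 4 live values: a 2-register target would spill 2 of them, while 4 or 16 registers leave none. The IL itself never had to guess which target it would run on.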
Even among current CPUs there’s a big difference between compiling down to x86 and x64 (x64 doubles the number of general-purpose registers from 8 to 16), not to mention alternative architectures such as ARM, or future ones.
For something like this, it’s good that they kept the intermediate form as simple as possible and rely on optimization during final code generation to match it to the actual hardware.