I’m trying to understand how these languages work under the hood. Unfortunately, I have only ever read very superficial material.
I’ll summarize what I know already. I would be really happy if you could correct me and, most of all, help me build on my little bits of half-knowledge.
C++:
The C++ compiler preprocesses all source files. This means that it actually inserts text into the places where the macros originally were. After that, it creates an .obj file for each source file containing machine independent bytecode.
The linker then links all external .obj files from libraries with the custom made .obj files together, and compiles it into an .exe.
Java:
Java code is compiled into machine independent “bytecode” which sits in .class files, which in turn can sit in .jar files, which get run on the JRE. The virtual machine is then just doing garbage collection. Java code is compiled just-in-time like C#, but with the HotSpot optimization developed by Sun.
C#:
Practically the same as Java? C# source code gets compiled into CIL (Common Intermediate Language) code, which is still human readable. This code will be run by the CLR Just-in-Time. This compilation turns methods into machine specific code just when they are first called.
I’m actually interested in pretty much every language, but Java and C# are almost the same, and I always wondered how they differ. And C++ is the “classic”, so to speak: the father of both, without any kind of virtual machine. Appreciate the help!
edit: I know that this is a broad subject, but I really couldn’t find any solid knowledge. If you have links or books that explain this sort of thing, I’m happy to go to work. I tried to read the Sun specifications/whitepapers for the Java Virtual Machine, but that is all a little too deep for me right now.
The compilation of unmanaged C++ is very different from the compilation of managed C++, C# and Java.
Unmanaged C++
Unmanaged C++ (“traditional” C++) is compiled directly into machine code. The programmer invokes a compiler that targets a specific platform (processor and operating system), and the compiler outputs an executable that works only on that platform. The executable contains the machine code that the particular processor understands. When the program is run, the processor executes the compiled code directly, as is (modulo details such as virtual memory address translation).
Managed C++, C# and Java
Managed code is compiled into an intermediate code (CIL in the case of .NET languages like C#, and Java bytecode in the case of Java). The compiler outputs a file that contains code in this intermediate language (an assembly for .NET, .class files for Java). At this point, it is still platform-independent. When the program is run, a so-called Just-in-Time (JIT) compiler kicks in and translates the intermediate code into machine code just before it is executed. The processor then executes the machine code generated by the JIT compiler. Most of the time, this machine code is kept only in memory and discarded at the end of the program (so the JIT compilation has to be repeated on the next run), but tools exist that do the translation ahead of time and store the result permanently (NGen for .NET, for example).
The benefit here of course is that the platform-independent executable can be run on any platform, but the downside is that you need an execution environment (including a JIT compiler) for that platform.