I was looking up the PyPy project (Python in Python), and started pondering the issue of what is running the outer layer of python? Surely, I conjectured, it can’t be as the old saying goes “turtles all the way down”! After all, python is not valid x86 assembly!
Soon I remembered the concept of bootstrapping, and looked up compiler bootstrapping. “Ok”, I thought, “so it can be either written in a different language or hand compiled from assembly”. In the interest of performance, I’m sure C compilers are just built up from assembly.
This is all well and good, but the question still remains: how does the computer get that assembly file?!
Say I buy a new cpu with nothing on it. During the first operation I wish to install an OS, which runs C. What runs the C compiler? Is there a miniature C compiler in the BIOS?
Can someone explain this to me?
I understand what you’re asking… what would happen if we had no C compiler and had to start from scratch?
The answer is you’d have to start from assembly or hardware. That is, you can either build a compiler in software or hardware. If there were no compilers in the whole world, these days you could probably do it faster in assembly. Back in the day, as far as I know, the first assemblers and compilers were themselves written by hand in machine code or assembly. I’m not aware of compilers that were dedicated pieces of hardware, and the wikipedia article, which is somewhat short, doesn’t describe any either.
The next question I guess is what happens today? Well, those compiler writers have been busy writing portable C for years, so the compiler should be able to compile itself. It’s worth discussing at a very high level what compilation is. Basically, you take a set of statements and produce assembly from them. That’s it. Well, it’s actually more complicated than that – you can do all sorts of things with lexers and parsers and I only understand a small subset of it, but essentially, you’re looking to map C to assembly.
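To make "take a set of statements and produce assembly" a bit more concrete, here is a minimal sketch in Python. It is not how a real C compiler works: it borrows Python's own parser (the standard ast module) as the front end, handles only integer arithmetic, and the PUSH/ADD/SUB/MUL mnemonics are made up for an imaginary stack machine.

```python
import ast

# Hypothetical mnemonics for an imaginary stack machine; not a real instruction set.
OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}


def compile_expr(source):
    """Translate an arithmetic expression into a list of pseudo-assembly lines."""
    tree = ast.parse(source, mode="eval")  # lexing and parsing done for us
    out = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            out.append(f"PUSH {node.value}")
        elif isinstance(node, ast.BinOp) and type(node.op) in OPS:
            emit(node.left)                  # operands first ...
            emit(node.right)
            out.append(OPS[type(node.op)])   # ... then the operation
        else:
            raise ValueError("unsupported construct")

    emit(tree.body)
    return out


print("\n".join(compile_expr("1 + 2 * 3")))
```

For the input above this prints PUSH 1, PUSH 2, PUSH 3, MUL, ADD, one per line. A real compiler does the same kind of mapping, just with a far bigger language on one side and a real instruction set on the other.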
Under normal operation, the compiler produces assembly code matching your platform, but it doesn’t have to. It can produce assembly code for any platform you like, provided it knows how to. So the first step in making C work on your platform is to create a target in an existing compiler, start adding instructions and get basic code working.
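As a rough illustration of what "creating a target" means, the sketch below keeps one platform-independent program and emits text for two different back ends. Everything here is an assumption for illustration: the targets "toy-a" and "toy-b" and their mnemonics are invented, and real compilers describe targets in far more detail.

```python
# A tiny intermediate form: (operation, operand) pairs, independent of any platform.
PROGRAM = [("load", 2), ("add", 3), ("store", "result")]

# Two made-up targets; the mnemonics are illustrative, not real instruction sets.
TARGETS = {
    "toy-a": {"load": "MOV r0, {}", "add": "ADD r0, {}", "store": "MOV [{}], r0"},
    "toy-b": {"load": "LDA {}", "add": "ADC {}", "store": "STA {}"},
}


def generate(program, target):
    """Emit one line of target-specific text per intermediate instruction."""
    table = TARGETS[target]
    return [table[op].format(arg) for op, arg in program]


for name in TARGETS:
    print(name, generate(PROGRAM, name))
```

Adding a new platform here means adding one more table, which is the toy version of adding a target to an existing compiler. The machine running the script does not need to be any of the targets, and that is all cross compiling really is.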
Once this is done, in theory, you can now cross compile from one platform to another. The next stages are: building a kernel, bootloader and some basic userland utilities for that platform.
Then, you can have a go at compiling the compiler for that platform (once you’ve got a working userland and everything you need to run the build process). If that succeeds, you’ve got basic utilities, a working kernel, userland and a compiler system. You’re now well on your way.
Note that in the process of porting the compiler, you probably needed to write an assembler and linker for that platform too. To keep the description simple, I omitted them.
If this is of interest, Linux from Scratch is an interesting read. It doesn’t tell you how to create a new target from scratch (which is distinctly non-trivial) – it assumes you’re going to build for an existing known target, but it does show you how you cross compile the essentials and begin building up the system.
Python does not actually compile to assembly. For a start, the running python program keeps track of counts of references to objects, something that a cpu won’t do for you. However, the concept of instruction-based code is at the heart of Python too. Have a play with the dis module from the standard library: pass it any function you have defined.
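For instance, a minimal sketch using the standard dis module on a made-up function called add (the function is just an example of mine; any function will do):

```python
import dis


def add(a, b):
    """A throwaway function to disassemble."""
    return a + b


# Print a human-readable listing of the bytecode for add.
dis.dis(add)

# The same information as data: the instruction names, in order.
names = [ins.opname for ins in dis.get_instructions(add)]
print(names)
```

The exact instruction names differ between Python versions, so I have not pasted any output. Run it on your own interpreter.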
There you can see how Python thinks of the code you entered. This is python bytecode, i.e. the assembly language of python. It effectively has its own “instruction set” if you like for implementing the language. This is the concept of a virtual machine.
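To show what "virtual machine" means in practice, here is a minimal sketch of a stack-based interpreter loop. It is not CPython's actual implementation: the three instructions PUSH, ADD and MUL are invented for the example, and a real VM has many more instructions plus things like call frames and reference counting.

```python
def run(program):
    """Execute a list of (opcode, *operands) tuples on a simple stack machine."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack.pop()  # the result is whatever is left on top


# 1 + 2 * 3 expressed in the made-up instruction set.
program = [("PUSH", 1), ("PUSH", 2), ("PUSH", 3), ("MUL",), ("ADD",)]
print(run(program))  # prints 7
```

The loop plays the role the cpu plays for real assembly: fetch an instruction, look at what it is, do it, move on.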
Java has exactly the same kind of idea. I took a class function and ran javap -c class to get the corresponding JVM bytecode listing.
I take it you get the idea. These are the assembly languages of the python and java worlds, i.e. how the python interpreter and the Java virtual machine think respectively.
Something else that would be worth reading up on is JonesForth. This is both a working Forth interpreter and a tutorial, and I can’t recommend it enough for thinking about “how things get executed” and how you write a simple, lightweight language.