I am writing a compiler for a small imperative language. The target language is Java bytecode, and the compiler is implemented in Haskell.
I’ve written a frontend for the language – i.e I have a lexer, parser and typechecker. I’m having trouble figuring out how to do code generation.
I keep a data structure representing the stack of local variables. I can query this structure with the name of a local variable and get its position in the stack. This data structure is passed around as I walk the syntax tree, and variables are popped and pushed as I enter and exit new scopes.
What I having trouble figuring out is how to emit the bytecode. Emitting strings at terminals and concatenating them at higher levels seems like a poor solution, both clarity- and performance-wise.
tl;dr How do I emit bytecode while waling the syntax tree?
My first project in Haskell a few months back was to write a c compiler, and what resulted was a fairly naive approach to code generation, which I’ll walk through here. Please do not take this as an example of good design for a code generator, but rather view it as a quick and dirty (and ultimately naive) way to get something that works fairly quickly with decent performance.
I began by defining an intermediate representation LIR (Lower Intermediate Representation) which closely corresponded to my instruction set (x86_64 in my case):
Next up came a monad that would handle interleaving state throughout the translation (I was blissfully unaware of our friend-the
State Monad-at the time):along with the state that would be ‘threaded’ through the various translation phases:
For convenience, I also specified a series of translator monadic functions, for example:
I then proceeded to recursively pattern-match my AST, fragment-by-fragment, resulting in many functions of the form:
(for translating a block of the target language’s code) or
(for translating a method call) or
for an example of actually generating some LIR code. Hopefully these three examples will give you a good starting point; ultimately, you’ll want to go slowly, focusing on one fragment (or intermediate type) within your AST at a time.