Compiler – Part 2
Over the past few weeks, I’ve been working aggressively on my compiler, rewriting large parts of the compilation pipeline and rethinking several design decisions.
Correctness
One of the biggest challenges I faced was identifying where things were going wrong. Was the issue in my intermediate representation, or was it in the assembly generation stage?
When you’re building a compiler, this distinction isn’t always obvious. Bugs can surface far downstream from where they were introduced, which makes debugging especially confusing.
Why do we need an intermediate representation?
This is a very reasonable question. Why translate the source into an extra, intermediate format before translating it into real machine code?
The short answer is: you don’t have to. It’s perfectly valid to build a compiler without any intermediate representation at all. Many early compilers worked this way, and even today some simple languages are compiled directly to assembly.
However, the real benefit of an intermediate representation becomes clear when targeting multiple architectures. Since my goal is to support more than one platform, an IR lets me share the majority of the compilation logic and defer only the platform-specific work to the final stage.
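To make the sharing concrete, here is a minimal sketch in C. It is not my compiler's actual code: the IrStoreConst type, the Target enum, and emit_store_const are hypothetical names invented for illustration, and the scratch register w8 is an arbitrary choice. One IR instruction ("store a constant into a stack slot") is lowered by two small backends, and everything upstream of that function would be common to both targets.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical one-instruction IR: store a 32-bit constant into a stack slot. */
typedef struct {
    int slot_offset; /* byte offset from the frame pointer (negative = below it) */
    int value;       /* constant to store; assumed small enough for a single mov */
} IrStoreConst;

typedef enum { TARGET_ARM64, TARGET_X86_64 } Target;

/* Writes assembly text for `ins` into `out`.
 * Returns the number of characters written, or -1 for an unknown target. */
int emit_store_const(Target target, IrStoreConst ins, char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    switch (target) {
    case TARGET_ARM64:
        /* ARM64 cannot store an immediate directly, so go through a register. */
        return snprintf(out, cap, "mov w8, #%d\nstr w8, [x29, #%d]\n",
                        ins.value, ins.slot_offset);
    case TARGET_X86_64:
        /* Intel syntax; x86-64 can store an immediate straight to memory. */
        return snprintf(out, cap, "mov dword ptr [rbp%+d], %d\n",
                        ins.slot_offset, ins.value);
    }
    return -1;
}
```

Calling emit_store_const with a slot offset of -16 and a value of 4 yields a two-instruction sequence on ARM64 and a single instruction on x86-64. Only this last step knows which platform it is targeting.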
Interpreter Mode
While debugging my compiler, I often ran into situations where multiple fixes seemed possible. Should I change how the IR behaves, or should I fix the code generator?
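To resolve this dilemma, I decided to write an interpreter for my intermediate language. Running the IR directly lets me validate the compiler logic before generating any assembly code: if the interpreter produces the wrong result, the bug is in the front end or the IR; if the interpreter is right but the compiled binary is not, the bug is in the code generator.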
Working on the interpreter also helped me better understand the level of abstraction my IR should operate at.
I loosely modeled the interpreter after the ARM64 architecture, taking shortcuts where appropriate. This helped validate many of the assumptions I was making about the code my compiler emits.
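For a sense of what such an interpreter can look like, here is a deliberately tiny sketch in C. The opcodes, the Instr layout, and the run function are all assumptions made up for this example, not my real IR. It keeps a small register file and a handful of stack slots, in the spirit of a load/store machine like ARM64, and executes instructions one at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a register-based IR. */
typedef enum { OP_MOV_IMM, OP_ADD, OP_STORE, OP_LOAD, OP_RET } Opcode;

typedef struct {
    Opcode op;
    int a; /* destination register (source register for OP_STORE and OP_RET) */
    int b; /* immediate, source register, or slot index, depending on op */
    int c; /* second source register, used only by OP_ADD */
} Instr;

enum { NUM_REGS = 8, NUM_SLOTS = 16 };

/* Executes `count` instructions and returns the value passed to OP_RET,
 * or 0 if the program runs off the end without returning. */
int64_t run(const Instr *code, size_t count)
{
    int64_t regs[NUM_REGS] = {0};
    int64_t slots[NUM_SLOTS] = {0}; /* stands in for the stack frame */

    for (size_t pc = 0; pc < count; pc++) {
        Instr in = code[pc];
        assert(in.a >= 0 && in.a < NUM_REGS);
        switch (in.op) {
        case OP_MOV_IMM:
            regs[in.a] = in.b;
            break;
        case OP_ADD:
            assert(in.b >= 0 && in.b < NUM_REGS);
            assert(in.c >= 0 && in.c < NUM_REGS);
            regs[in.a] = regs[in.b] + regs[in.c];
            break;
        case OP_STORE:
            assert(in.b >= 0 && in.b < NUM_SLOTS);
            slots[in.b] = regs[in.a];
            break;
        case OP_LOAD:
            assert(in.b >= 0 && in.b < NUM_SLOTS);
            regs[in.a] = slots[in.b];
            break;
        case OP_RET:
            return regs[in.a];
        }
    }
    return 0;
}
```

A program that stores 4 into a slot, loads it back, adds 38, and returns the sum should come back as 42 from run. If it does not, the problem is in the IR or the interpreter, and no assembly has been involved yet.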
Code Generation
Code generation is often the hardest part of a compiler. It involves translating high-level concepts into very low-level operations.
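    // C
    x = 4;

    // ARM64 assembly (assuming x is a 32-bit int in the stack slot at x29 - 16)
    mov w8, #4
    str w8, [x29, #-16]

Note that ARM64 has no instruction that stores an immediate directly to memory, so the constant goes through a scratch register first.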
At the hardware level, there are no variables—only memory. It’s the compiler’s responsibility to divide that memory into variables and give them meaning through types and conventions.
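One way to picture "dividing memory into variables" is a frame layout pass. The sketch below is an assumption-laden illustration in C, not how my compiler necessarily does it: the Local struct and layout_frame are hypothetical names, every variable is assumed to have a power-of-two size with alignment equal to that size, and slots are placed at negative offsets from the frame pointer. The total is rounded up to 16 bytes because ARM64 requires the stack pointer to stay 16-byte aligned.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *name;
    int size;   /* bytes; assumed power of two, alignment equal to size */
    int offset; /* output: negative byte offset from the frame pointer */
} Local;

/* Assigns each local a slot below the frame pointer.
 * Returns the frame size in bytes, rounded up to a multiple of 16. */
int layout_frame(Local *locals, size_t count)
{
    int used = 0; /* bytes consumed below the frame pointer so far */
    for (size_t i = 0; i < count; i++) {
        int size = locals[i].size;
        assert(size > 0 && (size & (size - 1)) == 0);
        used += size;
        used = (used + size - 1) & ~(size - 1); /* align the slot's start */
        locals[i].offset = -used;
    }
    return (used + 15) & ~15;
}
```

With a 4-byte int, an 8-byte pointer, and a 1-byte char declared in that order, this assigns offsets -4, -16, and -17 and reports a 32-byte frame. The "variable" is nothing more than that offset plus the size the compiler remembers for it.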
I’m far from an assembly expert. In fact, I had no prior experience with ARM64 before starting this project. While online documentation is helpful, it can also be overwhelming.
One tool that helped me immensely was Compiler Explorer (Godbolt), which made it much easier to understand how high-level code maps to assembly.