Writing a C Compiler
March 18, 2026 - Dallas McNeil - Programming Languages
Over the past few months I have been writing a C compiler, following along with the book Writing a C Compiler by Nora Sandler.
TL;DR, I think this is a fantastic book and learning opportunity, but it is fairly advanced and a significant undertaking to follow the entire way. For now at least, I only implemented half of the contents.
Why?
Compilers have always interested me and are a fundamental technology we tend to gloss over as programmers. Everything we write is first run through a compiler and it fully dictates the actual program produced. I wanted to better understand how they work and how code translates to assembly. Maybe one day I’ll make my own language or at least contribute to one.
There are a lot of great compiler/interpreter learning resources out there (such as Crafting Interpreters) which I would generally recommend over Writing a C Compiler. But I had already done some compiler basics during university and understood some core concepts. I wanted something more realistic and challenging.
The appeal of a C compiler is that it’s a real language. It’s probably the most important programming language of all time. I’ve written a lot of it so I already know how the input language works. It has quirks that mean it isn’t smooth sailing the entire way either. It’s not designed to be easy to compile, for better or worse.
What is in the book?
Writing a C Compiler is broken into three parts:
- Part 1, Basics. This encompasses many things that can be done by a C compiler with just the int type, like operators, control flow, functions.
- Part 2, Types Beyond Int. This is everything else that requires other types like longs, doubles, pointers, structs, arrays.
- Part 3, Optimisations. This covers passes to make the compiled output somewhat performant.
Each chapter covers implementing some features of the C language. The book comes with a test suite to run your compiler against, to check it implements the C spec to some degree of correctness.
It’s also worth noting that it only covers compilation, not preprocessing, assembling or linking. The input to the compiler is a preprocessed .i file and the output is an assembly .s file. I chose to only implement support for x86-64 Linux as covered in the book.
Where I got to
I implemented the first 14 chapters of the book, which include:
- Variables, including static and extern
- Ints, longs, signed and unsigned types
- Doubles
- All control flow statements
- All math operators
- Function calls
- Pointers
I have not implemented:
- Arrays
- String constants, chars
- void pointers, dynamic memory allocation
- Structs, unions
- The myriad of other features not covered in the book
I also have not implemented any optimisation passes, so the generated assembly is far from efficient.
With these features, my compiler can compile some reasonably complicated programs. This program prints the first 79 numbers in the Fibonacci sequence.
int putchar(int character);

// Returns the num-th Fibonacci number (1, 1, 2, 3, ...)
unsigned long fibonacci(unsigned long num) {
    unsigned long prev1 = 1;
    unsigned long prev2 = 0;
    for (unsigned long i = 1; i < num; i++) {
        unsigned long current = prev1 + prev2;
        prev2 = prev1;
        prev1 = current;
    }
    return prev1;
}

// Prints num in decimal, one digit at a time, skipping leading zeros
int print_num(unsigned long num) {
    int has_printed = 0;
    // 10^19 does not fit in a signed long, so it needs an unsigned suffix
    unsigned long divider = 10000000000000000000ul;
    while (divider > 0) {
        int digit = (num / divider) % 10;
        divider /= 10;
        if (has_printed || digit != 0) {
            putchar(digit + 48); // 48 is ASCII '0'
            has_printed = 1;
        }
    }
    putchar(10); // 10 is ASCII newline
    return 0;
}

int main(void) {
    for (int i = 1; i < 80; i++) {
        print_num(fibonacci(i));
    }
}
There are plenty of shenanigans though:
- I need to declare putchar as including stdio.h would have parts of the C language the compiler can’t handle
- I use putchar and my own int printing function because I don’t support arrays, strings or chars
What I liked
I thoroughly enjoyed implementing part 1 of the book. Part 1 (chapters 1-10) focuses on many fundamental language features using only the int type. There was a lot of new information I found valuable. Each chapter implemented a significant new feature and felt manageable to achieve over a couple of nights. It was just in the sweet spot for challenge.
The book always lays out the data structures between each stage in the compiler. Setting those firm pillars is important to ground the entire compiler and ensure any incorrect structures don’t ripple through the following stages.
I want to give special praise to the accompanying test suite, which tests not only the final output of the compiler, but the different stages as you work through the chapter. There were always manageable goals to aim for, rather than an all or nothing check at the end of the chapter.
Throughout this part, the reader is trusted to fill in some small gaps. Not everything is explicitly spelt out in pseudocode. In particular, the extra credit sections at the end of some chapters challenge the reader to implement an additional feature. I found these were never too much of a stretch. The tests and the C spec are your main guide, but it’s rewarding to implement something with less hand-holding. However, choosing to implement them compounds the work, because future chapters build on extra credit features. For example, choosing to implement bitwise operators makes later features like compound assignment more work, again without any hand-holding.
Why did I stop?
Simply put, it doesn’t feel like the juice is worth the squeeze where I am up to now. While part 1 went quickly, part 2 (chapters 11-18) introduces types and is much more nuanced. Chapters took longer to implement and much longer to debug.
A lot of the work becomes modifying existing code. Without a clear picture of what you are building, you structure your types and code to meet only the requirements at hand. Then a new requirement catches you off guard and you have to make sweeping changes to support it. This happened multiple times in part 2, and was likely compounded by my choice of Rust. I ended up making multiple refactors to move forward, which often involved commenting and uncommenting large parts of my codebase. This felt more like busy work than learning. It’s difficult to avoid in any software project: the clear picture of how the code should look at the end is the reward you get from having gone through this process of discovery once already. That’s why prototypes are so valuable.
In fairness, the book does its best to telegraph and prepare you for these major adjustments. There are many decisions in earlier chapters that feel overly complicated that are essential for later chapters.
There are also fewer instances of pseudocode provided and more times where the reader is asked to fill in the gap. Often these areas were the ones where bugs crept in for me. Bug fixing can be educational, and sometimes teaches the hardest-won lessons, but that did not feel like the case here.
My only technical criticism for the book is its lexer implementation. Writing a C Compiler opts for regular expressions. But lexing is the easiest stage of a compiler and a handwritten lexer is easy to write and performant.
Conclusion
I would recommend Writing a C Compiler if you want a great read and/or a challenging project. I would not recommend it for beginners. I would encourage anyone who is mildly interested to try implementing at least the first few chapters. Doing is always better than just reading and with some understanding of the structure of the compiler, I think reading the rest will be more educational.
From here, I think I’ll enjoy just reading the remainder of the book. I might like to dust this compiler off and finish part 2, or go straight to part 3 some day.
In some ways, I have a newfound appreciation for C, and also a new dread for some quirks of the language and x86-64 assembly.
I’ll sign off with the biggest surprise I encountered writing my C compiler. Below is the optimised x86-64 assembly required to convert an unsigned long to a double (u64 to f64), which includes a jump! Something to think about next time you make this conversion in your C code.
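ulong_to_double:
    test rdi, rdi
    js .L2                  # top bit set: value does not fit in a signed long
    pxor xmm0, xmm0
    cvtsi2sd xmm0, rdi      # signed conversion is correct as-is
    ret
.L2:
    mov rax, rdi
    and edi, 1              # keep the low bit that the shift will drop
    pxor xmm0, xmm0
    shr rax                 # halve so the value fits in a signed long
    or rax, rdi             # fold the dropped bit back in for correct rounding
    cvtsi2sd xmm0, rax
    addsd xmm0, xmm0        # double the result to undo the halving
    ret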
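The jump exists because cvtsi2sd only converts signed integers. As a rough sketch, the same logic can be written in C using nothing but signed conversions. The function name u64_to_f64 is my own for illustration, and a compiler would normally generate all of this for a plain (double) cast.

```c
#include <stdint.h>

/* Hypothetical helper: converts an unsigned 64-bit integer to a double
   using only signed 64-bit to double conversions, mirroring the assembly. */
double u64_to_f64(uint64_t x) {
    if (x <= (uint64_t)INT64_MAX) {
        /* Top bit clear: the value fits in a signed 64-bit integer. */
        return (double)(int64_t)x;
    }
    /* Top bit set: halve the value so it fits, OR-ing the dropped low bit
       back in so the rounding during conversion still comes out right. */
    uint64_t halved = (x >> 1) | (x & 1);
    double d = (double)(int64_t)halved;
    /* Doubling is exact in floating point and undoes the halving. */
    return d + d;
}
```

For example, u64_to_f64(UINT64_MAX) takes the second path and gives the same value as (double)UINT64_MAX.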