How to make a compiler? As an exercise I have already threatened to produce at least some assembly code for the Nut CPU used in the HP-41 calculator. There will be huge compromises (it is after all just an exercise):
- I will copy and strip down an existing compiler backend in the LLVM project, then build some small parts up again with Nut code and new naming. This is also what the LLVM documentation recommends. I will start with Sparc as they suggest. To connect the new backend with clang, all the Sparc stuff in that subproject must also be copied and renamed. This is actually not a small task: it takes a week or so to do all of this and get everything building and running again…
- I will not care about assembler directives in the source code; at least for now I will just stay with the format inherited from the compiler backend I took as a starting point.
- I will not care much about the calculator itself, just the CPU. Although I technically could produce something that could be assembled with the A41 assembler (see the SDK documentation) and run as a proper calculator function in the corresponding emulator, this is just too cumbersome, at least at the very start. I do not have the hardware needed to run my own MCODE on the physical machine… I will just produce some code for a hypothetical standalone CPU running in hex mode. Actual calculator registers contain floating point numbers encoded as BCD…
So what does it take? We should probably start by defining some registers:
    //===-- HP41MCODERegisterInfo.td - HP41MCODE Register defs -*- tablegen -*-===//
    //
    // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
    // See https://llvm.org/LICENSE.txt for license information.
    // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
    //
    //===----------------------------------------------------------------------===//

    //===----------------------------------------------------------------------===//
    //  Declarations that describe the HP41MCODE register file
    //===----------------------------------------------------------------------===//

    class HP41MCODEReg<bits<16> Enc, string n> : Register<n> {
      let HWEncoding = Enc;
      let Namespace = "HP41MCODE";
    }

    def A : HP41MCODEReg< 1, "A">;
    def B : HP41MCODEReg< 2, "B">;
    def C : HP41MCODEReg< 3, "C">;

    def R56 : RegisterClass<"HP41MCODE", [i32], 8, (add A, B, C)>;
For simplicity in this exercise I will only use 32 of the 56 bits in the registers, to make it easier for modern compiler infrastructure…
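To make the "32 of 56 bits" compromise concrete, here is a minimal sketch that models one 56-bit Nut register in a 64-bit host word. It is not part of the backend. The helper names (`toNutRegister`, `fromNutRegister`, `unusedBits`, `checkRoundTrip`) are hypothetical, and placing the 32-bit value in the low-order bits is an assumption made only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// A Nut register is 56 bits wide (14 four-bit digits); a 64-bit word can model it.
constexpr unsigned kNutRegisterBits = 56;
constexpr std::uint64_t kNutRegisterMask =
    (std::uint64_t{1} << kNutRegisterBits) - 1;

// Hypothetical helper: place a 32-bit value in a modelled 56-bit register.
// Assumption: the value sits in the low-order bits, the upper 24 bits are zero.
constexpr std::uint64_t toNutRegister(std::uint32_t value) {
  return static_cast<std::uint64_t>(value) & kNutRegisterMask;
}

// Hypothetical helper: read back the 32 bits the compiler cares about.
constexpr std::uint32_t fromNutRegister(std::uint64_t reg) {
  return static_cast<std::uint32_t>(reg & 0xFFFFFFFFu);
}

// Hypothetical helper: number of register bits left unused by the i32 view.
constexpr unsigned unusedBits() { return kNutRegisterBits - 32; }

// Compile-time checks for constant inputs.
static_assert(unusedBits() == 24, "56 - 32 bits stay unused");
static_assert(fromNutRegister(toNutRegister(0xDEADBEEFu)) == 0xDEADBEEFu,
              "a 32-bit value survives the round trip");

// Runtime check for arbitrary values.
inline void checkRoundTrip(std::uint32_t value) {
  assert(fromNutRegister(toNutRegister(value)) == value);
}
```

Under this model 24 bits of every register are simply wasted, which is the price for letting the register class be declared with the ordinary `i32` value type.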
But what is the simplest function we can generate code for? It must be this:
    void foo() {
      return;
    }
For this we just need a return instruction. Some code snippets:
    ...
    namespace HP41MCODEISD {
    enum NodeType : unsigned {
      ...
      RTN, // A return instruction.
      ...
    };
    }
    ...
    SDValue
    HP41MCODETargetLowering::LowerReturn(SDValue Chain, CallingConv::ID CallConv,
                                         bool IsVarArg,
                                         const SmallVectorImpl<ISD::OutputArg> &Outs,
                                         const SmallVectorImpl<SDValue> &OutVals,
                                         const SDLoc &DL, SelectionDAG &DAG) const {
      ...
      return DAG.getNode(HP41MCODEISD::RTN, DL, MVT::Other, RetOps);
    }

    const char *HP41MCODETargetLowering::getTargetNodeName(unsigned Opcode) const {
      switch ((HP41MCODEISD::NodeType)Opcode) {
      ...
      case HP41MCODEISD::RTN: return "HP41MCODEISD::RTN";
      ...
      }
      return nullptr;
    }
    ...
    def HP41MCODERTN : SDNode<"HP41MCODEISD::RTN", SDTNone,
                              [SDNPHasChain, SDNPOptInGlue, SDNPVariadic]>;
    ...
    let isReturn=1, isTerminator=1, isBarrier=1 in {
      def RTN : HP41MCODEInst<0x3E0, (outs), (ins), "RTN", [(HP41MCODERTN)]>;
    }
    ...
So what does that produce? A function that just returns, but in MCODE assembly… I realize this is a very small step for humanity, but it is a giant leap for someone just starting to learn about LLVM:
            .file   "hello.c"
            .text
            .globl  foo                             ! -- Begin function foo
            .type   foo,@function
    foo:                                    ! @foo
    ! %bb.0:                                ! %entry
            RTN
    .Lfunc_end0:
            .size   foo, .Lfunc_end0-foo
                                            ! -- End function
            .ident  "clang version 20.0.0git (https://github.com/llvm/llvm-project.git 1a75416092746b127fb7cfec3fcbe37ab765da58)"
            .section        ".note.GNU-stack"
            .addrsig