Last edited on 20231121.
A bootloader for CHERIoT
My previous notes on CHERIoT explained how to run programs on the Arty board. Unfortunately, testing a different program required re-synthesizing the bitfile each time, which could take more than 10 minutes.
Fortunately, David Chisnall has now provided a small bootloader that can load a hex file over UART.
These notes describe what the code does, as far as I understand it.
Below, I reproduce the boot.S code with my comments.
.include "assembly-helpers.s"
This directive includes assembly-helpers.s, which contains several useful macros.
.section .text, "ax", @progbits
.zero 0x80
.globl start
.p2align 2
.type start,@function
These directives declare a .text section that is allocatable and executable (the "ax" flags; there is no "w" flag, so it is not writable). The section starts with 0x80 (128) bytes initialized to zero, and the symbol start is made global in the symbol table. The .p2align 2 directive aligns the current location to 2^2 = 4 bytes (32 bits). Finally, the last directive marks the symbol start as a function.
start:
// ca0 (first argument) contains the read-write root
cspecialr ca0, mtdc
cspecialr cd, scr is an alias for cspecialrw cd, scr, c0. Thus, this instruction simply reads the register mtdc and copies its value into register ca0, assuming that the PCC has permit_access_system_registers enabled (which holds at reset, since PCC contains the executable root capability). mtdc is a special capability register whose name stands for machine trap data capability. More importantly, it contains the memory root capability at reset. CHERIoT extends RISC-V's general-purpose registers to 65 bits (64 bits + 1 tag bit). The registers are referred to as c0, ..., c15, while their integer (address) parts retain the standard x0, ..., x15 names. The RV32E ABI register names are reused, thus ca0 (a0) corresponds to capability register c10 (integer register x10).
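To make the register naming and the effect of csetaddr more concrete, here is a minimal C++ model. It is only a sketch: the Capability struct and the set_address helper are hypothetical names, the split of the 64 bits into a 32-bit address and 32 bits of metadata is my assumption about the layout, and the model ignores the checks under which real hardware would clear the tag.

```cpp
#include <cstdint>

// Hypothetical model of one capability register: 64 bits of data plus a tag bit.
// Assumption: the 64 bits split into a 32-bit address and 32 bits of metadata
// (bounds, permissions, ...), which this sketch treats as an opaque value.
struct Capability {
    std::uint32_t address;
    std::uint32_t metadata;
    bool tag;
};

// Model of `csetaddr cd, cs1, rs2`: copy cs1 and replace only its address.
// Assumption: the new address is one for which the hardware keeps the tag.
constexpr Capability set_address(Capability source, std::uint32_t new_address) {
    source.address = new_address;
    return source;
}

// ABI names used in boot.S and their register numbers (cN / xN).
enum class Reg : unsigned { zero = 0, ra = 1, sp = 2, t0 = 5, a0 = 10, a1 = 11 };
```

In this model, csetaddr ca0, ca0, a1 corresponds to ca0 = set_address(ca0, a1): the metadata and tag inherited from mtdc are untouched, only the address moves.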
// Zero the tag memory
li a1, 0x200fe000
csetaddr ca0, ca0, a1
li a1, 0x20100000
cjal zero_memory
li is a pseudo-instruction that stands for Load Immediate. This sequence writes the immediate 0x200fe000 into register a1 and sets the address of the capability contained in ca0 to it. 0x20100000 is then written to a1, and the routine zero_memory is called. Addresses 0x200fe000 to 0x20100000 correspond to the "shadow" memory used for the temporal safety mechanism and are zeroed by the routine.
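As a quick check of how much work this first call does, the arithmetic on the two addresses can be written out as below. The constant names are mine; only the two addresses and the 4-byte store width come from the listing.

```cpp
#include <cstdint>

// Range passed to zero_memory for the shadow memory (values from boot.S).
constexpr std::uint32_t shadow_start = 0x200fe000;
constexpr std::uint32_t shadow_end   = 0x20100000;

// Size of the range in bytes, and the number of 32-bit stores needed to clear it.
constexpr std::uint32_t shadow_bytes  = shadow_end - shadow_start;
constexpr std::uint32_t shadow_stores = shadow_bytes / 4;

static_assert(shadow_bytes == 0x2000, "the range covers 8 KiB");
static_assert(shadow_stores == 2048, "one csw per 4 bytes");
```

So the range is 8 KiB and is cleared with 2048 word stores.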
// No bounds on stack, grows down from the end of IRAM
li sp, 0x20080000
csetaddr csp, ca0, sp
auipcc cra, 0
We land back here when zero_memory returns. This loads 0x20080000, the end address of IRAM, into the stack pointer (csp.address); csp inherits its bounds and permissions from ca0. Finally, auipcc cra, 0 copies the current PCC into the capability return address register cra.
// Call the C++ entry point
la_abs t0, rom_loader_entry
csetaddr cra, cra, t0
cjalr cra
la_abs is one of the macros available in assembly-helpers.s; it loads the absolute address of a symbol. Hence, the first instruction loads the address of rom_loader_entry (from boot.cc) into register t0 (i.e., x5). That address is then set on the capability in cra, which is then jumped to. More precisely, cjalr cra expands into cjalr cra, cra, which seals the return address into cra and replaces the PCC with the given capability. rom_loader_entry takes only one argument, which by convention is passed in ca0. This capability originally came from mtdc and has all the necessary permissions and bounds. Its address is not used and is reset by rom_loader_entry.
// Zero all of the memory that we haven't loaded into.
cspecialr ca1, mtdc
csetaddr ca0, ca1, a0
li a1, 0x20080000
cjal zero_memory
After returning, a0 contains the end address of the loaded code, and memory is zeroed from that address up to 0x20080000, the end address of the IRAM.
// Jump to the newly loaded binary.
// This could be a relative jump, but I'd need to get the relocations right
// and we have 32 KiB of IROM so wasting a few bytes doesn't really matter.
auipcc cra, 0
li t0, 0x20040000
csetaddr cra, cra, t0
cjr cra
This builds an executable root capability with address 0x20040000, which is where rom_loader_entry started writing the loaded code. cjr cra expands into cjalr cnull, cra; that is, the return address is discarded and only a jump is executed.
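The addresses seen so far give a rough picture of the space available to a loaded image. The sketch below only does arithmetic on the constants from the listing; the names are mine, and the end address used in the usage note is a made-up value, not something the loader is known to return.

```cpp
#include <cstdint>

constexpr std::uint32_t load_base = 0x20040000;  // where the loaded image starts
constexpr std::uint32_t iram_end  = 0x20080000;  // end of IRAM, also the initial stack top

// Bytes between the load address and the end of IRAM.
constexpr std::uint32_t load_window = iram_end - load_base;
static_assert(load_window == 256 * 1024, "256 KiB between load address and end of IRAM");

// Bytes cleared by the second zero_memory call, given the end address
// that the loader leaves in a0.
constexpr std::uint32_t bytes_to_zero(std::uint32_t loaded_end) {
    return iram_end - loaded_end;
}
```

For example, if the loader had written 4 KiB of code (a hypothetical figure), bytes_to_zero(0x20041000) gives 0x3f000 bytes left to clear.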
zero_memory:
csw zero, 0(ca0)
cincoffset ca0, ca0, 4
blt a0, a1, zero_memory;
cret
The routine writes a zero at ca0.address in memory and increments a0 by 4 bytes (32 bits). If a0 is less than a1, it continues zeroing the memory; otherwise it returns. Thus, the routine zeroes all memory between a0 (ca0.address) and a1. This zeroes memory 32 bits at a time, but I wonder whether it would be faster to zero 64 bits at a time using csc c0, 0(ca0) instead, since, as far as I understand, the null capability is untagged, so this would also remove tags from memory (which are not guaranteed to be zeroed at reset).
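The control flow of zero_memory can be mirrored in a small host-side C++ model, which makes one detail visible: the store comes before the comparison, so the loop behaves like a do-while and always writes at least one word. This is a sketch over an ordinary vector, not real capability memory; zero_memory_model and its parameters are hypothetical names, and byte offsets into the vector stand in for addresses.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Model of the zero_memory loop. `first` and `last` are byte offsets into
// `memory` (multiples of 4). Returns the number of stores performed.
std::size_t zero_memory_model(std::vector<std::uint32_t>& memory,
                              std::uint32_t first, std::uint32_t last) {
    std::size_t stores = 0;
    std::uint32_t address = first;
    do {
        memory[address / 4] = 0;  // csw zero, 0(ca0)
        address += 4;             // cincoffset ca0, ca0, 4
        ++stores;
        // blt is a signed comparison; the offsets used here are far below 2^31,
        // so an unsigned comparison gives the same result.
    } while (address < last);     // blt a0, a1, zero_memory
    return stores;
}
```

Zeroing byte offsets 4 to 20 takes 4 stores and leaves the words on either side untouched, while an empty range (first equal to last) still performs one store. With 8-byte stores the same range would need half as many iterations, provided the start address is 8-byte aligned.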