in ,

RISC-V OS Using Rust: Memory Management Unit, Hacker News

This is chapter 3.2 of a multi-part series onwriting a RISC-V OS in Rust.

Table of ContentsChapter 3.1→ (Chapter 3.2) → Chapter 4

(October) : Patreon only

(October) : Public


View this post on YouTube at:


From the operating system’s point of view, we have a large pool of memory from some starting address to some ending address. We can use our linker script (thevirt.ldsfile) to provide us symbols representing the address of the starting and ending points.

Some of the memory will be used for things, such as global variables and stack space. However, we use the linker script to help us with this moving target. The rest of the untouched memory becomes our “heap” space. However, heap in the OS sense isn’t quite what it is in the malloc / free sense.


Paging is a system by which a piece of hardware (commonly referred to as the Memory Management Unit or MMU) translates virtual addresses into physical addresses. The translation is performed by reading a privileged register called the SATP or Supervisor Address Translation and Protection register. This register turns the MMU on / off, sets the address space identifier, and sets the physical memory address where the first page table can be found.

We can place these tables anywhere in RAM provided the last 12 bits are 0. This is because the last 12 bits of the page table is not provided in the SATP. In fact, to get the actual memory address, we take 44 bits from the SATP and shift it left by 12 places, which replaces the last 12 bits with 0s. This forms a 56 – bit bit address – not quite 64 – bits.

There are essentially three fields that we need to be aware of: (1) the virtual address, (2) the physical address, and (3) the page table entry. These are listed in theRISC-V privileged specification chapter 4.4.

The memory management has different modes. The HiFive Unleashed uses the Sv 39 mode, which means the virtual addresses are 39 – bits. Virtual addresses in this case cannot explore all 64 – bits of memory space. Others are Sv 32 for 32 – bit virtual addresses and Sv 48 for 48 – bit virtual addresses.

Virtual Addresses

The Sv 39 virtual address contains three 9-bit indices. Since 29=512, each table contains exaclty 512 entries, and each entry is exactly 8-bytes (64 – bits) as shown below. The Sv 39 virtual address contains VPN [x]. The VPN stands for “virtual page number”, which is essentially an index into an array of 512, 8 byte entries. So, instead of having the virtual address directly translate into a physical address, the MMU goes through a series of tables. With the Sv 39 system, we can have one to three levels. Each level contains a table of 512, 8 byte entries.

Table Entries

A table entry is written by the operating system to control how the MMU works. We can change how an address is translated or we can set certain bits to protect a page. The page entry is described in the following figure:

Physical Addresses

The physical address is actually 56 – bits. Therefore, a 39 – bit virtual address can translate into a 56 – bit physical address. Obviously, this allows us to map the same virtual address to a different physical address – much like how we will map addresses when creating user processes. The physical address is formed by taking PPN [2:0] and placing them into the following format. You will notice that the page offset is directly copied from the virtual address into the physical address.

The SATP Register

All translations begin at the Supervisor Address Translation and Protection (SATP) register shown below and is described in theRISC-V Privileged Specification chapter 4.1., which contains the following diagram of the SATP register.

We can see that the PPN (physical page number) is a 44 -bit value. However, this value has been shifted right by 12 bits before being stored. This essentially divides the actual number by 4096 (2) 12=4, 096). Therefore, the actual address isPPN.

Page Faults

The MMU signals errors in translation to the CPU through three different page faults: (1) instruction, (2) load, and (3) store. An instruction page fault occurs during the instruction fetch phase when the MMU throws a page fault. This can mean that the page entry is not valid or does not have the X (execute) bit set to 1 or some other reason.

Page faults are trapped by the CPU and themcauseorscauseregister will contain 12 for an instruction page fault, 13 for a load page fault, or 15 for a store page fault. The operating system can decide what to do then. In most cases, the correct response is to kill the offending process. However, there are some advanced systems, such as copy-on-write, that are implemented by trapping page faults. In these cases, a page fault is not a problem, it’s a feature.

Translating Virtual Addresses

Take, for example, the virtual address0x7d_beef_cafe, which is exactly 39 bits. If an address is longer than 39 – bits, the MMU will most likely cause a page fault.

In binary: 0b 0111 _ 1101 _  (_)  _ 1110 _ 1111 _ 1100 _ 1010 _ 1111 _ 1110      VPN [2] VPN [1] VPN [0] 12 - bit offset [1_1111_0110] [1_1111_0111] [0_1111_1100] [1010_1111_1110]     502 503 252

The address above will be translated through the following steps:

  1. Read SATP register and find the top of level 2’s page table ( (PPN).
  2. Add the offset * size, where offset isVPN [2]=0b1 _ 1111 _ 0110=502 * 8=SATP 4016
  3. Read this entry.
  4. If V (valid)=0, page fault
  5. If this entry’s R, W , or X bits are 1, this is a leaf, otherwise it is a branch.
  6. The PPN [2] | PPN [1] | PPN [0] address shows where in physical memory the next page table is located.
  7. Repeat at # 2 until a leaf is found.
  8. A leaf means that the PPN [2], PPN [1], and PPN [0] tells the MMU what the physical address actually is.

Every Level Can Be a Leaf

Like most memory management units, the MMU can translate a more coarse-grained page. For example, Intel / AMD x 86 _ 64 systems can stop at the PDP, which would give a 1GB resolution, whereas stopping at the PD would give a 2MB resolution. The same can be applied to the RISC-V system. However, RISC-V incorporates a clever way to determine a leaf from a branch.

On Intel / AMD machines, upper levels control the lower levels. If an upper level is read-only, then no branch after that may be writeable. RISC-V knows that this is dumb, so they made it so that if any combination of the R, W, X (read, write, execute) protection bits are set, then the page table entry describes a leaf.

If the page table entry (PTE) is a leaf, then PPN [2:0] describe the physical address. However, if the PTE is a branch, then PPN [2:0] describe where the next page table can be found in physical RAM. One thing to note is that only PPN [2:leaf-level] will be used to develop the physical address. For example, if level 2’s (the top level) page table entry is a leaf, then only PPN [2] contributes to the physical address. VPN [1] is copied to PPN [1], VPN [0] is copied to PPN [0], and the page offset is copied, as normal.

Programming the MMU

In Rust, I created three functions that will program the MMU:map,unmap, andvirt_to_phys. We need to be able to manually walk the page table because when a user-space application makes a system call, every pointer is a pointer to a virtual memory address – which is not the same as our kernel’s. Therefore, we have to drill down to a physical address so that the user-space application and the kernel are talking about the same memory.

I used a couple of structures to make programming a page table entry easier.

pub struct Table { pub entries: [Entry; 512], }  impl Table { pub fn len () ->usize { 512 } }

The first structure is aTable. This describes a single 4, 096 byte table. We get this size by having 512, 8-byte entries. A single entry is described as:

pub struct Entry { pub entry: i 64, } impl Entry { pub fn is_valid (& self) ->bool { self.get_entry () & EntryBits :: Valid.val ()!=0 }  // The first bit (bit index # 0) is the V bit for // valid. pub fn is_invalid (& self) ->bool { ! self.is_valid () }  // A leaf has one or more RWX bits set pub fn is_leaf (& self) ->bool { self.get_entry () & 0xe!=0 }  pub fn is_branch (& self) ->bool { ! self.is_leaf () }  pub fn set_entry (& mut self, entry: i 64) { self.entry=entry; }  pub fn get_entry (& self) ->i 64 { self.entry } }

Essentially, I aliased thei 64data type as anEntryso that I could add some helper functions to it.

Themapfunction takes a mutable root by-reference, a virtual address, a physical address, the protection bits, and which level this should be mapped to. Typically, we map all pages to level 0, which is the 4KiB level. However, we can specify 2 for a 1GiB page, 1 for a 2MiB page, or 0 for a 4KiB page.

pub fn map (root: & mut Table, vaddr: usize, paddr: usize, bits: i 64, level: usize) { // Make sure that Read, Write, or Execute have been provided // otherwise, we'll leak memory and always create a page fault. assert! (bits & 0xe!=0); // Extract out each VPN from the virtual address // On the virtual address, each VPN is exactly 9 bits, // which is why we use the mask 0x1ff=0b1 _ 1111 _ 1111 (9 bits) let vpn=[ // VPN [0]=vaddr [20:12] (vaddr>>12) & 0x1ff, // VPN [1]=vaddr [29:21] (vaddr>>21 ) & 0x1ff, // VPN [2]=vaddr [38:30] (vaddr>>30) & 0x1ff, ];  // Just like the virtual address, extract the physical address // numbers (PPN). However, PPN [2] is different in that it stores // 26 bits instead of 9 Therefore, we use, // 0x3ff_ffff=0b 11 _  (_)  _ 1111 _ 1111 _ 1111 _ 1111 (26 bits). let ppn=[ // PPN [0]=paddr [20:12] (paddr>>12) & 0x1ff, // PPN [1]=paddr [29:21] (paddr>>21 ) & 0x1ff, // PPN [2]=paddr [55:30] (paddr>>30) & 0x3ff_ffff, ];

The first thing we do here is break apart the virtual address and physical address into its components. Notice that we don’t care about the page offset – this is because the page offset is never stored. Instead, when the MMU translates a virtual address, it copies the page offset directly into the physical address to form a full 56 – bit physical address.

// We will use this as a floating reference so that we can set // individual entries as we walk the table. let mut v=& mut root.entries [vpn [2]]; // Now, we're going to traverse the page table and set the bits // properly. We expect the root to be valid, however we're required to // create anything beyond the root. // In Rust, we create a range iterator using the .. operator. // The .rev () will reverse the iteration since we need to start with // VPN [2] The .. operator is inclusive on start but exclusive on end. // So, (0..2) will iterate 0 and 1. for i in (level..2) .rev () { if! v.is_valid () { // Allocate a page let page=zalloc (1); // The page is already aligned by 4, 096, so store it // directly The page is stored in the entry shifted // right by 2 places. v.set_entry ( (page as i 64>>2) | EntryBits :: Valid.val (), ); } let entry=((v.get_entry () &! 0x3ff)

As you can see with the code above, we allocate a new page usingzalloc. Luckily, each page table in RISC-V is exactly 4, 096 bytes (512 entries, each 8 bytes), which is the exact size we allocate usingzalloc.

One of the challenges for my students is that if you take a look the PPN [2:0] are the same in the PTE and physical address, except they’re in different locations. For some reason the PPN [2:0] is shifted to the right by two places, and the physical address starts at bit 10. With the physical address, the physical address starts at bit 12. However, using the approach I took above, where I encode PPN by themselves, we should be safe.

Another interesting aspect about what I’ve done above is that I always encode ppn [2:0] no matter what level we’re at. This is fine for the page table entry, but it might lead to confusion. Remember, if we stop at level 1, then PPN [0] is NOT copied from the PTE. Instead, PPN [0] is copied from VPN [0]. This is performed by the hardware MMU. In the end, we’re just adding more information than might be necessary for a page table entry.

Freeing MMU Tables

We will be allocating at least one page per user-space application. However, I plan on only allowing 4KiB pages for user-space applications. Therefore, we need at least three page tables for a single address. This means that we can quickly run out of memory if we don’t reuse these pages. To free memory, we will useunmap.

pub fn unmap (root: & mut Table) { // Start with level 2 for lv2 in 0..Table :: len () { let ref entry_lv2=root.entries [lv2]; if entry_lv2.is_valid () && entry_lv2.is_branch () { // This is a valid entry, so drill down and free. let memaddr_lv1=(entry_lv2.get_entry () &! 0x3ff)

Just like with themapfunction, this function assumes a legitimate root table (level 2 table). We send it to theunmapfunction as a mutable reference. So, the logic behind this code is that we start at the lowest level (level 0) and free our way back to the first level (level 2). The code above is basically an iterative approach to a recursive function. Notice that I do not deallocate the root itself. Since we’re given a reference of a genericTablestructure, we could pass a level 1 table just to free portions of a larger table.

Walking The Page Tables

Finally, we need to manually walk the page table. This will be used to copy data from a virtual memory address. Since the virtual memory addresses are different between the kernel and user processes, we need to talk in physical addresses. The only way to do this is to always convert virtual memory addresses into their physical address equivalent. Unlike the hardware MMU, we can pass any table to this function regardless of whether that table is currently employed by the MMU.

pub fn virt_to_phys (root: & Table, vaddr: usize) ->Option{ // Walk the page table pointed to by root let vpn=[ // VPN [0]=vaddr [20:12] (vaddr>>12) & 0x1ff, // VPN [1]=vaddr [29:21] (vaddr>>21 ) & 0x1ff, // VPN [2]=vaddr [38:30] (vaddr>>30) & 0x1ff, ];  let mut v=& root.entries [vpn [2]]; for i in (0 ..=2) .rev () { if v.is_invalid () { // This is an invalid entry, page fault. break; } else if v.is_leaf () { // According to RISC-V, a leaf can be at any level.  // The offset mask masks off the PPN. Each PPN is 9 // bits and they start at bit # 12. So, our formula // 12   i * 9 let off_mask=(1

As you can see with the Rust function above, we are given a constant reference to a page table. We return anOption, which is an enumeration with eitherNone, which we will use to signal a page fault, orSome (), which we will use to return the physical address.

There are several reasons to page fault using the RISC-V memory management unit. One is to encounter an invalid page table entry (V bit is 0). Another is to have a branch at level 0. Sinc e level 0 is the lowest level, a page fault occurs if we signal there is an even lower level (I guess -1 ??).

The A and D Bits

There is one more reason for a page fault – which someone might have to explain to me as to why this was made as a design decision. If you read the RISC-V privileged spec, the D (dirty) and A (accessed) bits can have two different meanings.

The more traditional meaning is that the CPU will set the A bit to 1 any time that memory is read from or written to, and the CPU will set the D bit to 1 any time that memory is written to. However, the RISC-V specification also allows the hardware to essentially use these as redundant R and W bits – at least that’s my understanding.

The HiFive Unleashed board uses the latter and will throw a page fault if the D or A bits are 0 on a page table entry. So, let’s examine these individually:

  1. the MMU will throw a page fault if you attempt to write a page whose D bit is 0. Wait, isn’t this what the W bit is for?
  2. the MMU will throw a page fault if you attempt to read a page whose A bit is 0. Wait, isn’t this what the R bit is for?
RISC-V Privileged Spec. Chapter 4.3.1

Obviously, I wasn’t consulted when this was kicked around, and I haven’t done my due diligence to look up potential reasons. So, I risk embarrassment to preserve my time :(.

Mapping the Kernel

We are going to switch to Supervisor mode, which allows us to use the MMU for kernel memory. However, we first need to map everything that we need. In RISC-V, the machine mode uses physical memory addresses. However, if we switch the MMU on (set MODE field to 8), then we can use the MMU in supervisor or user mode.

To do this, we are going to identity map (virtual address=physical address) everything we need in the kernel, to include the program code, global sections, UART MMIO address, and so forth. First, I wrote a function in Rust that will help me identity map a range of addresses.

pub fn id_map_range (root: & mut page :: Table, start: usize, end: usize, bits: i 64) { let mut memaddr=start &! (page :: PAGE_SIZE - 1); let num_kb_pages=(page :: align_val (end, 12) - memaddr) / page :: PAGE_SIZE;  // I named this num_kb_pages for future expansion when // I decide to allow for GiB (2 ^ 30) and 2MiB (2 ^ 21) page // sizes. However, the overlapping memory regions are causing // nightmares. for _ in 0..num_kb_pages { page :: map (root, memaddr, memaddr, bits, 0); memaddr =1

As you can see, this function calls ourpage :: mapfunction, which maps a virtual address to a physical address. We are given the root table as a mutable reference, which we can pass directly to the map function.

Now, we need to use this function to map our program code and all MMIOs we will need for device drivers, later.

page :: init (); kmem :: init ();  // Map heap allocations let root_ptr=kmem :: get_page_table (); let root_u=root_ptr as usize; let mut root=unsafe {root_ptr.as_mut (). unwrap ()}; let kheap_head=kmem :: get_head () as usize; let total_pages=kmem :: get_num_allocations (); println! (); println! (); unsafe { println! ("TEXT: 0x {: x} ->0x {: x}", TEXT_START, TEXT_END); println! ("RODATA: 0x {: x} ->0x {: x}", RODATA_START, RODATA_END); println! ("DATA: 0x {: x} ->0x {: x}", DATA_START, DATA_END); println! ("BSS: 0x {: x} ->0x {: x}", BSS_START, BSS_END); println! ("STACK: 0x {: x} ->0x {: x}", KERNEL_STACK_START, KERNEL_STACK_END); println! ("HEAP: 0x {: x} ->0x {: x}", kheap_head, kheap_head   total_pages * 4096); } id_map_range ( & mut root, kheap_head, kheap_head   total_pages * 4096, page :: EntryBits :: ReadWrite.val (), ); unsafe { // Map heap descriptors let num_pages=HEAP_SIZE / page :: PAGE_SIZE; id_map_range (& mut root, HEAP_START, HEAP_START   num_pages, page :: EntryBits :: ReadWrite.val () ); // Map executable section id_map_range ( & mut root, TEXT_START, TEXT_END, page :: EntryBits :: ReadExecute.val (), ); // Map rodata section // We put the ROdata section into the text section, so they can // potentially overlap however, we only care that it's read // only. id_map_range ( & mut root, RODATA_START, RODATA_END, page :: EntryBits :: ReadExecute.val (), ); // Map data section id_map_range ( & mut root, DATA_START, DATA_END, page :: EntryBits :: ReadWrite.val (), ); // Map bss section id_map_range ( & mut root, BSS_START, BSS_END, page :: EntryBits :: ReadWrite.val (), ); // Map kernel stack id_map_range ( & mut root, KERNEL_STACK_START, KERNEL_STACK_END, page :: EntryBits :: ReadWrite.val (), ); }  // UART page :: map ( & mut root, 0x  (_) , 0x  (_) , page :: EntryBits :: ReadWrite.val (), 0 );

This a lot of code, which is why I wroteid_map_range. In this code, we first initialize the page allocation system we created back in chapter 3.1. Then, we allocate a root table, which takes exactly 1 page (4096 bytes). The point of type casting this memory address back and forth is because we will eventually need to write a physical address into the SATP (Supervisor Address Translation and Protection) register to get the MMU going.

Now that we have theroottable, we need to stick this into the PPN field of the SATP register. The PPN is essentially the upper 44 bits of a 56 – bit physical address of the table. What we do is shift the root table’s address to the right by 12 place, which essentially divides the address by a page size.

let root_ppn=root_u>>12; let satp_val=8

The code above uses theasm!macro, which allows us to write raw assembly using Rust. In here, we UseCSRW, which means “Control and Status Register – Write”. The8 puts the value 8 in the MODE field of the SATP register, which is the Sv 39 scheme.

Switching The MMU On

Just because we wrote 8 in the MMU mode, we don’t quite have the MMU turned on. The reason is because we’re in CPU mode # 3, which is the machine mode. So, we need to switch to “supervisor” mode, which is mode # 1. We do this by writing the binary number 01 in the MPP (Machine Previous Privilege) field in the (mstatus) ****************************** (register) bits 11 and 12).

li t0, (1

What we do above is enable interrupts and put the value 1 starting at bit 11 (MPP=01). When we executemret, we go to a Rust function calledkmain. We are now in supervisor mode with the MMU fully turned on.

We have not handled any page faults, yet, so be careful! If you go outside of your address space, your kernel will hang.

Chapter 3.1→ (Chapter 3.2) → Chapter 4

Brave Browser
Read More

What do you think?

Leave a Reply

Your email address will not be published. Required fields are marked *

GIPHY App Key not set. Please check settings

Jimmy Butler & the Heat Suddenly Look Like Dark-Horse Title Contenders, Crypto Coins News

Jimmy Butler & the Heat Suddenly Look Like Dark-Horse Title Contenders, Crypto Coins News

DirecTV kept charging regional sports fee while channel was blacked out, Ars Technica

DirecTV kept charging regional sports fee while channel was blacked out, Ars Technica