Over the past few months, I’ve been rewriting it in Rust.
This is an interesting test case for Rust, because we’re very much on C / C++’s home court here: the demo runs on the bare metal, without an operating system, and is very sensitive to both CPU timing and memory usage.
The results so far? The Rust implementation is simpler, shorter (in lines of code), faster, and smaller (in bytes of Flash) than my heavily-optimized C++ version – and because it’s almost entirely safe code, several types of bugs that I fought regularly, such as race conditions and dangling pointers, are now caught by the compiler.
It’s fantastic. Read on for my notes on the process.
The Rust tools and library ecosystem are fantastic. Simply having a package manager is an incredibly important advance.
When I am writing C++, I’m thinking about undefined behavior and bugs the compiler won’t catch. When I’m writing Rust, I’m thinking instead about how to optimize things or add features. There is a very real cognitive-load difference, and it makes me more productive.
Rust’s safety features, such as bounds checking, have caught bugs and have not caused performance problems. (With one exception, discussed below; in that case the fix was simple.)
The port revealed significant subtle bugs in the C++ code when the Rust compiler wouldn’t let me do certain things … that turned out to be legitimately incorrect things to do.

I wrote m4vgalib and its demos in C++.
Now, given my feelings about C++:
My graphics demos are so resource-constrained, and so timing-sensitive, that they fall squarely into the traditional domain of assembly and C – a domain that has been well-defended for years. Can I build the same thing using a memory-safe language? Could I use the additional brain-space that I’m *not* spending on remembering C++’s initialization order rules (for example) to make a better system with more features?
The answer so far seems to be yes.
This is a late addition to my notes, because I’ve gotten so used to it in languages like Rust, Haskell, Python … even JavaScript … that I had forgotten what a giant thing this is.
Programming languages fall into two categories: those that were designed before the advent of modern package managers, and those designed after. There’s a very important difference between these two categories:
Pre-package-manager languages try to have an everything-plus-the-kitchen-sink standard library, and developers tend to avoid third-party libraries.
Post-package-manager languages go for a more minimal standard library, and developers are accustomed to extending it with packages.
For bare-metal programming specifically, the truly killer feature of Rust is the ecosystem.
C++ has a monolithic standard library with an amazing set of cool stuff in it (because of when it was written, as I noted in the last section). However, the library embeds some important assumptions. In particular, it is written for a “normal” C++ execution environment, which for our purposes means two things: there is a heap, and it’s okay to allocate / free whenever; and exceptions are enabled, and it’s okay to unwind.
In most high-reliability, hard-real-time embedded environments, neither of these statements is true. We eschew heaps because of the potential for exhaustion and fragmentation; we eschew exceptions because the performance of unwinding code is unpredictable and vendor-dependent.

Now, there are *parts* of the C++ standard library that you can use safely in a no-heap, no-exceptions environment. Header-only libraries like `type_traits` are probably fine. Simple primitive types like `atomic` are … probably fine?
I keep saying “probably” because the no-heap, no-exception subset of the C++ standard library is not clearly defined. (The C++ standards folk have, in fact, resisted defining one, arguing that it would fragment the language; this ship has most definitely sailed.) As a result, it’s really easy to *accidentally* introduce a heap dependency, or to *accidentally* use an API that can’t indicate failure when exceptions are disabled (like `std::vector::push_back`).
The Rust standard library has a critical difference: it’s divided into two parts, `std` and `core`. `std` is like the C++ equivalent. `core` is a subset that makes neither of those environmental assumptions: it requires no heap and no unwinding. By marking your crate `no_std`, you opt out of `std` and rely on `core` alone.
You can trust *other* crates to do the same, so you can use third-party libraries safely if they, too, are `no_std`. Many crates are either `no_std` by default, or can have it enabled at build time. `core` is small enough that porting it to a new platform is easy – significantly easier, in fact, than porting `newlib`, the standard-bearer for portable embedded C libraries.

For m4vgalib, I rewrote almost all of my dependencies to get a C++ system that wouldn’t throw or allocate. In Rust, I don’t have to do that!

Rust’s ownership rules produce a sort of bizarro-world of API design.
Some (uncommon, but reasonable) API designs won’t make it past the borrow checker. In nearly every case, these are APIs that were *easy to use incorrectly* in other languages.
Some API patterns that are grossly unsafe or unwise in other languages are routine in Rust because of lifetime checking.
As an example of the latter: it is common, and safe, to loan out stack-allocated data structures to other threads with no runtime checks. (See: scoped threads in `crossbeam`.)
Another: it is normal in Rust text-processing code to deal in `&str`, which is equivalent to a C++ `string_view`. Storing a `string_view` in C++ (say, in the heap) is an incredibly bad idea, because it’s easy for it to become a dangling pointer; C++ programs resort to defensive copying to avoid this. On the other hand, Rust programs routinely store `&str`, copying only when the borrow checker can’t prove that the code is correct.

When this is working well, it can cause abstractions and complexity to dissolve.
Concrete example: m4vgalib (C++) lets applications provide custom rasterizers.
You, the application author, have some responsibilities to use this API safely:
The `Rasterizer` object needs to hang around until you’re done with it – it might be static, or it might be allocated from a carefully-managed arena. Otherwise, the ISR will try to use dangling pointers, and that’s bad.
While the `Rasterizer` object is accessible by the ISR, it can be entered at basically any time by code running at interrupt priority. Because we can’t disable interrupts without distorting the display, this means that your application code that shares state with the `Rasterizer` (say, a drawing loop) needs to be written carefully to avoid data races. Commonly, this means double-buffering with a `std::atomic` flip signal … and some manually-inserted barriers … and some squinting and care to avoid accessing other state incorrectly.
Before disposing of the `Rasterizer` object, you must un-register it with the driver. This prevents an ISR from dereferencing its dangling pointer, which, again, would be bad.
I recreated the C++ API verbatim in Rust, and immediately started to run into ownership issues. My internal monologue went something like this:
“Okay, here’s a `Raster` trait and an implementation thereof.”
“Hm. How can I pass a reference to this to an interrupt handler? In C++ I stuffed a pointer into a global variable, but Rust’s rules around `static` state seem to prevent that.”

“Okay, I’ve built an abstraction (`IRef`) to enable that to be done safely; only it turns out I didn’t actually want to give *all* of the `Rasterizer` to the ISR, because I want to draw into its background buffer and make other state changes. It needs to be shared with the main rendering loop!”

“If I split the `Rasterizer` into two parts and give *one* to the ISR, how do I communicate between them when it comes time to flip buffers? Do I need to pepper my code with `Cell` to do interior mutability?”

“This feels a lot like the problem that scoped threads solve …”
```rust
let red_line = SpinLock::new(Wrapping(0));
vga.with_raster(
    // The raster callback is invoked on every horizontal retrace to
    // provide new pixels. It runs in interrupt context.
    |line, tgt, ctx| {
        // ...
    },
    // ...
);
```
This makes the problem of sharing state trivial: have the state in scope when you declare these closures, and share it using normal Rust techniques.
In addition to being easier to use, this API is also much harder to *misuse*: it’s essentially impossible to accidentally introduce a data race. This is because the raster callback is required to be `Send`, meaning it can safely be transferred across threads (or, here, to an interrupt handler, which is like a second thread). If the closure captured some state that isn’t thread-safe, like a `&mut` borrow of a local variable or a `Cell`, it would be a compile error. (`SpinLock` in the code above is thread-safe.)

As of C++20, C++ has closures with captures. You could almost implement this same API in m4vgalib. But I wouldn’t, because …

**It wouldn’t be robust.** Capturing stack structures by reference creates a real risk that you’ll accidentally leak the reference into a larger scope, e.g. by storing it in a global or a member field of a long-lived object. Plus, C++’s type system does not have any notion of thread-safety, so nothing would stop you from sharing a non-threadsafe structure with the ISR. It’s all footguns.
**It might require allocations.** In Rust, the ISR invokes the closure generically through the `FnMut` trait that all closures implement. In C++, there is no direct equivalent; closures do not have vtables, but must be wrapped in a heap-allocated `std::function` to be used dynamically. (In Rust, closures also do not have vtables, because we don’t do virtual dispatch the same way. That’s a longer story.)
Rust has a reputation for producing larger binaries than C++. This reputation appears to be undeserved. If you run a release build of one of the demos and run `size`, you will find binaries that are larger than their C++ equivalents. For example, here’s a comparison of `horiz_tp` written in each language:
But this means each binary contains all the panic strings, plus all the message formatting code. If you would like to produce smaller binaries, and are willing to sacrifice panic messages, you need to build with a different feature set:
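I haven’t reproduced the exact build incantation here; as a sketch, the usual knobs for shrinking a Rust release build live in `Cargo.toml` (these are standard Cargo options, not necessarily the exact settings this project uses):

```toml
[profile.release]
panic = "abort"      # drop the unwinding machinery
opt-level = "z"      # optimize for size
lto = true           # whole-program inlining and dead-code removal
codegen-units = 1    # better optimization at the cost of build time
```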
Size has not been an issue for this project.

I’m currently using `unsafe` in a number of places. None of them are for Rust-specific performance reasons. (I say “Rust-specific” because some of them are calling into assembly routines, which definitely exist for performance reasons, but are identical in C++.)
After that, the leading causes are situations that are *inherently unsafe*. In these cases the right solution is to wrap the code in a neat, safe API (and I have):
5 cases: getting exclusive references to shared mutable global data, which is super racy unless you’re careful.
This leaves two `unsafe` uses that can likely be fixed:
- Taking a very lazy shortcut with `core::mem::transmute` that can probably be improved.
- Deliberately aliasing a `[u32]` as a `[u8]`, which is memory-safe but endian-sensitive.

If you were reviewing m4vgalib, by contrast, you would be reading unsafe code the whole way through: every C++ statement that I wrote.
I can't bring up memory safety without someone taking a potshot at Rust's bounds checking for arrays. Since m4vga demands pretty high performance, I've been auditing the machine code produced by `rustc`. In the performance-critical parts of the code, bounds checks were either *already eliminated at compile time*, or could be eliminated by a simple refactoring of the code. The demos spend effectively no time evaluating bounds checks.
There are two relevant patterns in the current code.
First: in Rust, we can pass a fixed-length array by reference *without* it degrading into a pointer as it does in C. For instance:

```rust
fn get_element_3(array: &[u32; 4]) -> u32 {
    // This bounds check is trivially proven and will not be
    // performed at runtime.
    array[3]
}

// This attempt to pass a 2-element array is a compile
// error.
get_element_3(&[0; 2]);
```
Neither of those statements holds in C. As a result, we use fixed-length arrays in several places in the demo where the C++ could not.
Second: when an array's length isn't obvious to the poor compiler, we can hoist bounds checks to a convenient place. For instance, this routine as written performs runtime bounds checks at each loop iteration:

```rust
fn fill(array: &mut [u8], color: u8) {
    for i in 0..1024 {
        // Each access is bounds-checked at runtime, because the
        // compiler can't see that `array` holds at least 1024 bytes.
        array[i] = color;
    }
}
```
We can check the length outside the loop, and make the length visible to the compiler, like this:
```rust
fn fill(array: &mut [u8], color: u8) {
    // Perform an explicit, checked slice of the array before
    // entering the loop.
    let array = &mut array[..1024];
    for i in 0..1024 {
        // The compiler can now prove `i` is always in bounds, so
        // no per-iteration check is emitted.
        array[i] = color;
    }
}
```
Most of the actual *thinking* that I had to do during the port – as opposed to mechanically translating C++ code into Rust – had to do with ownership and races.
(This won't surprise anyone who remembers learning Rust.)

m4vga is a prioritized preemptive multi-tasking system: it runs application code at the processor's Thread priority, and interrupts it with a collection of three interrupt service routines that generate video.
The C++ code uses a data race mitigation strategy that I call *convince yourself it works once and then hope it never breaks*. (I can use a snarky name like that because I'm talking about work *I* did.) In a couple of places I used `std::atomic` (or my own intrinsics, before `atomic` stabilized – yes, this code is old), and in others I relied on the assumption that I was running on a Cortex-M3 / M4 and crossed my fingers. I could certainly use the same strategy in Rust by employing `unsafe` code. But that's boring.
Instead, I figured out which pieces of data were shared between which tasks, grouped them, and wrapped them with custom bare-metal mutex types. Whenever a thread or ISR wants to access data, it locks it, performs the access, and unlocks it. This costs a few cycles more than the C "hold my beer" approach, but that hasn't been an issue even in the latency-sensitive parts of the code.
Because of Rust's ownership and thread-safety rules, you can *only* share data between threads and ISRs if it's packaged in one of these thread-safe containers. In Rust terms, the containers convert a type that is `Send`, or safe to move between threads but not safe to use *concurrently*, into a type that is `Sync`, or safe for concurrent use. If you add some new data and attempt to share it without protecting it, your code will simply not compile. This means I don't have to think about data races *except* when I'm hacking the internals of a locking primitive, so I can think about other things instead.
On lock contention, we `panic!`. This is a hard-real-time system; if data isn't available on the cycle we need it, the display is going to distort and there's no point in continuing. Late data is wrong data, after all. Using Rust's `panic!` facility has the pleasant side effect of printing a human-readable error message on my debugger (thanks to the `panic_itm` crate).
So far, two interesting side effects have come up:
Having to think about task interactions has led to a much better factoring of the driver code, which was initially laid out like the C code.
I found an actual bug that also exists in the C++ code. There was a subtle data race between rasterization and the start-of-active-video ISR. I caught it and fixed it in the Rust. I haven't yet updated the C++ (because, meh … it would just regress).