Over the past few months, I’ve been rewriting it in Rust.
This is an interesting test case for Rust, because we’re very much on C / C++’s home court here: the demo runs on the bare metal, without an operating system, and is very sensitive to both CPU timing and memory usage.
The results so far? The Rust implementation is simpler, shorter (in lines of code), faster, and smaller (in bytes of Flash) than my heavily-optimized C++ version – and because it’s almost entirely safe code, several types of bugs that I fought regularly, such as race conditions and dangling pointers, are now caught by the compiler.
It’s fantastic. Read on for my notes on the process.
The Rust tools and library ecosystem are fantastic. Simply having a package manager is an incredibly important advance.
When I am writing C++, I’m thinking about undefined behavior and bugs the compiler won’t catch. When I’m writing Rust, I’m thinking instead about how to optimize things or add features. There is a very real cognitive-load difference, and it makes me more productive.
Rust’s safety features, such as bounds checking, have caught bugs and have not caused performance problems. (With one exception, discussed below; in that case the fix was simple.)
The port revealed significant subtle bugs in the C++ code when the Rust compiler wouldn’t let me do certain things … that turned out to be legitimately incorrect things to do.

I wrote m4vgalib and its demos in C++.
Now, given my feelings about C++:
My graphics demos are so resource-constrained, and so timing-sensitive, that they fall squarely into the traditional domain of assembly and C – a domain that has been well-defended for years. Can I build the same thing using a memory-safe language? Could I use the additional brain-space that I’m *not* spending on remembering C++’s initialization order rules (for example) to make a better system with more features?
The answer so far seems to be yes.
This is a late addition to my notes, because I’ve gotten so used to it in languages like Rust, Haskell, Python … even JavaScript … that I had forgotten what a giant thing this is.
Programming languages fall into two categories: those that were designed before the advent of modern package managers, and those designed after. There’s a very important difference between these two categories:
Pre-package-manager languages try to have an everything-plus-the-kitchen-sink standard library, and developers tend to avoid third-party libraries.
Post-package-manager languages go for a more minimal standard library, and developers are accustomed to extending it with packages.
For bare-metal programming specifically, the truly killer feature of Rust is the ecosystem.
C++ has a monolithic standard library with an amazing set of cool stuff in it (because of when it was written, as I noted in the last section). However, the library embeds some important assumptions. In particular, it is written for a “normal” C++ execution environment, which for our purposes means two things: there is a heap, and it’s okay to allocate / free whenever; and exceptions are enabled, and it’s okay to unwind.
In most high-reliability, hard-real-time embedded environments, neither of these statements is true. We eschew heaps because of the potential for exhaustion and fragmentation; we eschew exceptions because the performance of unwinding code is unpredictable and vendor-dependent.

Now, there are *parts* of the C++ standard library that you can use safely in a no-heap, no-exceptions environment. Header-only libraries like `type_traits` are probably fine. Simple primitive types like `atomic` are … probably fine?
I keep saying “probably” because the no-heap, no-exception subset of the C++ standard library is not clearly defined. (The C++ standards folk have, in fact, resisted defining one, arguing that it would fragment the language; this ship has most definitely sailed.) As a result, it’s really easy to *accidentally* introduce a heap dependency, or to *accidentally* use an API that can’t indicate failure when exceptions are disabled (like `std::vector::push_back`).
The Rust standard library has a critical difference: it’s divided into two parts, `std` and `core`. `std` is like the C++ equivalent. `core` is a subset that makes neither of those environmental assumptions: it requires no heap and no unwinding. By marking your crate `no_std`, you opt out of `std` and rely on `core` alone.
You can trust *other* crates to do the same, so you can use third-party libraries safely if they, too, are `no_std`. Many crates are either `no_std` by default, or can have it enabled at build time. `core` is small enough that porting it to a new platform is easy – significantly easier, in fact, than porting `newlib`, the standard-bearer for portable embedded C libraries.

For m4vgalib, I rewrote almost all of my dependencies to get a C++ system that wouldn’t throw or allocate. In Rust, I don’t have to do that!

Rust’s ownership rules produce a sort of bizarro-world of API design.
Some (uncommon, but reasonable) API designs won’t make it past the borrow checker. In nearly every case, these are APIs that were *easy to use incorrectly* in other languages.
Some API patterns that are grossly unsafe or unwise in other languages are routine in Rust because of lifetime checking.
As an example of the latter: it is common, and safe, to loan out stack-allocated data structures to other threads with no runtime checks. (See: scoped threads in `crossbeam`.)
Another: it is normal in Rust text-processing code to deal in `&str`, which is equivalent to a C++ `string_view`. Storing a `string_view` in C++ (say, in the heap) is an incredibly bad idea, because it’s easy for it to become a dangling pointer; C++ programs resort to defensive copying to avoid this. On the other hand, Rust programs routinely store `&str`, copying only when the borrow checker can’t prove that the code is correct.

When this is working well, it can cause abstractions and complexity to dissolve.
Concrete example: m4vgalib (C++) lets applications provide custom rasterizers.
You, the application author, have some responsibilities to use this API safely:
The `Rasterizer` object needs to hang around until you’re done with it – it might be static, or it might be allocated from a carefully-managed arena. Otherwise, the ISR will try to use dangling pointers, and that’s bad.
While the `Rasterizer` object is accessible by the ISR, it can be entered at basically any time by code running at interrupt priority. Because we can’t disable interrupts without distorting the display, this means that your application code that shares state with the `Rasterizer` (say, a drawing loop) needs to be written carefully to avoid data races. Commonly, this means double-buffering with a `std::atomic` flip signal … and some manually-inserted barriers … and some squinting and care to avoid accessing other state incorrectly.
Before disposing of the `Rasterizer` object, you must un-register it with the driver. This prevents an ISR from dereferencing its dangling pointer, which, again, would be bad.
I recreated the C++ API verbatim in Rust, and immediately started to run into ownership issues. My internal monologue went something like this:
“Okay, here’s a `Raster` trait and an implementation thereof.”
“Hm. How can I pass a reference to this to an interrupt handler? In C++ I stuffed a pointer into a global variable, but Rust’s rules around `static` state seem to prevent that.”

“Okay, I’ve built an abstraction (`IRef`) to enable that to be done safely; only it turns out I didn’t actually want to give *all* of the `Rasterizer` to the ISR, because I want to draw into its background buffer and make other state changes. It needs to be shared with the main rendering loop!”

“If I split the `Rasterizer` into two parts and give *one* to the ISR, how do I communicate between them when it comes time to flip buffers? Do I need to pepper my code with `Cell` to do interior mutability?”

“This feels a lot like the problem that scoped threads solve …”
```rust
let red_line = SpinLock::new(Wrapping(0));
vga.with_raster(
    // The raster callback is invoked on every horizontal retrace to
    // provide new pixels. It runs in interrupt context.
    |line, tgt, ctx| {
        // ...
    },
    // ...
);
```
This makes the problem of sharing state trivial: have the state in scope when you declare these closures, and share it using normal Rust techniques.
In addition to being easier to use, this API is also much harder to *misuse*: it’s essentially impossible to accidentally introduce a data race. This is because the raster callback is required to be `Send`, meaning it can safely be transferred across threads (or, here, to an interrupt handler, which is like a second thread). If the closure captured some state that isn’t thread-safe, like a `&mut` borrow of a local variable or a `Cell`, it would be a compile error. (`SpinLock` in the code above is thread-safe.)

As of C++20, C++ has closures with captures. You could almost implement this same API in m4vgalib. But I wouldn’t, because …

**It wouldn’t be robust.** Capturing stack structures by reference creates a real risk that you’ll accidentally leak the reference into a larger scope, e.g. by storing it in a global or a member field of a long-lived object. Plus, C++’s type system does not have any notion of thread-safety, so nothing would stop you from sharing a non-threadsafe structure with the ISR. It’s all footguns.
**It might require allocations.** In Rust, the ISR invokes the closure generically through the `FnMut` trait that all closures implement. In C++, there is no direct equivalent; closures do not have vtables, but must be wrapped in a heap-allocated `std::function` to be used dynamically. (In Rust, closures also do not have vtables, because we don’t do virtual dispatch the same way. That’s a longer story.)
Rust has a reputation for producing larger binaries than C++. This reputation appears to be undeserved. If you run a release build of one of the demos and run `size`, you will find binaries that are larger than their C++ equivalents. For example, here’s a comparison of `horiz_tp` written in each language:
But this means each binary contains all the panic strings, plus all the message formatting code. If you would like to produce smaller binaries, and are willing to sacrifice panic messages, you need to build with a different feature set:
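I haven’t reproduced the exact build incantation here; as a sketch, the usual knobs for shrinking a Rust release build live in `Cargo.toml` (these are standard Cargo options, not necessarily the exact settings this project uses):

```toml
[profile.release]
panic = "abort"      # drop the unwinding machinery
opt-level = "z"      # optimize for size
lto = true           # whole-program inlining and dead-code removal
codegen-units = 1    # better optimization at the cost of build time
```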
Size has not been an issue for this project.

I’m currently using `unsafe` in a number of places. None of them are for Rust-specific performance reasons. (I say “Rust-specific” because some of them are calling into assembly routines, which definitely exist for performance reasons, but are identical in C++.)
After that, the leading causes are situations that are *inherently unsafe*. In these cases the right solution is to wrap the code in a neat, safe API (and I have):
5 cases: getting exclusive references to shared mutable global data, which is super racy unless you’re careful.
This leaves two `unsafe` uses that can likely be fixed:
- Taking a very lazy shortcut with `core::mem::transmute` that can probably be improved.
- Deliberately aliasing a `[u32]` as a `[u8]`, which is memory-safe but endian-sensitive.

If you were reviewing m4vgalib, by contrast, you would be reading unsafe code the whole way through: every C++ statement that I wrote.
I can't bring up memory safety without someone taking a potshot at Rust's bounds checking for arrays. Since m4vga demands pretty high performance, I've been auditing the machine code produced by `rustc`. In the performance-critical parts of the code, bounds checks were either *already eliminated at compile time*, or could be eliminated by a simple refactoring of the code. The demos spend effectively no time evaluating bounds checks.
There are two relevant patterns in the current code.
First: in Rust, we can pass a fixed-length array by reference *without* it degrading into a pointer as it does in C. For instance:

```rust
fn get_element_3(array: &[u32; 4]) -> u32 {
    // This bounds check is trivially proven and will not be
    // performed at runtime.
    array[3]
}

// This attempt to pass a 2-element array is a compile
// error.
get_element_3(&[0; 2]);
```
Neither of those statements holds in C. As a result, we use fixed-length arrays in several places in the demo where the C++ could not.
Second: when an array's length isn't obvious to the poor compiler, we can hoist bounds checks to a convenient place. For instance, this routine as written performs runtime bounds checks at each loop iteration:

```rust
fn fill(array: &mut [u8], color: u8) {
    for i in 0..1024 {
        // Each access is bounds-checked at runtime, because the
        // compiler can't see that `array` holds at least 1024 bytes.
        array[i] = color;
    }
}
```
We can check the length outside the loop, and make the length visible to the compiler, like this:
```rust
fn fill(array: &mut [u8], color: u8) {
    // Perform an explicit, checked slice of the array before
    // entering the loop.
    let array = &mut array[..1024];
    for i in 0..1024 {
        // The compiler can now prove `i` is always in bounds, so
        // no per-iteration check is emitted.
        array[i] = color;
    }
}
```
Most of the actual *thinking* that I had to do during the port – as opposed to mechanically translating C++ code into Rust – had to do with ownership and races.
(This won't surprise anyone who remembers learning Rust.)

m4vga is a prioritized preemptive multi-tasking system: it runs application code at the processor's Thread priority, and interrupts it with a collection of three interrupt service routines that generate video.
The C++ code uses a data race mitigation strategy that I call *convince yourself it works once and then hope it never breaks*. (I can use a snarky name like that because I'm talking about work *I* did.) In a couple of places I used `std::atomic` (or my own intrinsics, before `atomic` stabilized – yes, this code is old), and in others I relied on the assumption that I was running on a Cortex-M3 / M4 and crossed my fingers. I could certainly use the same strategy in Rust by employing `unsafe` code. But that's boring.
Instead, I figured out which pieces of data were shared between which tasks, grouped them, and wrapped them with custom bare-metal mutex types. Whenever a thread or ISR wants to access data, it locks it, performs the access, and unlocks it. This costs a few cycles more than the C "hold my beer" approach, but that hasn't been an issue even in the latency-sensitive parts of the code.
Because of Rust's ownership and thread-safety rules, you can *only* share data between threads and ISRs if it's packaged in one of these thread-safe containers. In Rust terms, the containers convert a type that is `Send`, or safe to move between threads but not safe to use *concurrently*, into a type that is `Sync`, or safe for concurrent use. If you add some new data and attempt to share it without protecting it, your code will simply not compile. This means I don't have to think about data races *except* when I'm hacking the internals of a locking primitive, so I can think about other things instead.
On lock contention, we `panic!`. This is a hard-real-time system; if data isn't available on the cycle we need it, the display is going to distort and there's no point in continuing. Late data is wrong data, after all. Using Rust's `panic!` facility has the pleasant side effect of printing a human-readable error message on my debugger (thanks to the `panic_itm` crate).
So far, two interesting side effects have come up:
Having to think about task interactions has led to a much better factoring of the driver code, which was initially laid out like the C code.
I found an actual bug that also exists in the C++ code. There was a subtle data race between rasterization and the start-of-active-video ISR. I caught it and fixed it in the Rust. I haven't yet updated the C++ (because, meh … it would just regress).