
The Rust Compilation Model Calamity


The Rust programming language was designed for slow compilation times.

I mean, that wasn’t the goal. As is often cautioned in debates among its designers, programming language design is full of tradeoffs. One of those fundamental tradeoffs is run-time performance vs. compile-time performance, and the Rust team nearly always (if not always) chose run-time over compile-time.

So Rust compile times are bad. This is kinda infuriating, as almost everything that matters about Rust is pretty damn good. But Rust compile times are so, so bad.

Rust Compile-time Adventures with TiKV: Episode 1

At PingCAP, we develop our distributed storage system, TiKV, in Rust, and it compiles slowly enough to discourage many in the company from using Rust. I recently spent some time, along with several others on the TiKV team and its wider community, investigating TiKV’s compile times.

Over a series of posts, I’ll discuss what we have learned:

  • Why compiling Rust is slow, and/or feels slow;
  • How Rust’s development led to slow compile times;
  • Compile-time use cases;
  • Things we measured; things we want to measure but haven’t or don’t know how;
  • Ideas that improved compile times;
  • Ideas that did not improve compile times;
  • How TiKV compile times have changed over time;
  • Suggestions for how to organize Rust projects that compile fast;
  • Recent and future upstream improvements to compile times.

In this episode:

  • The specter of poor Rust compile times at PingCAP
  • Preview: the TiKV compile-time adventure so far
  • Rust’s designs for poor compilation time
  • Bootstrapping Rust
  • (Un)virtuous cycles
  • Early decisions that favored run-time over compile-time
  • Recent work on Rust compile times
  • In the next episode
  • Thanks
    The specter of poor Rust compile times at PingCAP

    At PingCAP , my colleagues use Rust to write TiKV , the storage node of TiDB , our distributed database. They do this because they want this most important node in the system to be fast and reliable by construction, at least to the greatest extent reasonable.

    It was mostly a great decision, and most people internally are mostly happy about it.

    But many complain about how long it takes to build. For some, a full rebuild might take many minutes in development mode, and longer still in release mode. To developers of large systems projects, this might not sound so bad, but it’s much slower than what many developers expect out of modern programming environments. TiKV is a relatively large Rust codebase, with 2 million lines of Rust. In comparison, Rust itself contains over 3 million lines of Rust, and Servo contains 2.7 million (see full line counts here).

    Other nodes in TiDB are written in Go, which of course comes with a different set of advantages and disadvantages from Rust. Some of the Go developers at PingCAP resent having to wait for the Rust components to build. They are used to a rapid build-test cycle.

    Rust developers, on the other hand, are used to taking a lot of coffee breaks (or tea, cigarettes, sobbing, or whatever as the case may be – Rust developers have the spare time to nurse their demons).

    Preview: The TiKV Compile-time adventure so far

    The first entry in this series is just a story about the history of Rust with respect to compilation time. Since it might take several more entries before we dive into concrete technical details of what we’ve done with TiKV’s compile times, here’s a pretty graph to capture your imagination, without comment.

    Rust Compile Times for TiKV

    Rust’s designs for poor compilation time

    Rust was designed for slow compilation times.


    The intentional run-time / compile-time tradeoff isn’t the only reason Rust compile times are horrific, but it’s a big one. There are also language designs that are not crucial for run-time performance, but accidentally bad for compile-time performance. The Rust compiler was also implemented in ways that inhibit compile-time performance.

    So there are intrinsic language-design reasons and accidental language-design reasons for Rust’s bad compile times. Those mostly can’t be fixed ever, although they may be mitigated by compiler improvements, design patterns, and language evolution. There are also accidental compiler-architecture reasons for Rust’s bad compile times, which can generally be fixed through enormous engineering effort and time.

    If fast compilation time was not a core Rust design principle, what were Rust’s core design principles? Here are a few:

  • Practicality – it should be a language that can be and is used in the real world.

    The co-development of Rust and Servo created a virtuous cycle that allowed both projects to thrive. Today, Servo components are deeply integrated into Firefox, ensuring that Rust cannot die while Firefox lives.

    Mission accomplished.

    The previously-mentioned early self-hosting was similarly crucial to Rust’s design, making Rust a superior language for building Rust compilers. Mission accomplished.

    Likewise, Rust and WebAssembly were developed in close collaboration (the author of Emscripten, the author of Cranelift, and I had desks next to each other for years), making WASM an excellent platform for running Rust, and Rust well-suited to target WASM. Mission accomplished.

    Sadly there was no such reinforcement to drive down Rust compile times. The opposite is probably true – the more Rust became known as a fast language, the more important it was to be the fastest language. And, the more Rust’s developers got used to developing their Rust projects across multiple branches, context switching between builds, the less pressure was felt to address compile times.

    This only really changed once Rust 1.0 was released in 2015 and started to receive wider use.

    For years Rust slowly boiled in its own poor compile times, not realizing how bad it had gotten until it was too late. It was 1.0. Those decisions were locked in.

    Too many tired metaphors in this section. Sorry about that.

    Early decisions that favored run-time over compile-time

    If Rust is designed for poor compile time, then what are those designs specifically? I describe a few briefly here. The next episode in this series will go into further depth. Some have greater compile-time impact than others, but I assert that all of them cause more time to be spent in compilation than alternative designs.

    Looking at some of these in retrospect, I am tempted to think that “well, of course Rust must have feature foo”, and it’s true that Rust would be a completely different language without many of these features. However, language designs are tradeoffs and none of these were predestined to be part of Rust.

  • Borrowing – Rust’s defining feature. Its sophisticated pointer analysis spends compile-time to make run-time safe.

  • Monomorphization – Rust translates each generic instantiation into its own machine code, creating code bloat and increasing compile time.

  • Stack unwinding – stack unwinding after unrecoverable exceptions traverses the callstack backwards and runs cleanup code. It requires lots of compile-time book-keeping and code generation.

  • Build scripts – build scripts allow arbitrary code to be run at compile-time, and pull in their own dependencies that need to be compiled. Their unknown side-effects and unknown inputs and outputs limit assumptions tools can make about them, which e.g. limits caching opportunities.

  • Macros – macros require multiple passes to expand, expand to often surprising amounts of hidden code, and impose limitations on partial parsing. Procedural macros have negative impacts similar to build scripts.

  • LLVM backend – LLVM produces good machine code, but runs relatively slowly.

  • Relying too much on the LLVM optimizer – Rust is well-known for generating a large quantity of LLVM IR and letting LLVM optimize it away. This is exacerbated by duplication from monomorphization.

  • Split compiler / package manager – although it is normal for languages to have a package manager separate from the compiler, in Rust at least this results in both cargo and rustc having imperfect and redundant information about the overall compilation pipeline. As more parts of the pipeline are short-circuited for efficiency, more metadata needs to be transferred between instances of the compiler, mostly through the filesystem, which has overhead.

  • Per-compilation-unit code-generation – rustc generates machine code each time it compiles a crate, but it does not need to – with most Rust projects being statically linked, the machine code isn’t needed until the final link step. There may be efficiencies to be achieved by completely separating analysis and code generation.

  • Single-threaded compiler – ideally, all CPUs are occupied for the entire compilation. This is not close to true with Rust today. And with the original compiler being single-threaded, the language is not as friendly to parallel compilation as it might be. There are efforts going into parallelizing the compiler, but it may never use all your cores.

  • Trait coherence – Rust’s traits have a property called “coherence”, which makes it impossible to define implementations that conflict with each other. Trait coherence imposes restrictions on where code is allowed to live. As such, it is difficult to decompose Rust abstractions into small, easily-parallelizable compilation units.

  • Tests next to code – Rust encourages tests to reside in the same codebase as the code they are testing. With Rust’s compilation model, this requires compiling and linking that code twice, which is expensive, particularly for large crates.
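To make the monomorphization point above concrete, here is a minimal sketch (the function `largest` is a hypothetical example, not TiKV code). Each concrete type a generic function is used with gets its own machine-code copy, which LLVM then optimizes independently:

```rust
// A generic function: rustc emits a separate machine-code copy
// for every concrete type it is instantiated with.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> T {
    let mut max = items[0];
    for &item in &items[1..] {
        if item > max {
            max = item;
        }
    }
    max
}

fn main() {
    // Two instantiations: largest::<i32> and largest::<f64> are
    // compiled and optimized independently, doubling the work for
    // this one function.
    println!("{}", largest(&[1, 5, 3]));       // i32 copy
    println!("{}", largest(&[1.5, 0.2, 9.9])); // f64 copy
}
```

The alternative design, used by languages with boxed generics, compiles one shared copy and pays for it at run time instead.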
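The macros point can also be illustrated with a small sketch (the `make_getters!` macro and `Config` type are hypothetical examples). One short invocation expands into a surprising amount of hidden code that the compiler must still fully process:

```rust
// A declarative macro: each invocation is expanded, in a separate
// compiler pass, into code that never appears in the source.
macro_rules! make_getters {
    ($($field:ident: $ty:ty),*) => {
        struct Config { $($field: $ty),* }
        impl Config {
            $(fn $field(&self) -> &$ty { &self.$field })*
        }
    };
}

// This one line expands to a struct definition plus two methods,
// all of which must be parsed, type-checked, and compiled.
make_getters!(host: String, port: u16);

fn main() {
    let c = Config { host: "localhost".to_string(), port: 8080 };
    println!("{}:{}", c.host(), c.port());
}
```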
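And a sketch of the tests-next-to-code convention (the `checksum` function is a hypothetical example): the `#[cfg(test)]` module lives in the same crate as the code it tests, so `cargo test` compiles and links the crate a second time with the tests included:

```rust
// Library code and its unit tests in one crate. A plain build
// compiles this without the tests; `cargo test` compiles the
// whole crate again with them, doubling the work for large crates.
pub fn checksum(data: &[u8]) -> u8 {
    data.iter().fold(0u8, |acc, b| acc.wrapping_add(*b))
}

fn main() {
    println!("{}", checksum(b"hello"));
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn sums_bytes() {
        assert_eq!(checksum(&[1, 2, 3]), 6);
    }

    #[test]
    fn wraps_on_overflow() {
        // u8 addition wraps: 255 + 2 == 1 (mod 256).
        assert_eq!(checksum(&[255, 2]), 1);
    }
}
```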

    Recent work on Rust compile times

    The situation isn’t hopeless. Not at all. There is always work going on to improve Rust compile times, and there are still many avenues to be explored. I’m hopeful that we’ll continue to see improvements. Here is a selection of the activities I’m aware of from the last year or two. Thanks to everybody who helps with this problem.

  • The Rust compile-time master issue

  • Tracks various work to improve compile times
  • Contains a great overview of factors that affect Rust compilation performance and potential mitigation strategies
  • Pipelined compilation (1, 2, 3)

    Typechecks downstream crates in parallel with upstream codegen. Now on by default on the stable channel.

    I apologize to any person or project I did not credit.

    In the next episode

    So Rust dug itself deep into a corner over the years and will probably be digging itself back out until the end of time (or the end of Rust – same thing, really). Can Rust compile-time be saved from Rust’s own run-time success? Will TiKV ever build fast enough to satisfy my managers?

    In the next episode, we’ll deep-dive into the specifics of Rust’s language design that cause it to compile slowly.

    Stay Rusty, friends.

    Thanks

    A number of people helped with this blog series. Thanks especially to Niko Matsakis, Graydon Hoare, and Ted Mielczarek for their insights, and Calvin Weng for proofreading and editing.

    About the author

    Brian Anderson is one of the co-founders of the Rust programming language and its sister project, the Servo web browser. He is now working in PingCAP as a senior database engineer.
