Welcome to bootstrapping!

This wiki is about bootstrapping: building up compilers, interpreters, and tools from nothing.

“Recipe for yogurt: Add yogurt to milk.” – Anon.

Short sci-fi story: Coding Machines by Lawrence Kesteloot, January 2009

Also see http://bootstrappable.org, which has pointers to a mailing list and IRC channel.

Simple explanation: bootstrapping is about building a compiler using tools smaller than itself, as opposed to building a compiler using an already built version of itself. The problem with the second is: Where did that prebuilt binary come from?

Current Topics

  • mes by janneke, mes
  • stage0 by Jeremiah Orians, stage0
  • Coquillage by bms_
  • Descent principle
  • The Semantics Assignment Problem
  • Self-Extension
  • Self-Hosting
  • Build Systems
  • Build Inputs
  • C compilers
  • Below C Level
  • Bootstrapping Specific Languages
  • Discarded options and why
  • Investigate
  • Projects List
  • Documents
  • Forth

Past Research

Bcompiler by Grimley Evans
This is a detailed log of the process of bootstrapping a series of languages up, starting from just a hex assembler written using a hex editor.
The Cuneiform Tablets of 2015 by Long Tien Nguyen, Alan Kay
This discusses methods of long-term software preservation. It touches briefly on hardware that will not degrade over time, but the majority of the paper is about how to design a software stack that can be executed in the far future. To achieve this, they recommend building everything in terms of a machine with a short, simple specification.
jonesforth.S by Richard WM Jones
An in-depth literate program describing a complete implementation of Forth, bootstrapped from Intel 32-bit assembly with lots of assembler macros into a fully self-extensible Forth. This is a really illuminating read, teaching a lot of details about Forth as well as showing just how minimal the runtime of a programming language can be.
stoneknifeforth by Kragen
Kragen (again) doing amazing bootstrapping / self-hosting work. This Forth is implemented in a screenful of code and is able to emit ELF files directly. Self-extensible. Single-char word names.
amber by nineties
These slides outline the development of rowl and amber. This is a programming language bootstrapped up from assembly. rowl is implemented directly in assembly, then parts of the amber VM and compiler are implemented in rowl, and the rest of amber is implemented by self-hosting.
SCM-Go by pkelcjte
This project builds a SICP-style Scheme interpreter with a REPL in Go. The blog post describes each phase. They’re simple-looking. The GitHub repo integrates it into a total of 240 lines of code. Because the language is simple, the Go implementation could be ported to anything else in our collection or hand-assembled directly. Then more complex stuff could be built on it, as nineties or other LISPers do.
jrp.c by curtism
A very small x86 JIT stack calculator implemented in C. All of the instructions are coded in a clever way to make them each a double word or a quad word.
QCC
The QCC project: hooking tcc frontend up with qcc’s code generator and creating a toybox style set of cc, as, ld tools.
List of Diverse Hardware
A big concern in dealing with trust in hardware is whether it’s subverted or not. Intel, AMD, and many other big names have backdoors in their chips for management purposes. Among other things… ;) One cheat to get a trustworthy image is to just use a computer you have no reason to believe is subverted. Acquire it as a boring buyer, make sure it is itself boring tech, do your bootstrapping on it air-gapped, and use what it produces. It will likely *not* be subverted *by default*, since the interdictors and TAO folks have limited resources w/ no reason to target the system. Use several different ones for best results. To help with that, I (Nick P.) put together a list of all kinds of CPU’s and execution strategies on Schneier’s blog. Something I left off the list are old TI-82 calculators, Palm Pilots, etc. Lots of old stuff lying around that you can get in person with cash is probably unsubverted.
golang talk: golang transpiled from C to Go
“It’s time for the Go compilers to be written in Go, not in C. I’ll talk about the unusual process the Go team has adopted to make that happen: mechanical conversion of the existing C compilers into idiomatic Go code”. They wrote the compiler in C, then translated the source code from C into Go almost automatically (with some manual fixing up). This is an interesting approach. Let’s name it the transpile approach to self-hosting.
asmutils: a Linux distro / userland implemented in assembly
This is a Linux distribution implemented entirely in assembly. It doesn’t depend on libc or anything.
COMFY-65: a macro assembler hosted on Lisp
Henry G. Baker implements COMFY-65, a macro assembler hosted on Lisp.

“With the power of the entire Lisp language available for use within COMFY-65 macros, the amount of intelligence one can embed in these macros is limitless”

BYO assembler in Forth by Brad Rodriguez
This is a teaching document that explains how to make an assembler in Forth! It shows a very Forth-idiomatic style of programming, and how easy it is to make an advanced assembler once you have a working Forth.
mrustc by thepowersgang
This is a Rust compiler written in C++; it translates Rust to C. It makes the normal self-hosted rustc compiler bootstrappable! It skips the borrow checker but is still able to compile valid input source correctly.
BDS C by Leor Zolman
A C compiler (for CP/M) implemented in assembly. 25k lines of asm.
maru by Ian Piumarta
This is the real deal. Ian Piumarta implemented a fully bootstrappable Scheme here, starting from C, then self-hosting to a compiler that emits binary directly. Very impressive!
CakeML by Myreen et al
CakeML is really, really fascinating. They have created a theory of SML programs inside HOL, allowing them to prove properties of SML programs embedded inside HOL. They have created a (serious) compiler from SML down to assembly and proved that it preserves semantics all the way. They are then able to compile the compiler, simultaneously bootstrapping the proof, to create a verified compiler binary that is proven to compile input programs while preserving their semantics. To my knowledge this is the first such development.
bootstrap by Richard Smith
This is an incredibly well-developed bootstrapping project: hex assembler, ELF maker, x86 assembler, linker, B compiler, C compiler. It includes implementations of various POSIX-style libc functions along the way. It is extremely well written and worth studying!
ASMC by Giovanni Mascellani
The asmc project is a small bootable kernel that loads up a payload. Payloads exist for assembly compilers and “G language” compilers. The G language is a low-level language below C which was invented to ease bootstrapping. An assembler (which can build the kernel) has been implemented in G.
BLC by Pim Goossens
cmeta – Using ideas from the META compiler-compiler, Pim builds the meta language up from raw hex. blc – a binary lambda calculus implementation, capable of computing Matt Might’s factorial program, built using the cmeta system. Incredibly terse. It is surprising that the techniques of metacompiler compilers can be applied at such a low level. The amount of leverage may be highest in this project.
pascal-p by Pascal-P Porting Kit
“It compiles and runs a subset of the Revised Pascal language. That subset was designed to be the minimum language required to self compile for a new machine implementation. It was part of a “bootstrapping” kit designed to facilitate porting Pascal to new machines.” The Pascal language was implemented with bootstrapping in mind. They have a simple “p-code” bytecode language that eases the process.
eulex by David Vázquez Púa
This is a Forth operating system with an Emacs-like editor and Lisp interpreter built in. It’s a 1700-line assembly script for the bootable Forth compiler / interpreter, and then the whole rest of the system is implemented in Forth. I have not tried it, but apparently it can build itself with the assembler. This is very impressive work.
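
Several of the projects above (bcompiler, bootstrap) start from a hex assembler: a tool that turns hex digits typed as text into raw bytes. As a rough illustration only, here is a minimal sketch of that idea in Python. The `hex_assemble` name and the `#` comment convention are assumptions made for this example; they are not taken from any of the listed projects, which implement this step in machine code rather than in a high-level language.

```python
def hex_assemble(source: str) -> bytes:
    """Turn text containing hex digit pairs into raw bytes.

    Everything from '#' to the end of a line is treated as a comment
    (an assumed convention for this sketch). Whitespace is ignored.
    """
    digits = []
    for line in source.splitlines():
        code = line.split("#", 1)[0]  # drop the comment part, if any
        digits.extend(ch for ch in code if not ch.isspace())
    text = "".join(digits)
    if len(text) % 2 != 0:
        raise ValueError("odd number of hex digits")
    return bytes.fromhex(text)  # raises ValueError on non-hex characters


example = """
7f 45 4c 46   # the four ELF magic bytes
01 01 01 00   # four more bytes, chosen for the demo
"""
image = hex_assemble(example)
print(image.hex())
```

The point of such a tool is that it is small enough to be entered by hand and audited byte by byte, and every later stage can then be written as commented hex text.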

Past Research / intray

important: try to summarize lessons learned from each.

  • Pascal-S by Wirth (Small, self-contained subset w/ great error reporting)
  • Compiler Construction by Wirth (Oberon-0 language in book is well-suited to bootstrapping)
  • Edison by Hansen (Language w/ 5 statements & small OS on PDP-11)
  • Project Oberon by Wirth et al (Simple language, compiler, OS, and RISC CPU w/ source laid out like a book.)
  • ML/I and Sal by Tannenbaum (Macro system bootstrapping low-level language, Sal, they built an OS with)
  • COLA whitepaper by Ian Piumarta
  • PreScheme: using a low-level s-exp IL to implement Scheme.
  • Incremental Scheme Compiler by Ghuloum (Build Scheme-to-ASM compiler in 24 small steps; GitHub repos available)
  • Red Language by Rakocevic et al (LISP-like power / DSL’s, can do low-level, batteries included, 1MB standalone)
  • MinCaml by IPA (Efficient compiler for minimal, functional language in 2000 lines & 14-week segments)
  • Spry by Krampe (Combines traits of LISP, Rebol, Smalltalk, and Forth; hosted on Nim; 2300 loc)
  • LCC by Hanson and Fraser (A 20 Kloc compiler w/ book describing its workings; literate code; non-FOSS, but free non-commercial)
  • Axiomatic Bootstrapping: A Guide for Compiler Hackers by Andrew Appel (bootstrapping SML)
  • Merlin: Just Add Reflection (bootstrapping object-oriented Merlin)
  • booting BCPL (bootstrapping BCPL using intcode)
  • High-level Assembly by Hyde (Assembly w/ high-level data types, control flow & a stdlib; use / check just what you need)
  • Linoleum by Ghignola (Cross-platform, lean, fast, assembly-like language)
  • wingolog: about the Guile compiler (all brilliant posts!)
  • Partcl by Zaitsev (Tiny TCL; TCL’s parse & interpret easily; also references Picol etc)
  • [1] neatld linker by ali grudi (and also neatas, Neatcc)
  • SchemeRepo by Univ. of Indiana (Pile of source for Scheme lexers, parsers, compilers, etc.)
  • https://www.youtube.com/watch?v=Sk9TatW9ino Tutorial: Building the Simplest Possible Linux System – Rob Landley
  • Om Language by sparist (Prefix, typeless language with three operators; concatenative like Forth)
  • [2] by Laurence Tratt
  • SBCL: a Sanely-Bootstrappable Common Lisp by Christophe Rhodes
  • prescheme to c compiler – https://github.com/nineties-retro/sps
  • Ur-Scheme by Kragen Sitaker
  • Qhasm by Daniel Bernstein (portable form of Assembly language that standardizes machine instruction syntax across CPUs)
  • Debian Rebootstrap: a project with the idea that bootstrapping Debian should be a repeatable process, not a hacky one-off thing
  • http://t3x.org/t3x/ - minimal procedural language with self-hosted tiny compiler
  • [3] - bootstrapping a Linux system from source
  • bootstrapping trust in compilers: blog post by Owl’s portfolio
  • programming thought experiment: kragen comment on reddit
  • scheme from scratch
  • http://interim-os.com/
  • https://github.com/m4tx/uefi-jitfuck UEFI JIT brainfuck
  • https://miyuki.github.io/2017/10/04/gcc-archeology-1.html GCC archeology
  • https://github.com/murisi/L2
  • https://tinygo.org/faq/why-a-new-compiler/
  • https://github.com/siraben/meta-II

Groups

Karger-Thompson Attack

Anything related to the Karger-Thompson attack: proof-of-concept demos, mitigations, theory.

  • Multics: the original paper explaining the attack (before Thompson!)
  • SCM Security by Wheeler (Secure distribution & compilation of source fundamentals; Karger advised mastering it)
  • rotten by rntz (Thompson attack demo)
  • rust infection by manishearth (Thompson attack demo in the Rust compiler)
  • TCC ACSAC by David Wheeler
  • CompCert by Leroy et al (Mathematically-verified C compiler whose specs and proofs are checked with a tiny, verified checker)
  • CakeML by Myreen et al (Mathematically-verified SML compiler whose specs and proofs are checked with a different, tiny, verified checker)
  • VLISP by Oliva and Wand (Article has links to VLISP, which mathematically verified PreScheme and Scheme 48)
  • KCC by Rosu et al (Executable, formal semantics for C in rewrite logic; could do that w/ simpler engine)
  • TALC by Cornell (Typed assembly language to verify safety w/out compiler; checker can be simple; C subset verified compiler to TALC)
  • CoqASM by Microsoft Research (Bootstrap in verifiably-safe assembly in prover checked by tiny, verified checker)

Ubiquitous Implementations

These are tools written in ubiquitous languages, therefore they can be used in a wide variety of contexts.

  • shasm by Hohensee (x86 assembler written in BASH)
  • AWKLisp by Bacon (LISP written in Awk; includes Perl version from Perl Avenger)
  • Gherkin by Dipert (LISP written in Bash)
  • BASH Infinity by Brzoska (BASH framework / routines that might help write compilers in it)
  • Mal (“make a lisp”): implementing a very basic Lisp interpreter in hundreds of languages
  • [4] A new bootstrapping project that builds up to a self-hosted language above assembly from a minimal DOS platform.

Small C Compilers

  • C4 by rswier (incredibly short C compiler)
  • cc by Edmund Grimley-Evans (tiny C compiler)
  • CUCU by Zaitsev (Small C compiler designed for easy understanding)
  • SmallerC by Frunze (Small, single-pass C compiler for several ISA’s)
  • Picoc interpreter.
  • C Interpreter by Dr Dobbs (Describes building a C interpreter with source)
  • [5] Small C for I386 (IA-32)
  • Selfie, a tiny self-compiling compiler for a subset of C, a tiny self-executing MIPS emulator, and a tiny self-hosting MIPS hypervisor, all in a single 7kLoC file. HN discussion. (Paper.)
  • Tiny C expression compiler written in Forth, based on tinyc.c by Marc Feeley.
  • [6] [7] C compilers by Rui Ueyama, blog
  • [8] 10 hour self-hosting C compiler

Grammars, Parsing, and Term Rewriting

  • Grammar Executing Machine by McKeeman and He (Incrementally extend languages from simple to complex grammars in interpreter(s))
  • PEG by kragen (parsing)
  • PEG-based simple compiler by Ian Piumarta
  • META II by Bayfront Tech (Original meta-compiler w/ live code and detailed tutorial; OMeta was successor)
  • META II implementation by Lugon (Looks like a small implementation of META II; also bootstrapped in META II)
  • OMeta# Intro by Moser (OMeta intro that nicely illustrates the meta approach / advantages)
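
To give a feel for why PEG-style parsing keeps coming up in bootstrapping work, here is a minimal sketch of PEG-like parser combinators in Python. It is not taken from kragen’s or Piumarta’s implementations; the helper names (`lit`, `seq`, `choice`, `many`) and the tiny sum grammar are assumptions made for this example. The property it shows is ordered choice: alternatives are tried in order and the first match wins, so no separate lexer or ambiguity resolution is needed.

```python
# A parser is a function (text, pos) -> (value, new_pos), or None on failure.

def lit(s):
    """Match the literal string s."""
    def parse(text, pos):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse

def seq(*parsers):
    """Match each parser in turn; fail if any one of them fails."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def choice(*parsers):
    """Ordered choice: the first alternative that matches wins."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

def many(p):
    """Greedy zero-or-more repetition; never fails."""
    def parse(text, pos):
        values = []
        while True:
            result = p(text, pos)
            if result is None or result[1] == pos:  # stop on failure or empty match
                return values, pos
            value, pos = result
            values.append(value)
    return parse

digit = choice(*[lit(d) for d in "0123456789"])

def number(text, pos):
    # number <- digit digit*
    result = seq(digit, many(digit))(text, pos)
    if result is None:
        return None
    (first, rest), pos = result
    return int(first + "".join(rest)), pos

def total(text, pos):
    # total <- number ('+' number)*   (evaluated while parsing)
    result = seq(number, many(seq(lit("+"), number)))(text, pos)
    if result is None:
        return None
    (first, rest), pos = result
    return first + sum(n for _plus, n in rest), pos

print(total("12+30+4", 0))
```

Each combinator is only a few lines, which is part of why this style is attractive when the parser itself has to be built from very little.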

Virtual Machines, Instruction Sets

  • P-code by Wirth (High-level language & libraries target ultra-simple, portable interpreter)
  • sweet 16 by Steve Wozniak
  • Tiny BASIC by Allison (Small BASIC whose original VM took 120 virtual opcodes to implement using 3KB RAM)
  • Klip by Cutting (Compiler & runtime for simple language for students; done in C#; runtime is very readable)
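
The common idea in the entries above is that a language targets a very simple interpreter, so only that interpreter has to be rewritten for a new machine. As a rough sketch of how small such an interpreter can be, here is a toy stack machine in Python. The opcode names and the `run` function are invented for this example; this is not P-code, SWEET16, or the Tiny BASIC VM.

```python
def run(program):
    """Execute a list of (opcode, operand) pairs on a toy stack machine.

    Returns the list of values printed by the program.
    """
    stack = []
    output = []
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "SUB":
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif op == "DUP":
            stack.append(stack[-1])
        elif op == "JNZ":  # pop a value; jump to arg if it is non-zero
            if stack.pop() != 0:
                pc = arg
        elif op == "PRINT":
            output.append(stack.pop())
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return output


# Count down from 3 to 1.
countdown = [
    ("PUSH", 3),      # 0: initial counter
    ("DUP", None),    # 1: loop start
    ("PRINT", None),  # 2: print a copy of the counter
    ("PUSH", 1),      # 3
    ("SUB", None),    # 4: counter - 1
    ("DUP", None),    # 5
    ("JNZ", 1),       # 6: loop again while the counter is non-zero
    ("HALT", None),   # 7
]
print(run(countdown))
```

A compiler that emits code for a machine like this can be moved to new hardware by reimplementing only the dispatch loop, which is the property the P-code approach relies on.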

CPU’s for Bootstrapping: The Simple, The Verified, and The Necessarily Complex

  • NAND2Tetris by Nisan and Schocken (Guide that teaches hardware step-by-step in fun way with simple CPU emerging)
  • J1 by Bowman (16-bit Forth CPU in 200 lines of Verilog that does 100 MIPS on FPGA’s)
  • H2 by Howe (Modified, VHDL version of J1 with detailed description and Howe’s code MIT-licensed)
  • RISC-0 by Wirth (Simple, RISC CPU & SOC designed for Oberon language with detailed docs and source online)
  • JOP by Shoeberl et al (Embedded Java processor for FPGA’s)
  • Scheme Machine by Burger (Scheme interpreter implemented as CPU using formal methods)
  • ZPU by Zylin AS (Tiny, 32-bit CPU for deep embedded apps on FPGA’s)
  • J2 by Landley et al (Clone of cost-efficient, SuperH-2 CPU in open-source)
  • VAMP by Beyer et al (Formally-verified, DLX-style processor in 18,000 slices on Xilinx)
  • Leon3 by Gaisler (Industry-grade, 32-bit SPARC w/ auto-configuration of core and GPL license)
  • Rocket by Univ of CA (1.4GHz RISC-V CPU and generator for customization)
  • OpenPITON by Princeton (25-core, shared-memory, SPARC CPU open-sourced and very scalable)

Minimal Operating Systems

  • KolibriOS – lightweight assembly OS.
  • MikeOS – same.
  • Sortix – modern reimplementation of POSIX in C. (Note: No Perl port, and GCC does not build natively on it yet.)
  • ASMLINUX – Linux kernel, but the userspace is implemented entirely in assembly.
  • LFS – Guide on building Linux and the GNU userspace.
  • buildroot
  • NetBSD build.sh - Cross-build a complete NetBSD ISO from a foreign OS. There’s also a guide in the official NetBSD docs.
  • lh-bootstrap - alternative Linux distro, using musl instead of glibc.
  • xv6 – UNIX teaching OS from MIT
  • OS/161 – UNIX teaching OS from Harvard
  • https://landley.net/toybox/about.html - Toybox, alternative to Busybox by Robert Landley; see also Aboriginal Linux and mkroot by the same author, which are all geared toward a minimal bootstrappable system
  • https://github.com/pikhq/bootstrap-linux - Another take at a bootstrappable Linux system

Biology / Other?

Helpful Links

  • AIM-039.pdf: The first self-hosted Lisp
  • lambda-the-ultimate: thread asking for info on bootstrapping
  • awesome-compilers: GitHub list with a lot of information (copy the relevant parts to this wiki)
  • Tombstone diagram
  • Bootstrappable: a community hub for bootstrapping, with mailing list.
  • bootstrappable mailing list
  • yabfc – Generating-executable-files-from-scratch
  • ELF visualization
  • Cfront – converts C++ to C; developed by Bjarne Stroustrup.
  • https://sourceware.org/glibc/wiki/FAQ#How_do_I_install_all_of_the_GNU_C_Library_project_libraries_that_I_just_built.3F
  • Formal Compiler Verification with ACL2 - proving a compiler correct with ACL2 and discussion about correctness and self-compiling.
