mkirchner / gc, Hacker News

mkirchner / gc, Hacker News



gcis an implementation of a conservative, thread-local, mark-and-sweep garbage collector. The implementation provides a fully functional replacement for the standard POSIXmalloc (),calloc (),realloc (), andfree ()calls.

The focus ofgcis to provide a conceptually clean implementation of a mark-and-sweep GC, without delving into the depths of architecture-specific optimization (see e.g. theBoehm GCfor such an undertaking). It should be particularly suitable for learning purposes and is open for all kinds of optimization (PRs welcome!).

The original motivation forgcis my desire to writemy own LISPin C, entirely from scratch – and that required garbage collection.


This work would not have been possible without the ability to read the work of others, most notably theBoehm GC, orangeduck’stgc(which also follows the ideals of being tiny and simple), andthe garbage collection Handbook.

**************Table of contents

Table of contents

  • (Documentation Overview) ************Quickstart
  • Download and test

  • (Basic usage) Core APIStarting, stopping, pausing, resuming and running GC
  • (Memory allocation and deallocation) Helper functionsBasic Concepts (Data Structures)

    Garbage collection

  • ReachabilityThe Mark-and-Sweep Algorithm*************************** (Finding roots**************** Depth-first recursive markingDumping registers on the stack

  • **************************** Documentation Overview ******************** (Read thequickstartbelow to see how to get started quickly
  • The conceptssection describes the basic concepts and design decisions that went into the implementation ofgc.
  • Interleaved with the concepts, there are implementation sections that detail the implementation of the core components, see hash map implementation
  • ,dumping regsiters on the stack,finding roots, and depth-first, recursive marking.Quickstart

    Basic usage
    **************************** (gc.h) **************************************************************************************
    ... (void) ****************************************************** some_fun () {     ...      (int) * my_array=(gc_calloc) **************************************************** (gc,********************************** (**************************************************, (sizeof) ************************************************** (int) ));      (for (**************** (size_t) **************************************************** (i; i********************************Payeer
    ;    i) {         my_array [i]=45
    ;     }     ...     

    // look ma, no free! } (int) ***************************************************** (main) **************************************************** (

    *************************** (int) ************************************************** (argc, (char

    * argv []) {     gc=(gc_start) (gc, & argc);     ...     

    ();     ...      (gc_stop) (gc);      (return) ****************************************************** (0) **************************************************; }

    **************************************************Core API

    This describes the core API, seegc.hfor more details and the low-level API.

    Starting, stopping, pausing, resuming and running GC

    In order to initialize and start garbage collection, use thegc_start ()function and pass a bottom-of-stack address:

    **************************************** (void) * bos);

    root finding(scanning) starts.

    Garbage collection can be stopped, paused and resumed with

    Memory allocation and deallocation

    gcsupportsmalloc (),calloc ()andrealloc ()- style memory allocation. The respective funtion signatures mimick the POSIX functions (with the exception that we need to pass the garbage collector along as the first argument):

    **************************************** (size_t) ***************************************************** size); (void) * (gc_calloc) **************************************************** (GarbageCollector * gc,******************************** (size_t) ***************************************************** (count, size_tsize); (void) * (gc_realloc) **************************************************** (GarbageCollector * gc,******************************** (void) ****************************************************** ptr, size_t (size); (**************************************

    It is possible to pass a pointer to a desctructor function through the extended interface:

    ************ (sizeof) **************************************************** (SomeObject), dtor); ...

    gcsupports static allocations that are garbage collected only when the GC shuts down viagc_stop (). Just use the appropriate helper function:

    **************************************** (size_t) ***************************************************** size, (void) ****************************************************** dtor) ( (void) *));

    Static allocation expects a pointer to a finalization function; just set toNULLif finalization is not required.

    Note thatgccurrently does not guarantee a specific ordering when it collects static variables, If static vars need to be deallocated in a particular order, the user should callgc_free ()on them in the desired sequence prior to callinggc_stop (), see below.

    It is also possible to trigger explicit memory deallocation using

    **************************************** (void) * ptr);

    Static variablesHelper functions

    gcalso offers astrdup ()implementation that returns a garbage-collected copy:

    **************************************** (const) ***************************************************** (char) ******************************************************* s);

    **********************************************************Data Structures

    The core data structure insidegcis a hash map that maps the address of allocated memory to the garbage collection metadata of that memory:

    The items in the hash map are allocations, modeles with theAllocationstruct:

    Garbage collection

    gctriggers collection under two circumstances: (a) when any of the calls to the system allocation fail (in the hope to deallocate sufficient memory to fulfill the current request); and (b) when the number of entries in the hash map passes a dynamically adjusted high water mark.

    If either of these cases occurs,gcstops the world and starts a mark-and-sweep garbage collection run over all current allocations. This functionality is implemented in thegc_run ()function which is part of the public API and delegates all work to thegc_mark ()andgc_sweep ()functions that are part of the private API.

    gc_mark ()has the task of finding rootsand tagging all known allocations that are referenced from a root (or from an allocation that is referenced from a root, i.e. transitively) as "used". Once the marking of is completed,gc_sweep ()iterates over all known allocations and deallocates all unused (i.e. unmarked) allocations, returns togc_run ()and the world continues to run.


    gcwill keep memory allocations that are (reachable) and collect everything else. An allocation is considered reachable if any of the following is true:

    (There is a a pointer on the stack that points to the allocation content. The pointer must reside in a stack frame that is at least as deep in the call stack as the bottom-of-stack variable passed togc_start ()(iebosis the smallest stack address considered during the mark phase).
  • There is a pointer insidegc_ * alloc ()- allocated content that points to the allocation content.
  • The allocation is tagged withGC_TAG_ROOT.
  • ********************* Finding roots

    At the beginning of the (mark) stage, we first sweep across all known allocations and find explicit roots with theGC_TAG_ROOTtag set. Each of these roots is a starting point fordepth-first recursive marking.

    gcsubsequently detects all roots in the stack (starting from the bottom-of-stack pointerbosthat is passed togc_start ()) and the registers (bydumping them on the stackprior to the mark phase) and Uses these as starting points for marking as well.

    Depth-first recursive marking

    Given a root allocation, marking consists of (1) setting thetagfield in anAllocationobject toGC_TAG_MARKand (2) scanning the allocated memory for pointers to known allocations, recursively repeating the process.

    The underlying implementation is a simple, recursive depth-first search that scans over all memory content to find potential references:

    **************************************** (void) * ptr) {     Allocation * alloc=(gc_allocation_map_get) ****************************************************** (gc ->************************ (allocs) ******************************************************, ptr);      (if) **************************************************** (alloc &&! (alloc ->************************ (tag) **************************************************** & GC_TAG_MARK)) {         alloc ->(tag) |=GC_TAG_MARK;          (for (*************** (char) ***************************************************** p=( (char******************** (void) **) p);         }     } }Dumping registers on the stack

    In order to make the CPU register contents available for root finding,gcdumps them on the stack. This is implemented in a somewhat portable way usingsetjmp (), which stores them in ajmp_bufvariable right before we mark the stack:

     (void) ******************************************************  (************************************************** (volatile) ***************************************************** _mark_stack) (GarbageCollector=gc_mark_stack;  (jmp_buf)  ctx;  (memset) ****************************************************** (& ctx,***************** (0) ,  (sizeof) *****************************************************  (jmp_buf) ));  (setjmp)  (ctx);

    The detour using thevolatilefunction pointer_ mark_stackto thegc_mark_stack ()function is necessary to avoid the inlining of the call togc_mark_stack ().

    (****************************************************************************** (Read More) ************