gc is an implementation of a conservative, thread-local, mark-and-sweep garbage collector. The implementation provides a fully functional replacement for the standard POSIX malloc(), calloc(), realloc(), and free() calls.

The focus of gc is to provide a conceptually clean implementation of a mark-and-sweep GC, without delving into the depths of architecture-specific optimization (see e.g. the Boehm GC for such an undertaking). It should be particularly suitable for learning purposes and is open for all kinds of optimization (PRs welcome!).

The original motivation for gc is my desire to write my own LISP in C, entirely from scratch, and that required garbage collection.

Acknowledgment

This work would not have been possible without the ability to read the work of others, most notably the Boehm GC, orangeduck's tgc (which also follows the ideals of being tiny and simple), and The Garbage Collection Handbook.

Table of contents

- Download and test
- Basic usage
- Core API
  - Starting, stopping, pausing, resuming and running GC
  - Memory allocation and deallocation
  - Helper functions
- Basic Concepts
  - Data Structures
  - Garbage collection
  - Reachability
  - The Mark-and-Sweep Algorithm
  - Finding roots
  - Depth-first recursive marking
  - Dumping registers on the stack

Download and test

    $ git clone [email protected]:mkirchner/gc.git
    $ cd gc
    $ make test
    $ make coverage   # to open the current coverage in a browser

Basic usage

    ...
    #include "gc.h"
    ...

    void some_fun() {
        ...
        int *my_array = gc_calloc(gc, N, sizeof(int));  // N: number of ints
        for (size_t i = 0; i < N; ++i) {
            my_array[i] = 45;
        }
        ...
        // look ma, no free!
    }

    int main(int argc, char *argv[]) {
        gc_start(gc, &argc);
        ...
        some_fun();
        ...
        gc_stop(gc);
        return 0;
    }

Core API

This describes the core API; see gc.h for more details and the low-level API.

Starting, stopping, pausing, resuming and running GC

In order to initialize and start garbage collection, use the gc_start() function and pass a bottom-of-stack address:

    void gc_start(GarbageCollector *gc, void *bos);

The bottom-of-stack parameter bos needs to point to a stack-allocated variable and marks the low end of the stack from where root finding (scanning) starts.

Garbage collection can be stopped, paused and resumed with

    void gc_stop(GarbageCollector *gc);
    void gc_pause(GarbageCollector *gc);
    void gc_resume(GarbageCollector *gc);

and manual garbage collection can be triggered with

    size_t gc_run(GarbageCollector *gc);

Memory allocation and deallocation

gc supports malloc(), calloc() and realloc()-style memory allocation. The respective function signatures mimic the POSIX functions (with the exception that we need to pass the garbage collector along as the first argument):

    void *gc_malloc(GarbageCollector *gc, size_t size);
    void *gc_calloc(GarbageCollector *gc, size_t count, size_t size);
    void *gc_realloc(GarbageCollector *gc, void *ptr, size_t size);

It is possible to pass a pointer to a destructor function through the extended interface:

    void dtor(void *obj) {
        // do some cleanup work
        SomeObject *self = obj;
        self->parent->deregister();
        self->db->disconnect();
        ...
        // no need to free obj
    }
    ...
    SomeObject *obj = gc_malloc_ext(gc, sizeof(SomeObject), dtor);
    ...

gc supports static allocations that are garbage collected only when the GC shuts down via gc_stop(). Just use the appropriate helper function:

    void *gc_malloc_static(GarbageCollector *gc, size_t size, void (*dtor)(void *));

Static allocation expects a pointer to a finalization function; just set it to NULL if finalization is not required.

Note that gc currently does not guarantee a specific ordering when it collects static variables. If static variables need to be deallocated in a particular order, the user should call gc_free() on them in the desired sequence prior to calling gc_stop(), see below.

It is also possible to trigger explicit memory deallocation using

    void gc_free(GarbageCollector *gc, void *ptr);

Calling gc_free() is guaranteed to (a) finalize/destruct the object pointed to by ptr, if applicable, and (b) free the memory that ptr points to, irrespective of the current scheduling for garbage collection. It also works if GC has been paused using gc_pause() above.

Helper functions

gc also offers a strdup() implementation that returns a garbage-collected copy:

    char *gc_strdup(GarbageCollector *gc, const char *s);
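
A strdup()-style helper of this kind can be layered on any malloc()-like allocator. The following is a minimal sketch of that idea, not the gc_strdup() implementation: dup_with() and alloc_fn are hypothetical names, and plain malloc() stands in for gc_malloc() so that the snippet compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator signature; gc_malloc() would additionally take
 * the GarbageCollector as its first argument. */
typedef void *(*alloc_fn)(size_t size);

/* Copy s, including the terminating NUL, into memory obtained from alloc.
 * Returns NULL if the allocation fails. */
static char *dup_with(alloc_fn alloc, const char *s)
{
    assert(alloc != NULL && s != NULL);
    size_t len = strlen(s) + 1;
    char *copy = alloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}
```

Calling dup_with(malloc, "hello") returns a fresh copy that the caller owns. With a garbage-collected allocator in place of malloc(), the copy would be reclaimed once it becomes unreachable.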

Basic Concepts

The fundamental idea behind garbage collection is to automate the memory allocation/deallocation cycle. This is accomplished by keeping track of all allocated memory and periodically triggering deallocation for memory that is still allocated but unreachable.

Many advanced garbage collectors also implement their own approach to memory allocation (i.e. replace malloc()). This often enables them to lay out memory in a more space-efficient manner or for faster access, but comes at the price of architecture-specific implementations and increased complexity. gc sidesteps these issues by falling back on the POSIX *alloc() implementations and keeping memory management and garbage collection metadata separate. This makes gc much simpler to understand but, of course, also less space- and time-efficient than more optimized approaches.

Data Structures

The core data structure inside gc is a hash map that maps the address of allocated memory to the garbage collection metadata of that memory.

The items in the hash map are allocations, modeled with the Allocation struct:

    typedef struct Allocation {
        void *ptr;                // mem pointer
        size_t size;              // allocated size in bytes
        char tag;                 // the tag for mark-and-sweep
        void (*dtor)(void *);     // destructor
        struct Allocation *next;  // separate chaining
    } Allocation;

Each Allocation instance holds a pointer to the allocated memory, the size of the allocated memory at that location, a tag for mark-and-sweep (see below), an optional pointer to the destructor function and a pointer to the next Allocation instance (for separate chaining, see below).

The allocations are collected in an AllocationMap

    typedef struct AllocationMap {
        size_t capacity;
        size_t min_capacity;
        double downsize_factor;
        double upsize_factor;
        double sweep_factor;
        size_t sweep_limit;
        size_t size;
        Allocation **allocs;
    } AllocationMap;

that, together with a set of static functions inside gc.c, provides hash map semantics for the implementation of the public API.
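
To make the "hash map with separate chaining" idea concrete, here is a minimal sketch of a pointer-keyed map. It is an illustration under simplifying assumptions, not the code in gc.c: ToyMap, ToyEntry, toy_hash(), toy_get() and toy_put() are hypothetical names, the bucket count is fixed, and the hash function is just one plausible choice.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_CAPACITY 8  /* fixed bucket count; no resizing in this sketch */

typedef struct ToyEntry {
    void *ptr;              /* key: address of the managed memory */
    size_t size;            /* metadata: allocated size in bytes */
    struct ToyEntry *next;  /* next entry in the same bucket */
} ToyEntry;

typedef struct ToyMap {
    ToyEntry *buckets[TOY_CAPACITY];
} ToyMap;

/* Map an address to a bucket index. Dropping the low three bits assumes
 * allocations are at least 8-byte aligned. */
static size_t toy_hash(const void *ptr)
{
    return (size_t)(((uintptr_t)ptr >> 3) % TOY_CAPACITY);
}

/* Look up the entry for ptr, or NULL if ptr is not a known allocation. */
static ToyEntry *toy_get(ToyMap *map, const void *ptr)
{
    assert(map != NULL);
    for (ToyEntry *e = map->buckets[toy_hash(ptr)]; e != NULL; e = e->next) {
        if (e->ptr == ptr) {
            return e;
        }
    }
    return NULL;
}

/* Insert or update the metadata for ptr. Returns NULL if out of memory. */
static ToyEntry *toy_put(ToyMap *map, void *ptr, size_t size)
{
    ToyEntry *e = toy_get(map, ptr);
    if (e != NULL) {
        e->size = size;  /* known address: update in place */
        return e;
    }
    e = malloc(sizeof *e);
    if (e == NULL) {
        return NULL;
    }
    size_t i = toy_hash(ptr);
    e->ptr = ptr;
    e->size = size;
    e->next = map->buckets[i];  /* prepend to the bucket's chain */
    map->buckets[i] = e;
    return e;
}
```

Two addresses that hash to the same bucket simply end up in the same chain, which is why each entry carries a next pointer.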

The AllocationMap is the central data structure in the GarbageCollector struct, which is part of the public API:

    typedef struct GarbageCollector {
        struct AllocationMap *allocs;
        bool paused;
        void *bos;
        size_t min_size;
    } GarbageCollector;

With the basic data structures in place, any gc_*alloc() memory allocation request is a two-step procedure: first, allocate the memory through system (i.e. standard malloc()) functionality and second, add or update the associated metadata in the hash map.
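
The two-step procedure can be sketched as follows. This is a simplified illustration, not the gc_malloc() implementation: Registry, Record and tracked_malloc() are hypothetical names, and a linked list replaces the hash map to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Record {
    void *ptr;            /* managed memory */
    size_t size;          /* its size in bytes */
    struct Record *next;
} Record;

typedef struct Registry {
    Record *head;
    size_t count;
} Registry;

/* Step 1: get memory from the system allocator.
 * Step 2: record its metadata. If the bookkeeping fails, undo step 1. */
static void *tracked_malloc(Registry *reg, size_t size)
{
    assert(reg != NULL);
    void *ptr = malloc(size);
    if (ptr == NULL) {
        return NULL;
    }
    Record *rec = malloc(sizeof *rec);
    if (rec == NULL) {
        free(ptr);
        return NULL;
    }
    rec->ptr = ptr;
    rec->size = size;
    rec->next = reg->head;
    reg->head = rec;
    reg->count++;
    return ptr;
}
```

Because the metadata lives in its own record, the memory handed back to the caller has exactly the layout a plain malloc() would produce.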

For gc_free(), use the pointer to locate the metadata in the hash map, determine if the deallocation requires a destructor call, call the destructor if required, free the managed memory and delete the metadata entry from the hash map.

These data structures and the associated interfaces enable the management of the metadata required to build a garbage collector.
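
The deallocation steps listed for gc_free() can be sketched like this. It is a hedged illustration rather than the library code: Tracked, tracked_free() and counting_dtor() are hypothetical names, and a linked list again stands in for the hash map.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Tracked {
    void *ptr;               /* managed memory */
    void (*dtor)(void *);    /* optional destructor, may be NULL */
    struct Tracked *next;
} Tracked;

/* Locate ptr, run its destructor if one is set, free the managed memory
 * and drop the metadata. Returns 1 if ptr was known, 0 otherwise. */
static int tracked_free(Tracked **head, void *ptr)
{
    assert(head != NULL);
    for (Tracked **link = head; *link != NULL; link = &(*link)->next) {
        Tracked *t = *link;
        if (t->ptr != ptr) {
            continue;
        }
        if (t->dtor != NULL) {
            t->dtor(t->ptr);   /* finalize before the memory goes away */
        }
        free(t->ptr);
        *link = t->next;       /* unlink the metadata entry */
        free(t);
        return 1;
    }
    return 0;
}

/* Example destructor: only counts how often it ran. */
static int dtor_calls = 0;
static void counting_dtor(void *obj)
{
    (void)obj;
    dtor_calls++;
}
```

The destructor runs before free(), so it can still read the object, and it must not free the object itself.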

Garbage collection

gc triggers collection under two circumstances: (a) when any of the calls to the system allocation fail (in the hope of freeing sufficient memory to fulfill the current request); and (b) when the number of entries in the hash map passes a dynamically adjusted high water mark.

If either of these cases occurs, gc stops the world and starts a mark-and-sweep garbage collection run over all current allocations. This functionality is implemented in the gc_run() function, which is part of the public API and delegates all work to the gc_mark() and gc_sweep() functions that are part of the private API.
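
Trigger (b) can be pictured with the size, capacity, sweep_factor and sweep_limit fields of the AllocationMap. The sketch below is an assumption about how such a high water mark could be checked and re-adjusted, not a statement of what gc.c does; over_high_water_mark() and next_sweep_limit() are hypothetical helpers and the update rule is chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Trigger (b): collect once the number of entries exceeds the limit. */
static int over_high_water_mark(size_t size, size_t sweep_limit)
{
    return size > sweep_limit;
}

/* Assumed update rule, for illustration only: place the new limit a
 * fraction sweep_factor of the way from the current size to the capacity. */
static size_t next_sweep_limit(size_t size, size_t capacity, double sweep_factor)
{
    assert(capacity >= size);
    assert(sweep_factor >= 0.0 && sweep_factor <= 1.0);
    return (size_t)((double)size + sweep_factor * (double)(capacity - size));
}
```

With 100 entries, a capacity of 200 and a factor of 0.5, this rule would put the next limit at 150 entries. A limit that moves with the map size keeps collections from running on every allocation while the program's working set grows.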

gc_mark() has the task of finding roots and tagging all known allocations that are referenced from a root (or from an allocation that is referenced from a root, i.e. transitively) as "used". Once marking is completed, gc_sweep() iterates over all known allocations and deallocates all unused (i.e. unmarked) allocations, returns to gc_run() and the world continues to run.

Reachability

gc will keep memory allocations that are reachable and collect everything else. An allocation is considered reachable if any of the following is true:

- There is a pointer on the stack that points to the allocation content. The pointer must reside in a stack frame that is at least as deep in the call stack as the bottom-of-stack variable passed to gc_start() (i.e. bos is the smallest stack address considered during the mark phase).
- There is a pointer inside gc_*alloc()-allocated content that points to the allocation content.
- The allocation is tagged with GC_TAG_ROOT.

The Mark-and-Sweep Algorithm

The naïve mark-and-sweep algorithm runs in two stages. First, in a mark stage, the algorithm finds and marks all root allocations and all allocations that are reachable from the roots. Second, in the sweep stage, the algorithm passes over all known allocations, collecting all allocations that were not marked and are therefore deemed unreachable.
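
The sweep stage is the simpler half and can be sketched in a few lines. This is an illustrative toy, not gc_sweep(): Slot, sweep() and the TAG_* values are hypothetical, a flat array replaces the hash map, and the mark stage is assumed to have run already.

```c
#include <assert.h>
#include <stdlib.h>

#define TAG_NONE 0x0
#define TAG_MARK 0x2   /* hypothetical value standing in for GC_TAG_MARK */

typedef struct Slot {
    void *ptr;   /* managed memory, NULL if the slot is empty */
    char tag;    /* TAG_MARK if the mark stage reached this allocation */
} Slot;

/* Free every allocation the mark stage did not reach, and clear the mark
 * on the survivors so that the next run starts from a clean state.
 * Returns the number of allocations that were freed. */
static size_t sweep(Slot *slots, size_t n)
{
    assert(slots != NULL || n == 0);
    size_t freed = 0;
    for (size_t i = 0; i < n; ++i) {
        if (slots[i].ptr == NULL) {
            continue;
        }
        if (slots[i].tag & TAG_MARK) {
            slots[i].tag &= ~TAG_MARK;   /* survivor: reset for next run */
        } else {
            free(slots[i].ptr);          /* unreachable: collect */
            slots[i].ptr = NULL;
            ++freed;
        }
    }
    return freed;
}
```

Everything interesting happens in the mark stage; once the tags are right, sweeping is a single linear pass.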

Finding roots

At the beginning of the mark stage, we first sweep across all known allocations and find explicit roots with the GC_TAG_ROOT tag set. Each of these roots is a starting point for depth-first recursive marking.

gc subsequently detects all roots in the stack (starting from the bottom-of-stack pointer bos that is passed to gc_start()) and the registers (by dumping them on the stack prior to the mark phase) and uses these as starting points for marking as well.
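
Conservative root finding boils down to walking a memory range and treating every value that equals a known allocation address as a reference. The sketch below shows that idea on an ordinary array that stands in for the stack region; scan_range() is a hypothetical helper, the linear search replaces the hash map lookup, and the word-aligned stride is a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Conservatively scan the words in [lo, hi) and count how many hold a
 * value equal to one of the known allocation addresses. A collector would
 * start marking at each hit instead of counting. */
static size_t scan_range(const uintptr_t *lo, const uintptr_t *hi,
                         void *const *known, size_t n_known)
{
    assert(lo <= hi);
    size_t hits = 0;
    for (const uintptr_t *p = lo; p < hi; ++p) {
        for (size_t k = 0; k < n_known; ++k) {
            if (*p == (uintptr_t)known[k]) {
                ++hits;
                break;
            }
        }
    }
    return hits;
}
```

The scan cannot tell a real pointer from an integer that happens to have the same value, which is what makes the collector conservative: such a coincidence keeps an allocation alive, but never frees one that is still in use.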

Depth-first recursive marking

Given a root allocation, marking consists of (1) setting the tag field in an Allocation object to GC_TAG_MARK and (2) scanning the allocated memory for pointers to known allocations, recursively repeating the process.

The underlying implementation is a simple, recursive depth-first search that scans over all memory content to find potential references:

    void gc_mark_alloc(GarbageCollector *gc, void *ptr)
    {
        Allocation *alloc = gc_allocation_map_get(gc->allocs, ptr);
        if (alloc && !(alloc->tag & GC_TAG_MARK)) {
            alloc->tag |= GC_TAG_MARK;
            // Treat every pointer-sized window of the payload as a
            // potential reference; stop before reading past the end.
            char *end = (char *) alloc->ptr + alloc->size;
            for (char *p = (char *) alloc->ptr;
                 (size_t) (end - p) >= sizeof(void *);
                 ++p) {
                gc_mark_alloc(gc, *(void **) p);
            }
        }
    }

In gc.c, gc_mark() starts the marking process by marking the known roots on the stack via a call to gc_mark_roots(). To mark the roots we do one full pass through all known allocations. We then proceed to dump the registers on the stack.
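
For experimenting with the marking logic outside the library, the same depth-first idea can be reproduced on a toy heap. This is a self-contained sketch under stated assumptions: Block, find_block(), mark_from() and TOY_MARK are hypothetical, a linear search replaces the hash map, and memcpy() is used to read candidate pointers so that misaligned loads are avoided.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MARK 0x2   /* hypothetical tag bit */

typedef struct Block {
    void *ptr;      /* start of the managed memory */
    size_t size;    /* size in bytes */
    char tag;
} Block;

/* Linear lookup of the block that starts at ptr, or NULL. */
static Block *find_block(Block *blocks, size_t n, const void *ptr)
{
    assert(blocks != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (blocks[i].ptr == ptr) {
            return &blocks[i];
        }
    }
    return NULL;
}

/* Mark the block at ptr, then treat every pointer-sized window of its
 * payload as a potential reference and recurse into it. */
static void mark_from(Block *blocks, size_t n, const void *ptr)
{
    Block *b = find_block(blocks, n, ptr);
    if (b == NULL || (b->tag & TOY_MARK)) {
        return;   /* unknown address or already visited */
    }
    b->tag |= TOY_MARK;
    const char *base = b->ptr;
    for (size_t off = 0; off + sizeof(void *) <= b->size; ++off) {
        void *candidate;
        memcpy(&candidate, base + off, sizeof candidate);
        mark_from(blocks, n, candidate);
    }
}
```

Checking the mark bit before recursing is what terminates the search on cyclic structures: a block that points back to an already marked block is simply skipped.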

Dumping registers on the stack

In order to make the CPU register contents available for root finding, gc dumps them on the stack. This is implemented in a somewhat portable way using setjmp(), which stores them in a jmp_buf variable right before we mark the stack:

    ...
    /* Dump registers onto stack and scan the stack */
    void (*volatile _mark_stack)(GarbageCollector *) = gc_mark_stack;
    jmp_buf ctx;
    memset(&ctx, 0, sizeof(jmp_buf));
    setjmp(ctx);
    _mark_stack(gc);
    ...

The detour using the volatile function pointer _mark_stack to the gc_mark_stack() function is necessary to avoid the inlining of the call to gc_mark_stack().
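
The pattern can be tried in isolation with the sketch below. It is a stand-alone illustration, not library code: scan_snapshot() and dump_registers_and_scan() are hypothetical names, and the "scan" only reports the size of the snapshot instead of walking the stack.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the stack scan: reports the size of the snapshot it was
 * handed. A real scan would walk the stack, which now contains ctx. */
static size_t scan_snapshot(jmp_buf *ctx)
{
    assert(ctx != NULL);
    return sizeof *ctx;
}

static size_t dump_registers_and_scan(void)
{
    /* Calling through a volatile function pointer keeps the compiler from
     * inlining the scan into this frame. */
    size_t (*volatile scan)(jmp_buf *) = scan_snapshot;
    jmp_buf ctx;
    memset(&ctx, 0, sizeof ctx);
    setjmp(ctx);   /* saves the calling environment into the local ctx */
    return scan(&ctx);
}
```

Because ctx is a local variable, the saved environment lives in the current stack frame, so a scan that starts in a deeper frame and walks up to the bottom of the stack passes over it like over any other stack content.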