
Benchmarking C++ Allocators, Hacker News

By Jeff Baker, April 2020

C++ allocator implementation can be crucial to C++ application performance. There are many blogs describing the benefits of using jemalloc, tcmalloc, or hoard rather than system allocators like ptmalloc on GNU/Linux. All of these publications share the same flaws:

  1. They use the dynamic linker to replace malloc and free, and
  2. They refer to the obsolete gperftools distribution of tcmalloc.

An example of both is the widely linked Percona blog post comparing tcmalloc, jemalloc, and ptmalloc. It shows essentially that ptmalloc falls apart at high parallelism, and that jemalloc and tcmalloc are about the same.

For C++ programs, replacing malloc and free at runtime is the worst choice. When the compiler can see the definitions of new and delete at build time, it can generate far better programs. When it can’t see them, it generates an out-of-line function call to malloc for every operator new, which is bananas.

Another thing to keep in mind is that the developers of tcmalloc never use it via dynamic preload. They only use it via Bazel’s malloc option, which builds the program with the designated allocator. Consequently, they don’t have any motivation to improve tcmalloc’s performance and behavior as a malloc/free library. They are focused on using it as a build-time C++ allocator, and all their work on tcmalloc is guided by its performance in that role.

A more recent blog post from IT Hare still falls victim to both #1 and #2, but since their code is on GitHub we can fix it. By properly building their benchmark with modern tcmalloc, we can see how much C++ new/delete performance can be improved. Figures are milliseconds to complete the entire benchmark run. The system is a 7th-generation Intel Core CPU with 8 threads on 4 cores.

Table: milliseconds to complete the benchmark at 1, 2, 4, and 8 threads, with columns for jemalloc, gperftools, and tcmalloc.

By using tcmalloc with runtime dynamic loading, we leave a lot of potential performance on the table. The benchmark is dramatically faster when built with tcmalloc.

