Debugging hidden memory leaks in Ruby, Hacker News

In 2015 I wrote about some of the tooling Ruby provides fordiagnosing managed memory leaks. The article mostly focused on the easy managed leaks.

This article covers tools and tricks you can use to attack leaks that you can not easily introspect in Ruby. In particular I will discuss mwrap, heaptrack, iseq_collector and chap.

An unmanaged memory leak

This little program leaks memory by calling malloc directly. It starts off consuming 16 MB and finishes off consuming 118 MB of RSS. The code allocates 100 k blocks of 1024 bytes and de-allocates 50 thousand of them.

require 'fiddle' require 'objspace'  def usage   rss=`ps -p # {Process.pid} -o rss -h`.strip.to_i * 1024   puts "RSS: # {rss / 1024} ObjectSpace size # {ObjectSpace.memsize_of_all / 1024} " end  def leak_memory   pointers=[]   100 _ 000     i=Fiddle.malloc (1024)     pointers

Even though our RSS is 118 MB, our Ruby object space is only aware of 3MB, introspection wise we have very little visibility of this very large memory leak.

A real world example of such a leak isdocumented by Oleg Dashevskii, it is an excellent article worth reading.

Enter Mwrap

Mwrapis a memory profiler for Ruby that keeps track of all allocations by intercepting malloc and family calls. It does so by intercepting the real calls that allocate and free memory usingLD_PRELOAD. It usesliburcufor bookkeeping and is able to keep track of allocation and de -allocation counts per call-site for both C code and Ruby. It is reasonably lightweight and will approximately double the RSS for the program being profiled and approximately halve the speed.

It differs from many other libraries in that it is very lightweight and Ruby aware. It track locations in Ruby files and is not limited to C level backtrackes valgrind masif and similar profilers show. This makes isolating actual sources of an issue much simpler.

Usage involves running an application via the mwrap wrapper, it inject the LD_PRELOAD environment and execs the Ruby binary.

Let’s append mwrap to our above script:

require 'mwrap'  def report_leaks   results=[]   Mwrap.each do | location, total, allocations, frees, age_total, max_lifespan |     resultsgrowth_a   end    results [0..20]. each do | location, growth, allocations, frees |     next if growth==0     puts "# {location} growth: # {growth.to_i} allocs / frees (# {allocations} / # {frees})"   end end  GC.start Mwrap.clear  leak_memory  GC.start  # Don't track allocations for this block Mwrap.quiet do   report_leaks end

Next we will launch our script with the mwrap wrapper

% gem install mwrap % mwrap ruby leak.rb leak.rb: 12 growth: 51200000 allocs / frees (100000 / 50000) leak.rb: 51 growth: 4008 allocs / frees (1/0)

Mwrap correctly detected the leak in the above script (50, 000 * 1024 ). Not only it detected it, it isolated the actual line in the script (i=Fiddle.malloc (1024)) which caused the leak. It correctly accounted for theFiddle.freecalls.

It is important to note we are dealing with estimates here, mwrap keeps track oftotalmemory allocated at the call-site and then keeps track of de-allocations. However, if you have a single call-site that is allocating memory blocks of different sizes the results can be skewed, we have access to the estimate:((total / allocations) * (allocations - frees))

Additionally, to make tracking down leaks easier mwrap keeps track ofage_totalwhich is the sum of the lifespans of every object that was freed, andmax_lifespanwhich is the lifespan of the oldest object in the call-site. Ifage_total / freesis high, it means the memory growth survives many garbage collections.

Mwrap has a few helpers that can help you reduce noise.Mwrap.clearwill clear all the internal storage.Mwrap.quiet {}will suppress Mwrap tracking for a block of code. *********

Another neat feature Mwrap has is that it keeps track of total allocated bytes and total freed bytes. If we remove the clear from our script and run:

Usage puts "Tracked size: # {(Mwrap.total_bytes_allocated - Mwrap.total_bytes_freed) / 1024} "  # RSS: 130804 ObjectSpace size 3032 # Tracked size: 91691

This is very interesting cause even though our RSS is 130 MB, Mwrap is only seeing 91 MB, this demonstrates we have bloated our process. Running without mwrap shows that the process would normally be 118 MB so in this simple case accounting is a mere 12 MB, the pattern of allocation / deallocation caused fragmentation. Knowing about fragmentation can be quite powerful, in some cases with untuned glibc malloc processes can fragment so much that a very large amount memory consumed in RSS is actually free.

Could Mwrap isolate the old redcarpet leak?

InOleg’s articlehe discussed a very thorough way he isolated a very subtle leak in redcarpet. There is lots of detail there. It is critical that youhave instrumentation. If you are not graphing process RSS you have very little chance at attacking any memory leak.

Let’s step into a time machine and demonstrate how much easier it can be to use Mwrap for such leaks.

def red_carpet_leak   100 _ 000      markdown=Redcarpet :: Markdown.new (Redcarpet :: Render :: HTML, extensions={})     markdown.render ("hi")   end end  GC.start Mwrap.clear  red_carpet_leak  GC.start  # Don't track allocations for this block Mwrap.quiet do   report_leaks end

Redcarpet version 3.3.2

redcarpet.rb: 51 growth: 22724224 allocs / frees 500048 / 400028) redcarpet.rb: 62 growth: 4008 allocs / frees (1/0) redcarpet.rb: 52 growth: 634 allocs / frees 600007 / 600000)

Redcarpet version 3.5.0

redcarpet.rb: 51 growth: 4433 allocs / frees 600045 / 600022) redcarpet.rb: 52 growth: 453 allocs / frees (600005 / 600000)

Provided you can afford for a process to run at half speed simply re-launching it in production with Mwrap and logging Mwrap output once in a while to a file can identify a broad spectrum of memory leaks.

A mysterious memory leak

Recently we upgraded Rails to version 6 at Discourse. Overall the experience was extremely positive, performance remained more or less the same, Rails 6 includes some very nice features we get to use (likeZeitwerk).

Rails amended the way templates are rendered which required a few backwards compatible changes.

Fast forward a few days after our upgrade and we noticed RSS for our Sidekiq job runner was climbing.

Mwrap kept on reporting a sharp incline in memory due to memory being allocated at:

We initially found this very confusing and kept thinking to ourselves, why is Mwrap complaining? Could it be broken?

During the period where memory was climbing the Ruby heaps were not growing in size in a significant manner.

2 million slots in the heap are a meager (MB) 40 bytes per slot), strings and arrays can take up more space, but this simply did not explain the enormous memory usage we were seeing. This was confirmed when I ranrbtrace -p SIDEKIQ_PID -e ObjectSpace.memsize_of_all.

Where did all the memory go?

Heaptrack

Heaptrackis a memory heap profiler for Linux.

Milian Wolff does a great job explaining what it is and how it came to beon his blog. He also has several talks about it (1,2,(3))

In essence it is an incredibly efficient native heap profiler that gathers backtraces from a profiled applications usinglibunwind.

It is significantly faster thanValgrind / Massifand has a feature that makes is much more suitable fortemporaryproduction profiling.

It can attach to an already running process!

As with most heap profilers, when every single malloc family function is called it needs to do some accounting. This accounting certainly slows down the process a bit.

The design, in my mind, is the best possible design for this type of program. It intercepts using an LD_PRELOAD trick or aGDB trickto load up the profiler. It ships the data out of the profiled process as quickly as possibly using aFIFO special file. The wrapperheaptrackis a simple shell script, something that makes troubleshooting simple. A second process runs to read from the FIFO and compress the tracking data on the fly. Since heaptrack operates in “chunks” you can start looking at the profiled information seconds after you start profiling, mid way through a profiling session. Simply copy the profile file to another location and run the heaptrack gui.

Thisticket at GitLabalerted me to the possibility of running heaptrack. Since they were able to run it, I knew it was a possibility for me.

We run our application in a container, I needed to relaunch our container with- cap-add=SYS_PTRACEwhich allows GDB to useptracewhich we needed so heaptrack can inject itself. Additionally, I needed asmall hackon the shell file to allowrootto profile a nonroot (process (we run our Discourse application under a restricted account in the container).

Once this was done it was as simple as runningheaptrack -p PIDand waiting for results to stream in.

The UX of heaptrack is fantastic and extremely rich, it was very easy to follow what was happening with my memory leak.

At a top level I could see two jumps, one was due tocppjiebaand the other was originating from Rubyobjspace_xmalloc0

I knew aboutcppjieba, segmenting Chinese is expensive, large dictionaries are needed, it was not leaking.

But why was ruby allocating memory and further more, not telling me about it?

The largest increase was coming fromiseq_set_sequenceincompile.c. So it follows that we were leaking instruction sequences.

This made the leak Mwrap detected make sense.mod.module_eval (source, identifier, 0)was causing a leak cause it was creating instruction sequences that were never being removed.

In retrospect if I carefully analyzed a heap dump from Ruby I should have seen all these IMEMOs, cause they are included in heap dumps, just invisible from in-process introspection.

From here on debugging was pretty simple, I tracked down all calls to the module eval and dumped out what it was evaluating. I discovered we kept on appending methods over and over to a big class.

Simplified, this is the bug we were seeing:

require 'securerandom' module BigModule; end  def leak_methods   10 _ 000 times     method="def _ # {SecureRandom.hex}; # {" sleep; "** 100}; end "     BigModule.module_eval (method)   end end  usage # RSS:  (ObjectSpace size)   leak_methods  usage # RSS: 123096 ObjectSpace size 5583

Ruby has a class to contain instruction sequences called:RubyVM :: InstructionSequence. However, Ruby is lazy about creating these wrapping objects, cause it is inefficient to have them around unless needed.

InterestinglyKoichi Sasadacreated theiseq_collectorgem. If we add this snippet we can now find our hidden memory:

require 'iseq_collector' puts "# {ObjectSpace.memsize_of_all_iseq / 1024} " # 98747

ObjectSpace.memsize_of_all_iseqwill materialize every instruction sequence, which can introduce slight process memory growth and slightly more GC work.

If we, for example, count the number of ISEQs before and after running the collector we will notice that after runningObjectSpace.memsize_of_all_iseqourRubyVM :: InstructionSequenceclass count grows from 0 to 11128 in the example above:

def count_iseqs   ObjectSpace.each_object (RubyVM :: InstructionSequence) .count end

These wrappers will stay around for the life of a method and need to be visited when a full GC runs.

For those curious, our fix to our issue was reusing the class responsible for rendering email templates. (fix 1,fix 2)

Chap

During my debugging I came across a very interesting tool.

Tim Boddy, extracted an internal tool used at VMWare for analysis of memory leaks and open sourced it a few years ago. The only video I can find about itis here. Unlike most tools out there this tool has zero impact on a running process. It can simply run against core dump files, as long as the allocator being used is glibc (no support for jemalloc / tcmalloc etc)

The initial leak I had can be very easily detected using chap. Not many distros include a binary for chap, but you can easilybuild it from source. It is very actively maintained.

# 444098 is the `Process.pid` of the leaking process I had sudo gcore -p 444098  chap core. 444098 chap>summarize leaked Unsigned allocations have 49974 instances taking 0x  (f1b0) 51, 573, 168) bytes.    Unsigned allocations of size 0x 408 have 49974 instances taking 0x  (f1b0) 51, 573, 168) bytes.  (allocations use 0x) ******************************************************************************************************************************************** (f1b0) 51, 573, 168) bytes.  chap>list leaked ... Used allocation at 562 CA 267 cdb0 of size 408 Used allocation at 562 CA 267 d1c0 of size 408 Used allocation at 562 CA 267 d5d0 of size 408 ...   chap>summarize anchored .... Signature 7fbe 5caa 0500 has 1 instances taking 0xc8 (200) bytes.  (allocations use 0x2ad)  (44, 922, 112) bytes.

Chap can use signatures to find where various memory is allocated and can complement GDB. When it comes to debugging Ruby it can do a great job helping you finding out what the actual memory is in use for a process.summarize usedgives the actual memory, sometimes glibc malloc can fragment so much that theusednumber is enormously different to the actual RSS. See:Feature # 14759: [PATCH] set M_ARENA_MAX for glibc malloc – Ruby master – Ruby Issue Tracking Systemfor more discussion. Chap can correctly account for all memory usage and provide deep analysis around memory allocation behaviors.

Additionally chap can be integrated into build pipelines to automatically detect leaks and flag builds that are leaking.

Future work

This round of debugging did prompt me to raise a few issues with our supporting tool-sets:

I would love to see heaptrack support grabbing call-stack frames from Ruby. Milian is interested in this problem as well:412929 – Can we grab a frame from Ruby land?
I would love Ruby to support richer introspection:https://bugs.ruby-lang.org/issues/ 16245
I would love to see Mwrap be a little bit easier to use in production.Tracked here.

Summary

Our existing tooling for debugging very complex memory leaks in 2019 is vastly superior to what we had 4 years ago! Mwrap, heaptrack and chap provide us with very powerful tools for attacking memory related issues both in development and production.

If you are hunting a simple memory leak in Ruby, I recommendmy earlier articlefrom 2015, most of it still holds.

I hope that next time you are stuck debugging a complex native memory leak you have an easier time!

If you have any interesting battle stories or tools I have forgotten to mention you would like to share, please post a comment!

(Read More)