Late Safepoint Placement Overview

In a previous post, I sketched out some of the problems with the existing gcroot mechanism for garbage collection in LLVM.  This post is going to lay out the general approach of what we’ve started referring to as “late safepoint placement.”  It will be both fairly high level and fairly short.  Details will follow in future articles.

The general approach we’ve taken is to partition LLVM’s optimization and code generation process into two distinct phases.  Between the two phases, we rewrite the IR to contain explicit safepoints – constructed in a way which conservatively encodes their relocation semantics.

The first phase runs before safepoints are inserted.  By the time safepoints are inserted, we require that a couple of key invariants have been upheld.  We construct the initial IR such that these invariants hold, and each pass in the first phase must preserve them.  Somewhat surprisingly, most existing optimization passes seem to do so without modification.  The key invariants are:

  • pointers must remain pointers
  • pointers into the garbage collected heap must be distinguishable from pointers outside the garbage collected heap
  • a base pointer must be available (at runtime) for every derived pointer

Together, these give us all the information we need to insert safepoints.  There will be a future article which will focus on this in more depth.
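
To make the invariants a bit more concrete in the meantime, here is a minimal sketch of IR that satisfies them.  The use of a dedicated address space (addrspace(1)) to mark collected-heap pointers, and the function shown, are illustrative assumptions rather than part of the actual design:

    ; Sketch only (assumed convention): addrspace(1) marks pointers into the
    ; collected heap, keeping them distinguishable from non-heap pointers and
    ; keeping them represented as pointers rather than hidden in integers.
    define i64 @read_field(ptr addrspace(1) %obj) {
    entry:
      ; %field is a derived (interior) pointer; its base, %obj, is still
      ; available at runtime, so the collector can update both consistently
      ; if it relocates the object at a safepoint.
      %field = getelementptr i64, ptr addrspace(1) %obj, i64 1
      %val = load i64, ptr addrspace(1) %field
      ret i64 %val
    }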

The actual insertion of safepoints is handled by a fairly complex set of IR transform passes.  The objective of these passes is to represent the inserted safepoints in such a way that it would be illegal – under LLVM’s own semantics – to transform the IR in a way which subverts the desired safepoint semantics.  We plan on contributing these transform passes to the LLVM community.  There will also be a future post in this series which discusses some of the steps involved and the algorithms used.
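
To give a flavor of what that representation aims for, here is one hypothetical shape such a rewrite could take.  The @gc.safepoint.example function and its signature are made up for this sketch and are not the actual encoding those passes use:

    ; Hypothetical sketch: an explicit safepoint modeled as an opaque call
    ; which consumes every gc pointer live across it and defines fresh,
    ; possibly relocated values for them.
    declare { ptr addrspace(1) } @gc.safepoint.example(ptr addrspace(1))

    define i64 @caller(ptr addrspace(1) %obj) {
    entry:
      ; the collector may run and move %obj at this point
      %sp = call { ptr addrspace(1) } @gc.safepoint.example(ptr addrspace(1) %obj)
      %obj.relocated = extractvalue { ptr addrspace(1) } %sp, 0
      ; every use after the safepoint refers to the relocated value; since
      ; that value is a result of the call itself, the optimizer cannot
      ; legally replace it with %obj or hoist its uses above the safepoint.
      %val = load i64, ptr addrspace(1) %obj.relocated
      ret i64 %val
    }

The point is not this particular encoding, but that relocation becomes visible to the optimizer as an ordinary data dependence, so LLVM’s existing rules already forbid the transformations that would break it.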

Once this transformation is complete, we can run the resulting IR through any remaining optimization passes and backend code generation without jeopardizing correctness.  Nor do we need to extend that part of the process to preserve the invariants mentioned above.  (Since the SelectionDAG completely throws away the distinction between pointers and integers, that’s pretty much a hard requirement for a practical system.)  As a result, the second phase consists of whatever optimization and code generation steps were not placed in the first phase.

Note: The bit we’re skipping for the moment is how to construct the IR for a safepoint and how to propagate it through all of code generation.  That does require some additions to LLVM and will be a separate article in the near future.  For the moment, you’ll just have to accept that it is possible.

What makes this approach very powerful is that the boundary between the two phases is adjustable.  We can trade implementation effort directly for generated code quality by pushing the boundary further back into optimization and (someday) code generation.

A naive implementation could use an empty first phase and insert safepoints before running any optimization passes.  This would be analogous to the “early safepoint placement” scheme I mentioned in the previous post.  At the other extreme, you could pull all of optimization and code generation into the first phase, thus getting a classic garbage collection aware compiler.  At the moment, this is somewhat impractical since we can’t preserve the required invariants that late in the compilation process, but it highlights an interesting direction for future work.

At the moment, we’ve chosen to place the safepoint insertion step immediately after the target independent optimization passes and right before we begin lowering towards the specific machine model.  (i.e. after high level optimizations such as constant propagation, GVN, and loop optimizations, but immediately before CodeGenPrepare.)  We think that we’ve managed to push the required invariants this far, though to be fair, we haven’t yet had serious burn-in on the prototype; we may find an insurmountable bug and have to pull this slightly earlier.

One advantage of this approach which can’t be overstated is the flexibility it allows.  Combined with LLVM’s existing pass scheduling mechanism, we can place *any* problematic pass after safepoint insertion.  This gives us both a means to work around bugs in the short term and a way to work incrementally towards a fully GC-aware compiler.

Note: The flexibility in pass scheduling does come at some cost.  Moving a pass out of its expected order may reduce optimization effectiveness.  On the one hand, the moved pass may not be as effective once safepoints are inserted.  On the other, pass ordering is a well-known problem in compilers, and moving one pass may decrease the effectiveness of other passes (even those not moved).

I believe that “late safepoint placement” is a viable path towards high performance, fully relocating garbage collection in LLVM.  We’ve implemented enough of this to be reasonably confident that it actually works.  Over the next few weeks, I will be devoting more of my time to describing our approach publicly and preparing changes for upstream contribution.  Check back here for updates.

Aside: It is not clear that we will ever reach the goal of what I’ve termed a “fully GC-aware compiler” above.  Pushing safepoint insertion further back in the process would require substantial changes to large pieces of the backend infrastructure.  It’s not even clear that doing so would be sufficiently profitable to justify the effort.  We believe that the current placement of safepoint insertion will be adequate from the perspective of code quality.  There’s room for improvement, but it may not be worth the engineering investment or maintenance costs.

5 Comments

  1. I’m implementing Common Lisp with LLVM as the back end, incorporating the Memory Pool System garbage collector from Ravenbrook (a mostly copying GC, conservative on the stack, precise on the heap).  I’m interested in precise GC on the stack, and I’m not sure that safepoints will be adequate/compatible with the MPS because it does GC work on almost any memory access.  I’m curious about your thoughts on whether the MPS library would be compatible with your approach to precise GC.  I’m very appreciative of any work on LLVM.  Thanks!
