Quick thoughts on WebKit’s B3

I just finished reading Introducing the B3 JIT Compiler and wanted to jot down some thoughts.  Fair warning: this post is being written in a hurry.  I’m focusing on getting my initial reactions down rather than trying for a really well-thought-out post.  That may follow at a later time, or it may not.

The first key bit is that the goal of this effort appears to be strictly compile time, not peak performance.  Understanding that makes the effort make a lot more sense.  It still seems a bit odd to me for the compile time of your *fourth tier* JIT to be that important, but since I’m no expert in JavaScript, I’ll just accept that as a given.

In that context, I find the claimed 4.7x reduction in compile time surprisingly unexciting.  There’s enough low-hanging fruit in LLVM – in particular, a better representation for “check” nodes – that I would expect an improvement of that magnitude to be possible within the existing framework they had.  Achieving a ~5x improvement in compile time with an entirely new compiler (and all of the engineering that implies) seems slightly disappointing.  Now it’s possible (heck, even likely!) that the new architecture will allow them to drop compile time even further, but still…

The performance numbers quoted were uninformative at best.  From what I can gather from the write-up, the JetStream benchmark is heavily influenced by compile time.  While I understand the goal (it’s a useful one), it doesn’t really say anything about the peak performance of the code generated by the two compilers.  Given that, it’s really hard to tell whether B3 is actually break-even with the existing LLVM backend at peak.  It would have been really nice to see the same numbers with the compile time somehow removed or adjusted for.  (B3 with a sleep added to make it slower?  A longer warmup period in a modified benchmark?)

Just to be clear, I’m not saying that the numbers presented are “wrong”.  Merely that they don’t answer the question I’d most like them to. 🙂

Other things that jumped out at me:

  • The points about matching the IR to the source language in an effort to reduce the number of nodes (and thus memory and time) are spot on.  If what you’re going for is compile time above all else, using an IR which closely matches your source language is absolutely the right approach.  This same general idea (remove memory/nodes where they don’t provide enough value) is what’s motivating the removal of pointer-to-pointer bitcasts in LLVM right now.
  • The emphasis on the importance of the “check” node (i.e. an early OSR exit if the condition fails) matches our experience as well; you can see the same thing in Swift’s IR.  This is clearly an area where LLVM needs to improve.  I think we can do a lot better within LLVM, and I’m surprised they didn’t try to push on that.  In particular, the aliasing problems mentioned could have been addressed with a custom AliasAnalysis instance.  Oh well.  (There’s a rough sketch of what a first-class check buys you right after this list.)
  • The choice to use arrays (instead of lists) gives some interesting tradeoffs.  From a compile time perspective, a modify-and-compact scheme is likely a win.  Interestingly, this reminds me a lot of a mark-compact garbage collector (B3’s layout) vs a standard malloc/free allocator (LLVM’s layout).  Given typical object lifetimes in a compiler (short!), the collector approach is likely to be the right one.  It does raise some interesting challenges though: pointer equality can no longer be used, trivial dead code elimination is no longer trivial (which complicates various other transforms), and transforms have to deal with non-canonical forms due to extra identity nodes.  It’ll be really interesting to see where B3 goes on this basis alone.  (The second sketch after this list shows the identity-node-plus-compaction pattern I mean.)
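
To make the “check” point a bit more concrete, here’s a minimal sketch of why a first-class check node is so much cheaper to represent.  This is purely my own toy IR – the opcode names and layout are invented for illustration, not B3’s actual data structures:

```cpp
#include <cstdint>
#include <vector>

// Toy IR for illustration only -- not B3's actual opcodes or layout.
enum class Opcode { Arg, Const, CheckedAdd, Return };

struct Node {
    Opcode  op;
    int     a = -1, b = -1;   // operand indices into the node array
    int64_t imm = 0;          // payload for Const
};

// With a first-class check, "x + 1, OSR-exit on overflow" is one node
// that carries its own exit condition:
std::vector<Node> guardedIncrement() {
    return {
        {Opcode::Arg},                     // 0: x
        {Opcode::Const, -1, -1, 1},        // 1: constant 1
        {Opcode::CheckedAdd, 0, 1},        // 2: add + guard + side exit
        {Opcode::Return, 2},               // 3: return the checked result
    };
}

// Without it, the same guard needs an explicit add, an overflow test, a
// conditional branch, and a separate exit block that spells out all the
// deopt state -- several nodes plus a control-flow split per guard.
```

Multiply that second form by the number of guards a JavaScript method needs and you can see where the node count (and the pressure on alias analysis) comes from.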
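And here’s a similarly hand-wavy sketch of the array-layout point: a transform “deletes” a node by rewriting it into an identity that forwards to its replacement, and a later compaction pass squeezes those out, much like a mark-compact collector.  Again, the names and structure are mine, not B3’s:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Toy array-backed IR -- invented names, not B3's real implementation.
enum class Op { Const, Add, Identity };

struct Node {
    Op  op;
    int a = -1, b = -1;   // operand indices into the same array
};

struct Procedure {
    std::vector<Node> nodes;

    // There's no list to unlink from, so a transform "deletes" a node by
    // turning it into an Identity that forwards to its replacement.
    // Consumers see this non-canonical form until the next compaction.
    void replaceWithIdentity(int victim, int replacement) {
        nodes[victim] = {Op::Identity, replacement};
    }

    // Periodic compaction: drop Identity nodes, remap operand indices,
    // and shrink the array -- the mark-compact-ish part.
    void compact() {
        std::vector<int>  remap(nodes.size(), -1);
        std::vector<Node> live;
        for (std::size_t i = 0; i < nodes.size(); ++i) {
            if (nodes[i].op == Op::Identity) continue;
            remap[i] = static_cast<int>(live.size());
            live.push_back(nodes[i]);
        }
        auto chase = [&](int idx) {           // follow Identity chains
            while (idx >= 0 && nodes[idx].op == Op::Identity)
                idx = nodes[idx].a;
            return idx < 0 ? -1 : remap[idx];
        };
        for (std::size_t i = 0; i < nodes.size(); ++i) {
            if (remap[i] < 0) continue;
            live[remap[i]].a = chase(nodes[i].a);
            live[remap[i]].b = chase(nodes[i].b);
        }
        nodes = std::move(live);
    }
};
```

Note how, after replaceWithIdentity, “the same value” can be reached through different indices, and compact() invalidates every index you were holding onto – which is exactly why pointer equality and trivial dead code elimination stop being trivial.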