Deep Distributed Tracing with OpenTracing and the JDK Flight Recorder

Recently I gave a talk at Code One about using OpenTracing together with the JDK Flight Recorder to do deep tracing. Since the session wasn’t recorded, I thought I’d do a blog post about it instead. Here we go…

Distributed tracing has been of interest for a very long time. Multiple companies have sprung up around the idea over the years, and most APM (Application Performance Management) solutions are built around it. In 2010 Google published a paper on Dapper, their large-scale distributed systems tracing infrastructure, and there are now several open source distributed tracing alternatives inspired by that paper, such as Jaeger and Zipkin.

In Java land, pretty much all of the APMs do the same thing: they use BCI (byte code instrumentation) to get the data, and then present that data to the end user in various ways, oftentimes applying some kind of analysis to recognize common problems and suggest solutions. The real differentiation is knowing what data to get, and what to do with the data once it has been captured.

Since there was no standard, one problem was how vendors could inject helpful, vendor-specific information into the distributed traces. The vendor of a software component may have a rather good idea about what information would be helpful for solving problems. Some APM vendors provide specific APIs for contributing that data, but more often than not the instrumentation is done using BCI by scores of developers working for the various APM companies. The same is true for maintainers of open source components – either skip the problem entirely and let the APM vendors come up with good instrumentation points (if your component is popular enough), or pick a popular APM and integrate with it. That is, until OpenTracing came along…

Introduction to OpenTracing

OpenTracing is an open source, vendor neutral, distributed tracing API. In other words, library developers can interact with one API to support multiple APM/Tracer vendors. Also, customers can add contextual information to distributed traces without worrying about vendor lock-in. Contributors to OpenTracing include LightStep, Jaeger, Skywalking and Datadog, and the specification is available on GitHub:

https://github.com/opentracing

The core API concepts in OpenTracing are (from the slides of my talk, DEV5435):


Trace

– A distributed operation, potentially spanning multiple processes
– Implicitly defined by the individual Spans in the trace (more soon)
– Can be thought of as a directed acyclic graph (DAG) of Spans
– The Span at the root of the DAG is called the root Span
– The edges between the Spans are called References

Span

– Has an operation name
– Has a start timestamp
– Has a finish timestamp
– Has a SpanContext
  • Has Baggage Items (key/value pairs which cross process boundaries)
  • Holds implementation-specific state used to identify the Span across process boundaries
– Zero or more key/value Span Tags
– Zero or more Span Logs (key/value + timestamp)

Reference

– Defines a direct causal relationship between two Spans
– ChildOf
  • The parent depends on the child in some way
  • Note that it is legal for the finish timestamp of a child to be after that of any parent
– FollowsFrom
  • The parent does not depend on the result of the child in any way
  • Note that it is legal for a FollowsFrom child to be started after the end of any ancestor

Also worth noting is that a Scope is a thread-local activation of a Span.
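
To make the terminology a bit more concrete, here is a minimal sketch of what creating a Span can look like with the OpenTracing Java API. It is written against the 0.3x API (method names have shifted a bit between versions), and the operation name, tags and log message below are just made up for illustration:

    import io.opentracing.Scope;
    import io.opentracing.Span;
    import io.opentracing.Tracer;
    import io.opentracing.util.GlobalTracer;

    public class OrderHandler {
        public void createOrder(String robotType, String color) {
            Tracer tracer = GlobalTracer.get();
            // Start a new Span; if a Span is already active on this thread,
            // it becomes the parent through an implicit ChildOf reference.
            Span span = tracer.buildSpan("createOrder")
                    .withTag("robotType", robotType)
                    .withTag("color", color)
                    .start();
            // Activating the Span on the current thread gives us the Scope.
            try (Scope scope = tracer.activateSpan(span)) {
                span.log("orderValidated");
                // ... do the actual work ...
            } finally {
                span.finish();
            }
        }
    }

The try-with-resources on the Scope makes sure the activation is popped off the thread even if the work throws.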

The Example

As an example, we’ll be using a simple application consisting of three microservices. It is part of the back-end of a fictional robot store. Robots can be ordered at the Orders service, and they will be produced in a Factory. There is also a Customers service keeping track of the customers. Finally there is a load generator that can be used to exercise the services.

[Image: overview of the robot store services (Orders, Factory, Customers) and the load generator]

The code is available under https://github.com/thegreystone/problematic-microservices. (Yes, as the name indicates, the services come pre-packaged with built-in problems. :))

The services, as well as the load generator, have built-in tracing support, so for a full system run with the load generator you would get a trace (a DAG of spans) looking something along the lines of:

[Image: a full trace shown as a DAG of spans across the services]

Or, in Jaeger, where you have time on the X axis:

[Image: the same trace in Jaeger, with time on the X axis]

In this case I have scrolled down a bit to focus on the factory. As can be seen, there is great variability in the time it takes to create a chassis and/or paint a robot. We have multiple production lanes, and we’d expect times across the factory lanes to be more even, not to mention much faster. So what gives?

Well, we can expand the operation to see if there was some additional information:

[Image: the expanded operation in Jaeger, showing its tags]

Now, sometimes the tags may include crucial pieces of information that help you solve the problem without needing anything more. In this particular case, though, knowing that we were building a pink BB-8 isn’t really doing the trick.

What would be the next step? All too often it would be to look at the code around the instrumentation point, trying to figure out what was going on at the time simply from analyzing the code. Sometimes that may be quite hard. The problem may be in third-party code not expected to behave badly. There may even be some other piece of code, not directly in the code path, causing the problems – perhaps an agent misbehaving and causing long-lasting safepoints in the JVM.

So, we’re screwed then? Nah. What if you had a magic tool that could record what was going on in the JVM and the application at the time of the incident? Something providing not only method profiling information, but a deeper view, including information about VM operations, memory allocation profiling, events for the usual application-caused thread halts, and much, much more. Something that could be always on, with very low overhead. And let’s say you ran with a tracer that added some contextual information, such as information that could be used to identify traces, spans and thread-local span activations in the recorded data, while still letting you use your favourite tracer? Then things would get interesting indeed…

Running with the JFR Tracer

For Code One I wrote a little delegating JFR tracer, which allows you to record contextual information into the flight recorder. It was meant as an example of how to do deep distributed tracing: deep enough to solve entire classes of problems that are hard to solve without more detailed knowledge.

The tracer works with Oracle JDK 7+ and OpenJDK 11+ (it is a multi-release jar, a.k.a. mrjar), and the source is available on GitHub here:

https://github.com/thegreystone/jfr-tracer

The bundle is available from Maven Central, and here is the dependency you need to add:

    <dependency>
        <groupId>se.hirt.jmc</groupId>
        <artifactId>jfr-tracer</artifactId>
        <version>0.0.3</version>
    </dependency>

Next you need to instantiate your tracer of choice and pass it to the constructor of the DelegatingJfrTracer, like so:

GlobalTracer.register(new DelegatingJfrTracer(yourFavTracer));

That’s it. When the tracer is running you will get contextual information recorded into the flight recorder.
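
For a slightly fuller picture, here is a sketch of what that bootstrap could look like with a Jaeger tracer as the delegate. The DelegatingJfrTracer package name and the Jaeger Configuration call are assumptions based on jfr-tracer 0.0.3 and the Jaeger Java client, and “robot-factory” is just a made-up service name; substitute whatever tracer you normally use:

    import io.opentracing.Tracer;
    import io.opentracing.util.GlobalTracer;
    import se.hirt.jmc.opentracing.DelegatingJfrTracer;

    public class TracingBootstrap {
        public static void initTracing() {
            // Any OpenTracing-compliant tracer will do; Jaeger is used here
            // purely as an example delegate.
            Tracer delegate = io.jaegertracing.Configuration
                    .fromEnv("robot-factory")
                    .getTracer();
            // Wrap it so that trace/span ids also end up in the flight recorder.
            GlobalTracer.register(new DelegatingJfrTracer(delegate));
        }
    }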

Looking at the Recording

Dumping the flight recorder for the factory, and looking at the dump in the Threads view, might look something like this:

[Image: the Threads view in JDK Mission Control for the factory recording]

We can see that we have these long-lasting monitor enter (Java Blocking) events, and looking at the stack traces directly by selecting individual events, or at the Lock Instances page, it is fairly obvious where the contention is:

[Image: the contended lock instance and the associated stack traces]

We can, of course, create a custom OpenTracing view to make it easier to directly find and home in on long-lasting traces (I’ll create a repo for a ready-made one with some more flair at some point). Simply go to the Event Browser, and right click on the Open Tracing folder. Select “Create a new page using the selected event types”. You will now have a new page in the Outline. You can right click on the title of the page to rename it and switch icon.

Next select an arbitrary event and right click on it. Select Group-By->Trace Id. In the new Group By table that appears, use Visible Columns to enable (at least) the attribute showing the longest duration (the total wall-clock time the trace spent in the process the recording came from). Next sort on the Longest Duration column.

In this case I have run a few more traces (press enter in the single-step load generator a few times, or just let it continuously add load):

[Image: the custom Open Tracing page, with events grouped by Trace Id and sorted on Longest Duration]

You can, of course, add additional tables with groupings that can be useful, for example per thread. To quickly home in on a trace id of interest across the entire user interface, just select a trace and choose “Store and Set as Focused Selection”:

[Image: the context menu with “Store and Set as Focused Selection”]

Now you can go back to, for example, the Threads view, and click the Time Range: Set button in the upper right corner. Voila, you are in exactly the right place. You may also want to view concurrently occurring events in the same threads (see the check boxes at the top), and enable additional thread lanes:

[Image: the Threads view scoped to the selected time range, with additional thread lanes enabled]

Summary

  • Distributed tracing is great, especially in today’s world of (very µ and plenty) µ-services.
  • For the Java platform, injecting trace/span-identifying information as contextual information into the JDK Flight Recorder is dynamite.
  • A simple example of how to do this automagically is available on my GitHub as a delegating Tracer, in an mrjar supporting Oracle JDK 7+ and OpenJDK 11+:
        https://github.com/thegreystone/jfr-tracer
  • The slides for my Code One presentations can be found here:
        https://oracle.rainfocus.com/widget/oracle/oow18/catalogcodeone18?search=hirt
    (The relevant session for this blog is DEV5435.)
  • The JDK Flight Recorder r0xx0rz.
  • JDK Mission Control r0xx0rz.

Note that since the article was written, I have donated the tracer to OpenTracing.
See https://github.com/opentracing-contrib/java-jfr-tracer.

Solving Memory Leaks without Heap Dumps

Sometimes you may not want to do a heap dump. You may be running in an environment which is sensitive to latencies. Or you may be forbidden to create heap dumps, since the content will contain all your customer information and all of your organization’s account numbers, and if the dump ended up in the wrong hands, your entire business would be done for. Or you may have an 800+GB heap (yes, some customers run Java with enormous heaps with great success). Even worse, you may have a huge heap and relatively small ephemeral disk storage, not even large enough to hold your huge heap dump. And, quite frankly, even if you get your 800+GB heap dump onto your puny laptop, how will you open it? How much time will it take to calculate a dominator tree over that dump?

No matter the reason for not wanting to do a heap dump, there is now (well, since JDK 10 really) a new JFR event that allows you to solve memory leaks, with very little overhead, without having to do full heap dumps. Black magic you say? Yes, awesome, yummy, black magic.

The Old Object Sample Event

At the heart of the red pentagram (with a black wax candle on each point and encircled with salt) is the Old Object Sample event. It was introduced in JDK 10. It basically tracks a fixed number of objects on the heap, for as long as they are live. To avoid incurring massive overhead, the objects are selected in a similar way to how the allocation event samples are picked – upon retiring a TLAB, or when allocating outside of TLABs. So, a sampled subset of the allocations gets tracked.

When a sample is chosen, the allocation time is stored together with the allocation stack trace, the thread id, the type of object being allocated, and the memory address of the object. If it is an array, the array size is also recorded.

The samples are then stored in a fixed-size (256 by default) combined priority queue/linked list, holding weak references to the samples. If sampled objects are garbage collected, they are removed and their priority is redistributed to their neighbours. The priority is called span, and is currently the size of the allocation, giving more weight to larger (and therefore more severe) leaks.

Once the recording is dumped, the paths back to the GC roots can be calculated. I write can, since this is optional – it is something that must be enabled in the recording, or requested as a parameter to e.g. jcmd when dumping the recording. If the reference chain is very deep (>256 object references), it will be truncated. It is also possible to specify a time budget, so that the time spent searching for references can be limited. For example, imagine a linked list occupying most of the heap, with the sampled object being the tail of that list. The reference chain for that tail sample would span almost the entirety of the heap, so even with a large time budget you would still get a truncated sample. If you don’t want to spend that much time searching the heap, you can limit the time budget.

In other words, the Old Object Sample event contains a lot of exciting information:

  • Time of allocation
  • The thread doing the allocation
  • The last known heap usage at the time of allocation
    (Which can be used to plot the live set, even if we don’t have data from the time of allocation anymore.)
  • The allocation stack trace
    (In other words, where was this object allocated?)
  • The reference chain back to the GC root at the time of dumping the recording
    (In other words, who is still holding on to this object?)
  • The address of the object

There is some additional information. You can check out IMCOldObject in the OpenJDK JMC project source for more details.
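
If you want to poke at the raw events outside of JMC, the jdk.jfr consumer API in OpenJDK 11+ can read them straight from a dumped recording. A minimal sketch (the eventThread field name is an assumption about how the event thread is exposed in the recording):

    import java.nio.file.Path;
    import jdk.jfr.consumer.RecordedEvent;
    import jdk.jfr.consumer.RecordedThread;
    import jdk.jfr.consumer.RecordingFile;

    public class ListOldObjectSamples {
        public static void main(String[] args) throws Exception {
            // Path to a recording dumped with, for example, jcmd <pid> JFR.dump
            Path recording = Path.of(args[0]);
            for (RecordedEvent event : RecordingFile.readAllEvents(recording)) {
                if ("jdk.OldObjectSample".equals(event.getEventType().getName())) {
                    RecordedThread thread = event.getThread("eventThread");
                    System.out.println(event.getStartTime() + " sampled in "
                            + (thread == null ? "<unknown thread>" : thread.getJavaName()));
                    // toString() prints the remaining fields, including the
                    // reference chain if path-to-gc-roots was requested.
                    System.out.println(event);
                }
            }
        }
    }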

Here is an Old Object Sample event shown in the JMC 7 Properties view:

[Image: an Old Object Sample event in the JMC 7 Properties view]

Using the Old Object Sample Event

The best way to use the Old Object Sample event is in a long-running application. The longer the better. Statistically speaking, you want to offer as many chances as possible for a leaked object to end up being sampled. You’d also want to be well beyond the loading of all your code, and to have been running long enough to be sure that transients have been cleared out. For example, if you have a session time-out of some kind set to 2 hours, and a ginormous application server and an even larger application taking 15 minutes to start, then the first 2 hours and 15 minutes of runtime will not be that exciting from a memory leak hunting perspective.

A simple way of using the event is to look for samples that are still around after the warm-up phase, but were allocated long enough ago that transient objects could not reasonably still be around. An even simpler rule of thumb – look at the ones allocated in the middle of the time span. ;)

[Image: Old Object Sample events in JMC, with reference chains back to the GC roots]

Since there is currently a bug open on JMC 7 (JMC 7 has not been released yet; we hope to fix it before the release), “picking the middle” is not yet possible. That said, in the picture above we can see that most live objects being tracked are actually held on to by the Leak$DemoThread, which has a Hashtable (what can I say, it’s a really old example program) with an entry array, containing an entry holding on to a Leak$DemoObject, which in turn holds on to a leaked char[].

Now, JMC has a more sophisticated algorithm for selecting good candidates than “go for the ones in the middle”. It first checks whether we have an increasing live set. If so, and if we have Old Object Sample information, it tries to find good candidates using a combination of the distance from the root, the ratio of how many objects the candidate keeps alive to how many objects its root keeps alive, and the ratio of how many objects the candidate keeps alive to how many objects are alive globally. For more information, check out the ReferenceTreeModel in the JMC project.

This has already become a much longer post than I was planning on. Anyways, if you want to experiment a bit with the Old Object Sample event, I have an upcoming JMC and JFR Tutorial that I am planning on “releasing” when JMC 7 is out. That said, you can already beta test it. There is some more information in the blog entry prior to this one.

The Practical Guide to the Old Object Sample Event

If you use the continuous template, this is recorded:

  • Timestamp
  • Thread
  • Object Type

If you use the profile template, this is recorded:

  • Timestamp
  • Thread
  • Object Type
  • Allocation stack trace

If you ask for path-to-gc-roots, you also get the reference chains. This can be done by:

  • Adding it as a parameter on the command line:
    -XX:StartFlightRecording=path-to-gc-roots=true
  • By asking for it when dumping the flight recorder, for example using jcmd:
    jcmd <pid> JFR.dump path-to-gc-roots=true
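
It can also be done programmatically. Here is a minimal sketch using the jdk.jfr API in OpenJDK 11+; the “infinity” cutoff value is, as far as I know, what corresponds to asking for the full reference chains, and leak.jfr is just an arbitrary file name:

    import java.nio.file.Path;
    import jdk.jfr.Recording;

    public class LeakHuntRecording {
        public static void main(String[] args) throws Exception {
            try (Recording recording = new Recording()) {
                // Track old objects, keep the allocation stack traces, and
                // ask for the reference chains back to the GC roots.
                recording.enable("jdk.OldObjectSample")
                        .withStackTrace()
                        .with("cutoff", "infinity");
                recording.start();

                // ... run the suspected leaking workload here ...

                recording.dump(Path.of("leak.jfr"));
            }
        }
    }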

You can also configure the number of objects to track by setting old-object-queue-size in the flight recorder options, for example:

-XX:FlightRecorderOptions=old-object-queue-size=256
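
Putting the pieces together, a leak hunting session could look something along these lines; <pid>, the queue size, the recording name and the file name are all just examples:

    java -XX:FlightRecorderOptions=old-object-queue-size=512 \
         -XX:StartFlightRecording=settings=profile,name=leakhunt \
         -jar my-app.jar

    jcmd <pid> JFR.dump name=leakhunt filename=leak.jfr path-to-gc-roots=true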

If you want to configure the cutoff for how long to search for references, that can be done in the template file, for example, these are the default settings in the profile template (JDK_HOME/lib/jfr/profile.jfc):

    <event name="jdk.OldObjectSample">
      <setting name="enabled" control="memory-leak-detection-enabled">true</setting>
      <setting name="stackTrace" control="memory-leak-detection-stack-trace">true</setting>
      <setting name="cutoff" control="memory-leak-detection-cutoff">0 ns</setting>
    </event>
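
If you save a tweaked copy of the template, you can point a recording at it when starting the JVM; the path below is just an example:

    -XX:StartFlightRecording=settings=/path/to/my-profile.jfc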

Summary

  • The Old Object Sample event is awesome
  • It can, among other things, be used to hunt down memory leaks without doing hprof heap dumps
  • It will also bring you luck and good fortune, not to mention that it smells good

Sneak Peek of JDK Mission Control 7 Tutorial

Even though JMC 7 is not GA yet, I thought I’d make the upcoming JMC Tutorial available on my GitHub. Hopefully this will be a good resource for learning more about using Mission Control 7 and Flight Recorder in OpenJDK 11.

It does take a bit of preparation to run it for now:

  • JDK Mission Control will need to be built from source, since there are no update sites available yet
  • JOverflow will not work until JMC-6121 is solved
  • Exercise 5 will be better once JMC-6127 is solved

That said, all the preparations needed are listed in the README.md file in the GitHub repo:

https://github.com/thegreystone/jmc-tutorial

Please let me know if something is missing from the instructions!