Contributing to OpenJDK Mission Control


Since this month is Hacktoberfest, I thought it would be a good idea to talk a bit about how to contribute to the OpenJDK Mission Control project. Some of the content of this blog post will be applicable to any of the OpenJDK projects, especially the Skara (OpenJDK on Git) bits.

The OpenJDK Mission Control Project

The OpenJDK Mission Control project is the observability tools suite for OpenJDK. It contains a JMX Console, a JFR visualizer and analyzer, a heap waste analysis tool, and many other little useful tools and utilities. Since it is all open source, pretty much anyone can contribute to the project.

The project is on GitHub:
https://github.com/openjdk/jmc

The first step to contribute to JDK Mission Control is to simply fork the repository on GitHub. This establishes a copy of the repository where you can freely make changes as you please. Whilst it is technically possible to make the changes in the master branch, making them in a separate branch will save you time and effort if you later want to contribute them:
git checkout -b my-jmc-test

Building JMC

First of all, ensure that you have JDK 11 active in your shell, and verify that this is the case using:
java -version

There are multiple ways to build JMC. The easiest way is to simply use the build script (don’t do this just yet):
./build.sh --packageJmc

There is also a way to build JMC using Docker (don’t do this just yet either):
docker-compose -f docker/docker-compose.yml run jmc

These are however not the best ways when you’re developing JMC using an IDE. The third party dependencies for JMC need to be available through a p2 repository, and you want to install a build of the JMC core libraries into your maven cache.

So, to set things properly up for development, it is better to first install the core libraries:

cd $JMC_ROOT/core
mvn install

Next, build the p2 site and start jetty to expose it on a well known port:

cd $JMC_ROOT/releng/third-party
mvn p2:site
mvn jetty:run

Then leave jetty running for as long as you are developing JMC. You will need it up and running so that it can be found both when building from the command line, as well as when compiling JMC from within the Eclipse development environment.

To build the JMC application, next do the following in a separate shell (since you have jetty with the p2 site for the third-party dependencies up and running in the previous one):

cd $JMC_ROOT
mvn package

After this, you can use the build script to run the built JMC product:
./build.sh --run

For alternative ways of launching JMC, see the platform specific documentation in the README.md.

Developing JMC

Many that I’ve talked to, especially when JMC was shipped with the Oracle JDK, believed that JMC is a native application. If you’ve browsed the repo, you’ve already seen that it is a Java application, more specifically an Eclipse RCP application. Since it is an Eclipse RCP application, it’s easiest to develop JMC using Eclipse.

First set up your development environment, following the Developer Guide. It is slightly involved, but luckily does not need to happen very often.

Next, in your branch in your fork, commit the changes you want to contribute, and create a pull request, just like you would for any other open source project on GitHub.

Now, if this is your first OpenJDK PR, the OpenJDK bot will likely complain about a few different things, for example:

  • You need to have your GitHub account associated with a company that has a signed Oracle Contributor Agreement (OCA), or you must have signed an OCA yourself.
  • The PR needs to have an associated issue in the Java Bug System.
  • There is some problem with the testing or formatting of your code.

Let’s take a quick look at these three problems.

The Oracle Contributor Agreement

As with many open source projects, there needs to be a Contributor Agreement in place. This is to protect everyone backing the project, as well as the customers depending on the project. For example, the contributor agreement ensures that the source code you’re contributing isn’t violating any patent rights, and that the source code you’re contributing is yours to contribute.

Many larger companies already have an OCA signed, so the first step might be to check with your company if one is already signed. In my case, I both have a personal OCA signed (since I was contributing before Datadog signed an OCA), and one signed by my employer, Datadog.

You will know that the OCA status is not properly set up for your GitHub account when the OCA label is set in the PR, and the following text can be found in the PR:
⚠️ OCA signatory status must be verified

The OpenJDK bot will write helpful messages in the PR to help guide you through getting your OCA status verified.

The Java Bug System

Once you have a few commits under your belt, and become an OpenJDK author, you have access to the Java Bug System (JBS): https://bugs.openjdk.java.net/. So, what do you do before then? If the PR passes a first cursory check by the reviewers, a reviewer will simply create an Issue in JBS for you.

Fixing Issues

If you end up having an issue, the details of the test run in the PR will hopefully be enough to sort it out. If not, you can run mvn verify locally and look at the test logs. If it is formatting, then check if the formatting problem was in core or not, and either run mvn spotless:apply in core or in the root of the project.

Skara – the OpenJDK Git Tooling

Skara is the project name for the tooling around developing OpenJDK on Git(Hub). It actually insulates a lot of the GitHub specifics, making it possible, should the need ever arise, to move the development and development process somewhere else. The project also contains the aforementioned bot that helps, for example, to verify that there is a related JBS issue, and that there is a signed OCA. Skara also contains some useful git extensions which make working with OpenJDK on GitHub smoother.

To set things up, do the following:

Clone Skara:
git clone https://github.com/openjdk/skara

Build it:
gradlew (win) or sh gradlew (mac/linux)

Install it:
git config --global include.path "%CD%/skara.gitconfig" (win), or
git config --global include.path "$PWD/skara.gitconfig" (mac/linux)

Set where to sync your forks from:
git config --global sync.from upstream

Here are some examples:

To sync your fork with upstream and pull the changes:
git sync --pull

Note: if the sync fails with the error message “No remote provided to fetch from, please set the --from flag”, remember to set the remote for your repo, e.g.

git remote add upstream https://github.com/openjdk/jmc

To list the open PRs:
git pr list

To create a PR:
git pr create

To push your committed changes in your branch to your fork, creating the remote branch:
git publish

So, the normal workflow when working with OpenJDK JMC using the Skara tooling becomes:

Note: First ensure that you have a fork of JMC, and that your current directory is the root of that fork. You typically just create that one fork and stick with it.

  1. (Optional) Sync up your fork with upstream:
    git sync --pull
  2. Create a branch to work on, with a name you pick, typically related to the work you plan on doing:
    git checkout -b <branchname>
  3. Make your changes / fix your bug / add amazing stuff
  4. (Optional) Run jcheck locally:
    git jcheck local
  5. Push your changes to the new branch on your fork:
    git publish (which is pretty much git push --set-upstream origin <branchname>)
  6. Create the PR, either on GitHub, or from the command line:
    git pr create

Once the PR is created, the bot will check that everything is okay, and the PR will be reviewed.

Interacting with the Skara Bot

Getting the PR merged is handled a bit differently in OpenJDK compared to normal GitHub projects. First of all, all the prerequisites must be fulfilled: the OCA status of the contributor must be verified, the change must be properly reviewed, jcheck and the tests must pass, the PR must have a matching issue in JBS, and so on. Once that is all taken care of, the bot will helpfully ask, in a message in the PR, for the author of the PR to integrate the changes. This is simply done by typing /integrate in a message in the PR. The bot will automatically rebase on the latest changes in the target branch (normally master) and squash your commits. In other words, it is perfectly fine to have multiple fixes and other commits happening in the PR after the initial commit for the PR. Adding commits is actually much preferred over force-pushing to the PR, as it makes the review easier to follow.

If the PR author is not a committer on the project, the bot will announce that the PR is ready to be sponsored by a committer, which is normally done by the reviewer of the PR. Sponsoring is done by writing /sponsor in a separate message in the PR.

When the PR is merged, the corresponding JBS issue is automatically closed.

Other Related Repos

There are a few additional repos that are related to the OpenJDK JMC project, but that aren’t currently OpenJDK projects. Two examples are the jmc-jshell and the jmc-tutorial repositories. The jmc-tutorial is a good resource for learning about JDK Mission Control. Even though it is not officially an OpenJDK repository, it can still be a good place to start contributing to the OpenJDK JMC community.

Summary

  • Contributing to OpenJDK is easier than ever before now that it’s on GitHub.
  • Skara makes it even easier.
  • It’s Hacktoberfest – commits to the JMC project (and related repos) count!
  • JBS is a good source for JMC starter bugs.
  • If you need any help, the JDK Mission Control slack is a good place for asking questions! Ping me or any of the JMC folks for an invite. 🙂
  • Finally, here’s a practical guide to OpenJDK projects and the roles:
    OpenJDK Projects (java.net)

JDK Mission Control 8.1.0 Released!

Yay, the latest release of JDK Mission Control was just released! Since this is the source release, it may still take a bit of time until the downstream vendors release binary builds of JDK Mission Control 8.1.0. I will try to remember to tweet or say something on the JMC Facebook page once the binaries start showing up.

Mission Control 8.1 – New and Noteworthy


General


JMC 8.1 – New Release!
This is a new minor release of Java Mission Control. The JMC application will now require JDK 11+ to run, but can still be used with OpenJDK 8u272+ and Oracle JDK 7u40+. It can also still open and visualize flight recordings from JDK 7 and 8.



Eclipse 4.19 support
The Mission Control client is now built to run optimally on Eclipse 2021-03 and later. To install Java Mission Control into Eclipse, go to the update site (Help | Install New Software…). The URL to the update site will be vendor specific, and some vendors will instead provide an archive with the update site.



Minor bugfixes and improvements
There are more than 80 fixes and improvements in this release. Check out the JMC 8.1 Result Dashboard (https://bugs.openjdk.java.net/secure/Dashboard.jspa?selectPageId=20404) for more information.



Core


New Serializers Core Bundle
There is now a new core bundle making it easy to serialize flight recording data to DOT (Graphviz) and JSON. This bundle will be expanded upon in future versions.



Improved JFR parser performance
The performance of the JFR parser has been improved. More improvements are coming in 8.2.



Java Flight Recorder (JFR)


Support for the new JDK 16 Allocation Events
A new form of light weight allocation profiling was introduced with JDK 16 (see https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8257602). This version of JMC supports this new type of allocation profiling.



New Page for Peeking into the Constant Pools
There is a new page available for taking a look at what constants are available in the recording. This can, for example, be useful when creating custom events to see where all that storage and memory is being used.



Open Recordings with .lz4 extension
For convenience, JMC will now attempt to open files with the .lz4 extension as flight recordings, since LZ4 is a common compression format to use with flight recordings.



JMC Agent Plug-in


New JMC Agent Plug-in
There is now a new agent plug-in available for JMC, which allows configuring where to emit flight recording events in an already running process.



Bug Fixes


Area: JFR
Issue: 6939
Synopsis: Time range indicator update problem fixed

Sometimes the time range indicator wasn’t updated when setting the time range. This is now fixed.

Area: JFR
Issue: 7007
Synopsis: Unable to edit run configurations for eclipse project after installing JMC plugin fixed

Previously it was not possible to edit run configurations after installing the experimental JMC launcher plug-in. This has now been resolved.


Known Issues


Area: General
Issue: 4270
Synopsis: Hibernation and time

After the bugfix of https://bugs.openjdk.java.net/browse/JDK-6523160 in JDK 8, the RuntimeMXBean#getUptime() attribute was re-implemented to mean “Elapsed time of JVM process”, whilst it previously was implemented as time since start of the JVM process. The uptime attribute is used by JMC, together with RuntimeMXBean#getStartTime(), to estimate the actual server time. This means that time stamps, as well as remaining time for a flight recording, can be wrong for processes on machines that have been hibernated.

Area: JFR
Issue: 7071
Synopsis: JMC can’t attach to jlinked JVMs

This one is still under investigation, but it seems JMC can’t attach to certain jlinked images.

Area: JFR
Issue: 7068
Synopsis: JfrRecordingTest (uitest) hangs on the automated analysis page

Trying to run uitests on Fedora hangs on JfrRecordingTest.

Area: JFR
Issue: 7003
Synopsis: The graph and flame graph view does not work on Windows

This is due to a bug with the Edge based browser component in SWT. We’ll look into it for 8.2.0.

Area: JFR
Issue: 6265
Synopsis: JMC crashes with Webkit2+GTK 4

See the issue for more information.

Area: JFR
Issue: 5412
Synopsis: Dragging and dropping a JFR file into an open analysis page does not work

The expected behaviour would be to open the recording whenever a file is dropped in the editor area, but the behaviour will be defined by the embedded browser component, and not very useful.

JMC Core Now on Maven Central!

Good news! The JDK Mission Control core library bundles are now available on Maven Central, making it easier than ever to use things like the JMC JDK Flight Recorder parser to transparently parse and extract information from flight recordings ranging from Oracle JDK 7 and up to the very latest versions of OpenJDK.

I have updated my JShell example so that you can see an example of them being used.

For just using the parser, you will typically need the common and flightrecorder bundles. If you also need the rules engine, you add flightrecorder.rules, and if you want the base set of heuristics for the jdk, also add flightrecorder.rules.jdk.

For example:

	<properties>
		<jmc.version>8.0.1</jmc.version>
	</properties>
	<dependencies>
		<dependency>
			<groupId>org.openjdk.jmc</groupId>
			<artifactId>common</artifactId>
			<version>${jmc.version}</version>
		</dependency>
		<dependency>
			<groupId>org.openjdk.jmc</groupId>
			<artifactId>flightrecorder</artifactId>
			<version>${jmc.version}</version>
		</dependency>
		<dependency>
			<groupId>org.openjdk.jmc</groupId>
			<artifactId>flightrecorder.rules</artifactId>
			<version>${jmc.version}</version>
		</dependency>
		<dependency>
			<groupId>org.openjdk.jmc</groupId>
			<artifactId>flightrecorder.rules.jdk</artifactId>
			<version>${jmc.version}</version>
		</dependency>
	</dependencies>

Finally! 🙂

OpenJDK and the Future of Production Profiling

Some thoughts on the future of continuous production profiling on the OpenJDK platform.

A long time ago, the JRockit Runtime Analyzer (JRA) was introduced into the JRockit JVM as a means of being able to figure out what was going on in the JVM. It was mainly there to find out how customers were using the JVM, so that the JVM could be optimized for actual real-world production work-loads. The JRA output the data as XML, since customers insisted on the data being human readable so that they could see exactly what they would be sending us. Later LAT (LATency Analyzer) was introduced, since after the introduction of the JRockit low latency garbage collector (a.k.a the DetGC), some customers complained about the GC not keeping its latency promises. More often than not, it turned out it was other kinds of thread stalls causing the latencies, so LAT was introduced so that the JRockit team could figure out where the problematic code was. Since there could be a considerable amount of data in LAT, a binary format was introduced for the events recorded.

Eventually the production profiling (JRA + LAT) had a model overhaul and became the JRockit Flight Recorder. The data format was binary, self describing, extensible and efficient. After Oracle acquired Sun Microsystems, and the Hotspot and JRockit JVM teams merged, it became the Java Flight Recorder (JFR), and in 2018 it was open sourced as the JDK Flight Recorder (still JFR), since calling anything related to Java something with “Java” in the name can be complicated.

The JDK Flight Recorder design philosophy is to be the one-stop-shop production profiler for OpenJDK. JFR needs to be able to do various kinds of profiling, all at the same time, at a low overhead. It also needs to be able to run continuously for as long as someone is interested in the data. Potentially always.

Now, with changes in the Java (and the computing) ecosystem, JFR has some loom-ing challenges to remain relevant for the future.

JFR has a lot of nice properties:

  • It does multiple kinds of profiling, normally at a low overhead (space and CPU).
    (Profiling types include allocation profiling, almost CPU profiling, latency outlier profiling and much more.)
  • It doesn’t suffer from the same constraints as Java-only based profilers.
    (Weaknesses usually include things like safe point bias, safepointing VM operations, allocation pressure, undoing scalarization optimizations, lack of STW/safepoint visibility.)
  • It provides context helpful to solving problems.
    (For example, a monitor enter event has the monitor class, the monitor address, the thread holding the monitor, the thread and stack trace for the blocking call and more.)
  • It is low overhead, (mostly) designed for predictable data rates and overhead.
  • It is extensible – you can add your own profilers and data.
    (Datadog, for example, has its own rather useful exception profiler which is publishing data into JFR.)
  • As long as all your data is produced in JFR, all the common constants in the runtime will be recorded into the same constant pool.
    (For example, if you record some data using JFR and some data with a custom agent, common class names, method names etc will need to be repeated in both of the serialization formats.)
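The constant pool point in the last bullet can be illustrated with a toy sketch. This is not the JFR file format, and ToyConstantPool and its methods are hypothetical names made up for the illustration. It only shows the idea that events store small ids, while each shared string is stored once:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: NOT the JFR file format. All names are hypothetical.
class ToyConstantPool {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> constants = new ArrayList<>();

    // Returns the id for a constant, adding it to the pool the first time it is seen.
    int intern(String value) {
        Integer id = ids.get(value);
        if (id == null) {
            id = constants.size();
            ids.put(value, id);
            constants.add(value);
        }
        return id;
    }

    // Looks up the constant that an event refers to by id.
    String resolve(int id) {
        return constants.get(id);
    }

    // Number of distinct constants stored.
    int size() {
        return constants.size();
    }

    public static void main(String[] args) {
        ToyConstantPool pool = new ToyConstantPool();
        // Two different kinds of events referring to the same method name share one entry.
        int fromAllocationEvent = pool.intern("com.example.Service.handle");
        int fromExceptionEvent = pool.intern("com.example.Service.handle");
        System.out.println(fromAllocationEvent == fromExceptionEvent);
        System.out.println(pool.size());
    }
}
```

If the second event had instead been written by a separate agent with its own serialization format, the method name would have to be stored a second time in that format.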

There are however a few ways in which JFR can, and probably has to, improve, to remain the ultimate production profiling platform for Java:

  • Data rate (and thereby overhead) needs to be more deterministic.
  • CPU profiling needs to be improved.
  • Wall clock profiling needs to be possible.
  • It needs to be possible to easily and cheaply associate context with events.

Let’s take these in reverse order.

Recording Context

A very common problem is to try to associate some contextual information with the data produced by JFR. For example, in the good old Oracle days, there were events produced by WebLogic Server (WLS) containing what was called the Execution Context IDs (ECIDs). Since JFR events are thread local, it was possible to use them to find out what other, lower level, information was captured during that time, for example where the allocation pressure was, or why a call to a logger leading to a blocking call trying to enter a contended monitor happened. The instrumented code, and thereby the events, were usually emitted where something took a bit of time, such as database IO and similar, so the overhead was not bad.

Fast-forward to today. We now have distributed tracers doing pretty much what WLS was doing back in the day. We now have a trace, which is a directed acyclic graph of spans, which we track. The difference being that with the microservices and async frameworks that are commonly in use these days, these spans can now last a very short time. The tracers do keep track of when the context is propagated to other threads; in other words, the tracer can know where work is carried out on specific threads for a specific trace/span. In the end, the tracer doesn’t really care though – it only keeps track of this to know when a span can be closed. No matter how many related little work items were scheduled on various threads, the tracer will simply note when the span was started, and when the span was closed. For the profiler, however, the information about which thread did work related to a specific span is crucial to be able to figure out what traces were captured in the context of what spans/traces. Since a thread can switch between doing work for different spans at an incredible rate, the overhead of using additional events to keep track of them can be quite costly. Especially if the switches happen way more often than a sample is being captured.

This will only be compounded once Loom becomes part of the Java platform, and we can have millions of Virtual Threads (fibers, p-threads, MxN threads, green threads, whatever you want to call them) running.

A long time ago, the JFR team was discussing something we called thread coloring – the ability to tell a thread what the current context is. This is exactly what Go does today with goroutines. In Go it is called profiling labels. Go pre-aggregates the CPU profiling data, but allows the settings of labels to group the profiling data into buckets. Sadly, in Go it is currently limited to CPU-profiling only.

One solution to the context problem could be to allow setting the context for a thread – conceptually a Map<String, long>, and to provide some new settings for an event, for example record-context, to decide if the context should be recorded into the event or if we don’t care, and record-only-on-context, to decide to record an event only if there is some context present.

Having such a capability would not only make it possible to associate events with a context even where such a context changes rapidly. It could also be used to only record certain events when they occur in a context for which we care, possibly bringing down the overhead of capturing data and stack-traces, and saving memory in the buffers for the things we find to be the most relevant. A concrete example would be to only record certain information if a thread is currently associated with a trace which will be sampled.
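To make the idea a bit more tangible, here is a minimal sketch in plain Java. None of this exists in the JFR API: the class ThreadContextSketch, its methods, and the use of a ThreadLocal are assumptions made purely for illustration. The two boolean parameters stand in for the proposed record-context and record-only-on-context settings:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names exist in the JFR API.
class ThreadContextSketch {
    // Per-thread context, conceptually a map from label to numeric id.
    private static final ThreadLocal<Map<String, Long>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    // Sets one context entry for the current thread.
    static void put(String key, long value) {
        CONTEXT.get().put(key, value);
    }

    // Removes all context from the current thread.
    static void clear() {
        CONTEXT.get().clear();
    }

    // Stand-in for "record-only-on-context": when the setting is on,
    // an event is only recorded if the current thread has some context.
    static boolean shouldRecord(boolean recordOnlyOnContext) {
        return !recordOnlyOnContext || !CONTEXT.get().isEmpty();
    }

    // Stand-in for "record-context": returns a copy of the context to
    // attach to the event, or an empty map if we don't care.
    static Map<String, Long> contextForEvent(boolean recordContext) {
        if (!recordContext) {
            return Collections.emptyMap();
        }
        return new HashMap<>(CONTEXT.get());
    }
}
```

A tracer would call put when a thread starts working for a span that will be sampled, and clear when it stops. An event type configured with record-only-on-context would then be skipped everywhere else.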

See https://bugs.openjdk.java.net/browse/JDK-8264516.

Wall-Clock Profiling Must Be Possible

JFR has an interesting mix of sampling profiling, thresholded execution tracing and metrics. For example, it has a sampling execution profiler, a rate-limited sampling allocation profiler etc. It also has events to locate thread latency outliers. These events are not sampled, but give you exact information about a thread halt lasting longer than a configurable threshold.

Of course, one problem with thresholding is that you can theoretically have a lot of thread latencies just below the threshold. It is easy to construct an artificial benchmark where the data indicates that you have no thread latencies whilst having nothing but thread latencies.
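A tiny simulation shows how such an artificial case plays out. It uses no JFR API, the numbers are made up, and treating a stall as recorded when it is at or above the threshold is an assumption of the sketch:

```java
import java.util.Arrays;

// Illustrative simulation only: no JFR API is used and all numbers are made up.
class ThresholdBiasSketch {
    // Counts the stalls a thresholded recorder would report
    // (assumed here: duration at or above the threshold).
    static int recordedStalls(long[] stallMillis, long thresholdMillis) {
        int recorded = 0;
        for (long stall : stallMillis) {
            if (stall >= thresholdMillis) {
                recorded++;
            }
        }
        return recorded;
    }

    // Sums up the time actually spent stalled, recorded or not.
    static long totalStallMillis(long[] stallMillis) {
        long total = 0;
        for (long stall : stallMillis) {
            total += stall;
        }
        return total;
    }

    public static void main(String[] args) {
        long[] stalls = new long[1000];
        Arrays.fill(stalls, 9); // every stall lasts 9 ms, just below a 10 ms threshold
        System.out.println(recordedStalls(stalls, 10)); // prints 0
        System.out.println(totalStallMillis(stalls));   // prints 9000
    }
}
```

Nine seconds of stalls, and not a single event to show for it.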

One way to get unbiased information, at a reasonable cost (i.e. not setting the threshold to 0 for the latency outlier events), is to introduce a wall clock profiler. A wall clock profiler basically periodically dumps the thread stack for a thread, no matter what it is up to. A limitation is that the wall clock profiler has no idea of what the thread was actually up to – we can’t get all the juicy, custom, actionable, information at that point – we usually only get the thread dump.

Also, given that you might have millions of (virtual) threads, you probably don’t want to sample them all at any given time, and you certainly don’t want to stop the world whilst doing the sampling. As a matter of fact, for most applications, looking at wall clock profiling information from all threads will be a very boring exercise – most of them will be halted waiting for something to do, for example in thread pools, or waiting in some I/O. For example, for your average Java recording, you will have plenty of threads in thread pools, being parked. To see that you have thousands of threads waiting on a park call in a thread pool will not help you resolve a great many problems. If you, on the other hand, have a really long park causing a trace to take a really long time to complete, then you’ll likely be interested in what is going on.

This means that different profilers probably want these wall-clock samples to be picked differently. One way to accomplish this would be to allow users of JFR to commit events on a separate thread.

Event#commit(Thread)

This would allow someone to build a profiler by simply periodically committing events on the threads it currently cares about. It would also allow building other kinds of profilers and sampling behaviours.

Another way of accomplishing this would be to add an annotation to override default stuff captured by JFR in an event.

class MyEvent extends Event {

    @overridejfr
    Thread overrideThread;
}

An advantage with this variant is that you could potentially add more override behaviours that are handled by JFR over time.

A third variant might be to introduce a configurable global wall clock event, perhaps one that picks one thread at a time in a round robin fashion, at a certain frequency. Together with the context feature mentioned above, it would be possible to only emit events where there is useful context available.
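The round robin idea can be sketched in plain Java. This is not a JFR feature, and RoundRobinWallClockSketch is a hypothetical name. It also uses Thread.getStackTrace(), which is not how a low overhead profiler would capture stacks; it is only there to keep the sketch self-contained:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch in plain Java, not a JFR feature.
class RoundRobinWallClockSketch {
    private final List<Thread> threads;
    private int next = 0;

    RoundRobinWallClockSketch(List<Thread> threads) {
        if (threads.isEmpty()) {
            throw new IllegalArgumentException("need at least one thread to sample");
        }
        this.threads = new ArrayList<>(threads);
    }

    // Picks the next thread in round robin order, no matter what it is up to.
    Thread pickNext() {
        Thread picked = threads.get(next);
        next = (next + 1) % threads.size();
        return picked;
    }

    // Takes one wall clock sample: the current stack of the picked thread.
    // Threads that are not running yield an empty array.
    StackTraceElement[] sampleNext() {
        return pickNext().getStackTrace();
    }
}
```

Calling sampleNext() from a timer at a fixed frequency gives one sample per tick, so the cost stays constant regardless of how many threads are in the list. Combined with a context check, the caller could skip the threads that have no context set.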

Yet another variant may be to add a native function (extern "C") to emit an event. Something similar to AsyncGetCallTrace (probably implemented using AsyncGetCallTrace), but which emits an event. If we have the context feature described above, it could be used to decide where events are emitted. If different kinds of profiling will be implemented, you can simply provide context information about the kind of profiling being done to avoid the profilers interfering with each other.

There is JDK-8237206 for this, but there isn’t much information recorded in the issue.

CPU Profiling Improved

The current way of doing CPU profiling with JFR is to use the Execution Sample Event in JFR. It gives you a good idea of where the JVM is spending the most CPU time executing Java code. It is cheap, has low overhead, and satisfies the needs of many users.

That said, there are some problems. One is that it does not cover the full Java process. If there is a native library running native threads, the CPU time spent there won’t be accounted for. Also, some other native and intrinsified code, for example the JVM native threads, will not be covered. It is easy to write a benchmark where JFR will capture almost no samples at all.

To get a better understanding of where the CPU is being spent, we need to build a profiler actually sampling on CPU time, and that can, at least optionally, capture the full thread stacks including native frames. One approach to this is to do what async profiler does – use kernel profiling data and line it up with information from AsyncGetCallTrace.

At Datadog we’ve built a pretty awesome CPU profiler for Linux, based around perf_event_open – one that does really well, and which handles symbol resolution in containerized environments better than most. Sadly, we don’t have it for Windows and Mac, so, if we contribute it to OpenJDK, we would need help implementing it on other platforms.

There is JDK-8234854, but I haven’t added much context yet.

Deterministic Overhead and Memory Use

One problem that JFR used to have, before JDK 16, was that allocation profiling could get very expensive. Especially for allocation heavy applications, running on plenty of CPUs. It could get very expensive both in terms of overhead but also in terms of size. For example, running a well parallelized, allocation heavy, application on a 96 core machine could produce millions of events per minute, resulting in recording sizes of hundreds of megabytes per minute.

In JDK 16 Datadog submitted a new kind of allocation profiler, which was rate limited. It drew inspiration from PID controllers (yes, I still love robotics, even though there is literally no time for it these days) to let the user specify a budget of the maximum number of events per second, and then the controller will attempt to keep the number of events emitted to that budget, subsampling if necessary.
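The budget idea can be sketched as follows. To be clear, this is a heavily simplified stand-in and not the implementation in the JDK: it is a plain per-window adjustment of an "emit every Nth event" interval rather than anything PID-like, and the class and method names are hypothetical:

```java
// Simplified sketch, NOT the JDK implementation. Names are hypothetical.
class RateLimitedSamplerSketch {
    private final long budgetPerWindow;
    private long interval = 1;      // emit every 'interval'-th candidate event
    private long seenInWindow = 0;

    RateLimitedSamplerSketch(long budgetPerWindow) {
        if (budgetPerWindow <= 0) {
            throw new IllegalArgumentException("budget must be positive");
        }
        this.budgetPerWindow = budgetPerWindow;
    }

    // Called for every candidate event; returns true if it should be emitted.
    boolean shouldEmit() {
        seenInWindow++;
        return seenInWindow % interval == 0;
    }

    // Called once per window (for example once per second) to adapt the
    // subsampling interval to the rate observed in the window that just ended.
    void endWindow() {
        // Ceiling division: the smallest interval that would have kept
        // the last window within budget.
        interval = Math.max(1, (seenInWindow + budgetPerWindow - 1) / budgetPerWindow);
        seenInWindow = 0;
    }

    // Feeds 'events' candidates through one window and returns how many were emitted.
    static long simulateWindow(RateLimitedSamplerSketch sampler, long events) {
        long emitted = 0;
        for (long i = 0; i < events; i++) {
            if (sampler.shouldEmit()) {
                emitted++;
            }
        }
        sampler.endWindow();
        return emitted;
    }
}
```

With a budget of 100 and a steady 10000 candidates per window, this sketch overshoots in the first window (it emits everything) and then settles at 100 emitted events per window. A real controller would react faster and more smoothly than this.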

We had already used the same technique successfully for our exception profiler, since it turns out that some people use not only exceptions, but even subclasses of Error (!), for flow control.

Now, it may be useful to use the same kind, or some similar kind, of safety valve to limit both the overhead and the data production rate for other events that are prone to the same edge cases. For example, the thread latency events are currently thresholded. This means that, given the number of threads, you might be able to estimate the maximum number of events you could produce. But that estimate is not very helpful, since you might be running very few threads, or, in the future, with Loom, millions of them.

Instead you might want to subsample them, at a max rate that you can set yourself, per event type. You could even allow for thresholding and rate limited sampling at the same time, accepting a certain bias, e.g. if you want to focus on longer lasting thread halts.

Summary

TL;DR, JFR has some interesting challenges ahead. Some of them will be compounded by the introduction of Virtual Threads. For JFR to remain the premium, best in class, production profiling platform, some investment will be needed. Datadog will try to help, but some of these problems will require updates to the JFR file format and updating the API. The OpenJDK community will need to be involved.

JMC 8 Binary Releases!

Two binary releases of JMC 8 are now available:

Since I could not find the project provided release notes at any of the vendor sites, I’ve taken the liberty of reproducing them below.

Mission Control 8.0 – New and Noteworthy


General


JMC 8.0 – Major New Release!
This is a major new release of Java Mission Control. All binary builds of the JMC application using the Eclipse 2020-09 platform will now require JDK 11+ to run, but can be used with OpenJDK 8u272+ and Oracle JDK 7u40+. It can also open and visualize flight recordings from JDK 7 and 8.



Eclipse 4.16 support
The Mission Control client is now built to run optimally on Eclipse 2020-06 and later. To install Java Mission Control into Eclipse, go to the update site (Help | Install New Software…). The URL to the update site will be vendor specific, and some vendors will instead provide an archive with the update site.



JOverflow in the Base Distribution
JOverflow is now part of the base distribution of Mission Control, and has been converted to SWT. It also has a brand new TreeMap visualization of the heap occupied by type. To open the new view, go to Window | Show view | Other… and start typing JOverflow, and select JOverflow TreeMap.



Minor bugfixes and improvements
There are over two hundred fixes and improvements in this release. Check out the JMC 8 Result Dashboard (https://bugs.openjdk.java.net/secure/Dashboard.jspa?selectPageId=19536) for more information.



Java Flight Recorder (JFR)


Flame Graph View Improvements
The Flame Graph View is now available in the standard perspective. It can now be configured to be rendered either as an icicle graph or a flame graph, and the tool tips have been improved. Click on anything representing a selection of events to get the aggregate of stack traces represented by that selection rendered in the Flame Graph View.



New Graph View
An early access version of a Graph View is now available. The Graph View will show a directed graph where the individual methods are nodes in the graph, and the edges will represent calls from one method to another. The size of the node will represent the “self” contribution, and the thickness of an edge will correlate with the number of calls from one method to another in the events. To open up the graph view, go to Window | Show view | Other…, then select Mission Control / Graph View. Once the Graph View has opened, it can be docked where you want it to be, or dragged out into a new top level window which can be handled by your window manager. Currently there is no graph pruning, and large graphs will take a very long time to lay out and render, during which the UI will freeze. Therefore we’ve imposed a user configurable limit on the number of nodes a graph can have for now.



Predecessors and Successors
The predecessors and successors tabs have been brought back to the Method Profiling page. This means that it is now easy to select a method and see where that method was called from, and what was then called from that method. This is a temporary solution until we have a separate Butterfly/Sandwich view where any selection of a method in a set of events can be rendered in a separately configured view.



JMC Agent


Agent 1.0.0
This is the first version where we build and release the JMC agent. The JMC agent is a byte code instrumentation agent, providing an easy way to declaratively generate JFR events from most methods, even methods for which the source code is not available. Either configure the agent with an XML configuration file, or use the MBean to dynamically change the instrumentation whilst the application is running.



Converter Functions
Now any public static method can be used to convert objects of a specific type to one of the types supported by flight recorder. The converter to use must be explicitly declared per recorded data.



Bug Fixes


Area: JFR
Issue: 5734
Synopsis: Overflow in stacktrace tooltips fixed

On Mac, the tooltips shown for the stacktraces could overflow. This is now fixed.


Known Issues


Area: General
Issue: 4270
Synopsis: Hibernation and time

After the bugfix of https://bugs.openjdk.java.net/browse/JDK-6523160 in JDK 8, the RuntimeMXBean#getUptime() attribute was re-implemented to mean “Elapsed time of JVM process”, whilst it previously was implemented as time since start of the JVM process. The uptime attribute is used by JMC, together with RuntimeMXBean#getStartTime(), to estimate the actual server time. This means that time stamps, as well as remaining time for a flight recording, can be wrong for processes on machines that have been hibernated.

Area: JFR
Issue: 7122
Synopsis: Rules evaluation never complete

Sometimes the rules evaluation may never complete for the GC rules.

Area: JFR
Issue: 7071
Synopsis: JMC can’t attach to jlinked JVMs

This one is still under investigation, but it seems JMC can’t attach to certain jlinked images.

Area: JFR
Issue: 7068
Synopsis: JfrRecordingTest (uitest) hangs on the automated analysis page

Trying to run uitests on Fedora hangs on JfrRecordingTest.

Area: JFR
Issue: 7007
Synopsis: Unable to edit run configurations for eclipse project after installing JMC plugin

Avoid installing the experimental JMC launcher plug-in into Eclipse until this problem has been resolved.

Area: JFR
Issue: 7003
Synopsis: The graph view does not work on Windows

This is due to a bug in the chromium browser plug-in used by JMC on Windows. See the issue for more information.

Area: JFR
Issue: 6265
Synopsis: JMC crashes with Webkit2+GTK 4

See the issue for more information.

Area: JFR
Issue: 5412
Synopsis: Dragging and dropping a JFR file into an open analysis page does not work

The expected behaviour would be to open the recording whenever a file is dropped in the editor area, but the behaviour is instead defined by the embedded browser component, and is not very useful.

Continuous Production Profiling and Diagnostics

I’ve gotten a lot of questions about continuous production profiling lately. Why would anyone want to profile in production, or, if production profiling seems reasonable, why the heck leave it on continuously? I thought I’d take a few moments and share my take on the problem and the success I’ve seen the past years applying continuous production profiling in systems in the real world.

Trigger warning: this blog will not contain code samples. 😉

Profiling?

So what is software profiling then? It’s the ancient black magic art of trying to figure out how something is performing, for some aspect of performing. In American TV-series, the profiler is usually some federal agent who is adept at understanding the psychology of the criminal mind. The profiler attempts to understand key aspects of the criminal to make it easier for the law enforcement agents to catch him. In software profiling we’re kind of doing the same thing, but for software – your code as well as all the third party code you might be depending on.

We’re trying to build an accurate profile of what is going on in the software when it is being run, but in this case to find ways to improve a program. And to understand what is going on in your program, the profiler has to collect call traces and usually some additional context to make sense of it all.

In comparison to other observability tools, like metrics and logs, profilers will provide you with a holistic view of a running program, no matter the origin of the code and requiring no application specific instrumentation. Profilers will provide you with detailed information about where in the actual code, down to the line and byte code index, things are going down. A concrete example would be learning which line in a function/method is using most of the CPU, and how it was being called.

It used to take painting a red pentagram on the floor, and a healthy stock of black wax candles, to do profiling right. Especially in production. Overhead wasn’t really a design criterion for early profilers; it was assumed you’d run the process locally, and in development. And, since it was assumed you’d be running the profiling frontend on the same machine, profiling remote processes was somewhat tricky and not necessarily secure. Production profilers, like JFR/JMC, came along, but they usually focus on a single process, and since security is a bit tricky to set up properly, most people sidestep the problem altogether and run (yep, in production) with authentication and encryption off.

Different Kinds of Profiling

Profiling means different things to different people. There are various types of resources that you may be interested in knowing more about, such as CPU or locks, and there are different ways of profiling them.

Most people will implicitly assume that when talking about profiling, one means CPU-profiling – the ancient art of collecting data about where in the code the most CPU-time is spent. It’s a great place to start when you’re trying to figure out how to make your application consume less CPU. If you can optimize your application to do the same work with fewer resources, this of course directly translates into lowering the bill to your cloud provider, or being able to put off buying those extra servers for a while.

Any self-respecting modern profiling tool will be able to show more than just the CPU aspect of your application, for example allocation profiling or profiling thread halts. Profiling no longer implies just grabbing stack-traces, and assigning meaning to the stack trace depending on how it was sampled; some profilers collaborate closely with the runtime to provide more information than that. Some profilers even provide execution tracing capabilities.

Execution tracing is the capability to produce very specific events when something interesting happens. Execution tracing is available on different levels. Operating systems usually provide frameworks allowing you to listen on various operating system events, some even allowing you to write probe definitions to decide what data to get. Examples include ETW, DTrace and eBPF. Some runtimes, like the OpenJDK Java VM, provide support for integrating with these event systems, and/or have their own event system altogether. Java, being portable across operating systems, and wanting to provide context from the runtime itself, has a high performance event recorder built in, called the JDK Flight Recorder. Benefits include cheap access to information and emission of data and state already tracked by the runtime, not to mention an extensible and coherent data model.

Here are a few of my favourite kinds of profiling information:

  • CPU profiling
  • Wall-clock profiling
  • Allocation profiling
  • Lock / Thread halt / Stop-the-World profiling
  • Heap profiling

Let’s go through a few of them…

CPU Profiling

CPU profiling attempts to answer the question about which methods/functions are eating up all that CPU. If you can properly answer that question, and if you can do something about it (like optimizing the function or calling it less often) you will use fewer resources. If you want to reduce your cloud provider bill, this is a great place to start. Also, if you can scope the analysis down to a context that you care about, let’s say part of a distributed trace, you can target improving the performance of an individual API endpoint.

Wall-Clock Profiling

Wall-clock profiling attempts to answer the question about which method/function is taking all that time, no matter if on CPU or not. For runtimes supporting massively multithreaded applications, this information is much less useful without some context.

For example, let’s say you have a Java application with various thread pools running various kinds of operations. You may have hundreds of threads, all of them mostly parked, awaiting some work to do. Unless you have some context, all the wall-clock profiling will tell you is that most threads were parked. But if you do have some context, let’s say context around which span in a distributed trace is running when samples are taken, your wall-clock profiling data can tell you in which methods most of the time was spent during a particularly long lasting span. [1]

As a general rule of thumb, wall-clock profiling is useful for finding and optimizing away latencies, whereas CPU profiling is more suited for optimizing throughput. Also, execution tracing is a great complement to wall-clock profiling.

If you can tell where the wall-clock time is spent, you can help remove performance obstacles by seeing which method calls take time and optimize them, or reduce the number of calls to them.

Allocation Profiling

Allocation profiling is trying to answer where all that allocation pressure is coming from, and from allocating what. This is important, since all that allocated memory will usually have to be reclaimed at some point in time, and that both uses CPU and possibly causes stop-the-world pauses from GC (though modern GC technologies, for example ZGC for the Java platform, are making this less of an issue for some types of services).

If you can properly answer where the allocation pressure comes from, you can bring down GC activity by optimizing the offending methods, or have your application call them less.

Lock / Thread Halt / Stop-the-World (STW) Profiling

This kind of profiling tries to answer the question about why my thread didn’t get to run right there and right then. This is typically what you would use the wall-clock-profiler for, but the wall-clock-profiler usually has some serious limitations, making it necessary to collaborate with the runtime to get some additional context. The wall-clock profiler typically only gets sampled stack traces showing you which method you spent time in, but without context it may be hard to know why.

Here are some examples:

  • Your thread is waiting on a monitor
    Context should probably include which thread is currently holding the monitor, which address the monitor has, the time you had to wait etc.
  • Your runtime is doing something runtimey requiring stopping the world, showing your method taking its own sweet time, but not offering any clues as to why
    • STW phase due to GC happening in the middle of running your method.
    • STW phase due to a heap dump
    • STW phase due to full thread stack dump
    • STW phase due to a badly behaving framework, or your well meaning colleague(s), forcing full GCs all the time, since they “know that a GC really improves performance if done right there”, not quite realizing that it’s just a small part of a much bigger system.
  • Your thread is waiting for an I/O operation to complete
    Context should probably include the IP address (socket I/O) or file (file I/O), the bytes read/written etc.

There are plenty more examples: wait, sleep, park etc. To learn more, open JDK Mission Control and take a look at individual event types in the event browser.

Heap Profiling

This kind of profiling attempts to answer questions about what’s on your heap and, sometimes, why. This information can be used to reduce the amount of heap required to run your application, or help you solve memory leaks. Information may range from heap histograms showing you the number of instances of each type on the heap, to leak candidates, their allocation times and allocation stack traces, together with the reference chains still holding on to them.

 

Continuous Production Profiling

Assuming that your application always has the same performance profile, which implies always having exactly the same load and never being updated, with no edge cases or failure modes, and assuming perfectly random sampling, your profiler could simply take a few samples (let’s say 100 to get a nice distribution) over whichever time period you are interested in (let’s say 24 hours), and call it a day. You would have a very cheap breakdown over whatever profiling information you’re tracking.

These days, however, new versions of an application are deployed several times a day, evolving to meet new requirements at a break-neck speed. They are also subjected to rapidly changing load profiles. Sometimes there may be an edge case we didn’t foresee when writing the program. Being able to use profiling data to not only do high level performance profiling, but detailed problem resolution, is becoming more and more common, not to mention useful.

At Datadog, we’ve used continuous production profiling for our own services for many months now. The net result is that we’ve managed to lower the cost of running our services all over the company by quite large amounts of money. We’ve even used the profiler to improve our other components, like the tracer. I had the same experience at Oracle, where dedicated continuous profiling analysis was used to a great extent for problem resolution in production systems.

Aside from being incredibly convenient, there are many different reasons why you might want to have the profiler running continuously.

 

Change Analysis

These days new versions are deployed several times a day. This is certainly true for my team at Datadog. There is great value in being able to compare the performance profile, down to the line of code. This is true across new releases, specific time intervals, over other attributes like high vs low CPU load, and countless other facets.

 

Fine Grained Profiling

Some production profiling environments allow you to add context, for example custom events, providing the means to look at the profiling data in the light of something else happening in a thread at a certain time. This can be used for doing breakdowns of the profiling data for any context you put there, any time, anywhere.

Adding some contextual information can be quite powerful. For example, if we were able to extend the profiling data with information about what was actually going on in that thread, at that time, any other profiling data captured could be seen in the light of that context. For example, WebLogic Server produced Flight Recorder Events for things like SQL calls, servlet invocations etc, making it much easier to attribute the low-level information provided by the profiler to higher level constructs. These events were also associated with an Execution Context ID which spanned processes, making it possible to follow along in distributed transactions.

With the advent of distributed tracing, this can be done in a fairly general way, so that profiling data can be associated with thread local activations of spans in a distributed trace (so called scopes). [1]

That said, with a general recording framework, there is no limit to the kinds of contexts you can invent and associate your profiling data with.

Diagnostics

It’s 2:03 a.m., all of a sudden some spans in your distributed trace end up taking a really long time. Looking at the spans, there is nothing indicating something is actually going wrong, or that the data is bad. From what is present in the tag data, nothing seems to be related between the spans. You decide to open up the profile.
The automated analysis informs you that a third-party library has initiated safe pointing VM operations from a certain thread, in this case for doing full heap dumps. The analysis text points you to more documentation about what a safe point is. You read up on safe pointing VM operations, and the library, and find out that under certain conditions, the library can initiate an emergency heap dump, but that the feature can be turned off. You turn it off, redeploy and go back to sleep.

Or, perhaps the automated analysis informs you that there is heavy lock contention on the apache logger, and links you to the lock profiling information. Looking at the lock profiling information, it seems most of the contention is being caused by the logging done on one particular line. You decide that the logging there is not essential, remove it, commit, redeploy and go back to sleep.
When something happens in production, you will always have data at hand with a continuous profiler. There is no need to try to reproduce the exact environment and conditions under which the problem occurs. You will always have actionable data readily available.

Of course, the cure must not be worse than the ailment. If the performance overhead you pay for the information costs you too much, it will not be worth it. Therefore this rather detailed information must be collected quite inexpensively for a continuous production profiler.

 

Low-overhead Production Profiling

So, how can one go about producing this information at a reasonable cost? Also, we can’t introduce too much observer effect, as this will skew the data, and not truly represent the application behaviour without the instrumentation.

There are plenty of different methods and techniques we can use. Let’s dig into a few.

Using Already Available Information

If the runtime is already collecting the data, exporting it can usually be done quite cheaply. For example, if the runtime is already collecting information about the various garbage collection phases, perhaps to drive decisions like when to start initiating the next concurrent GC-cycle, that information is already readily available. There is usually quite a bit of information that an adaptively optimizing runtime keeps track of, and some of that information can be quite useful for application developers.

Sampling

One technique we can use is to not take every single possible value, but do statistical sampling instead. In many cases this is the only way which makes sense. Let’s take CPU profiling for example. In most cases, we will be able to select an upper boundary for how much data we produce by either selecting the CPU quanta between samples, or by selecting a fixed number of threads to look at any given time and the sampling period. There are also more advanced techniques for getting a fixed data rate.

An interesting example from Java is the new upcoming allocation profiling event. Allocation in Java is most of the time approximately the cost of bumping a pointer. The allocation takes place in thread-local allocation buffers (TLABs). There is no way to do anything in that code path without introducing unacceptable overhead. There are however two “slow” paths in the allocator. One for when the TLAB is full. The other one for when the object is too large to fit in a TLAB (usually by allocating an enormous array) leading to the object being allocated directly on the heap. By sampling our allocations at these points, we get relatively cheap allocation events that are proportional to the allocation pressure. If we were able to configure how often to subsample over the average amount of memory allocated between samples, we would be able to regulate the acceptable overhead. That said, what we’re really looking for is a constant data production rate, so regulating that is better left to a PID-style controller, giving us a relatively constant data production.

Of course, the fewer sample points we have, the less we can say about the behaviour over very short periods of time.

Thresholding

One sort of sampling is to simply only collect outliers. For some situations, we really would like to get more information. One example might be thread halts that take longer than, say, 10ms. Setting a threshold allows us to do a little bit more work, when it’s very much warranted. For example, I might only be interested in tracking blocking I/O reads/writes lasting longer than a certain threshold, but for them I’d like to know the number of bytes read/written, the IP address read from/written to etc.

Of course, the higher the threshold, the more data we will miss (unless we have other means to account for that time). Also, thresholds make it harder to reason about the actual data production rate.

Protect Against Edge Cases

Edge cases which make it hard to reason about their potential overhead should be avoided, or at least handled. For example, when calculating reference chains, you may provide a time budget for which you can scan, and then only do it when absolutely needed. Or, since the cost of walking a stack trace can be proportional to the number of frames on the stack, you can set an upper limit to how many frames to walk, so that recursion gone wild won’t kill your performance. Be careful to identify these edge cases, and protect against them.

One recent example is the Exception event available in the Flight Recorder (Java), which can be configured to only capture Errors. The Java Language Specification defines an Error like this:

“Error is the superclass of all the exceptions from which ordinary programs are not ordinarily expected to recover.”

You would be excused for believing that Errors would happen very rarely, and that recording all of them would not be a problem. Well, a very popular Java framework, which will remain unnamed, subclassed Error in an exception class named LookAheadSuccess. That error was used in a parser and used for control flow, resulting in the error being thrown about a gazillion times per minute. We ended up developing our own solution for exception profiling at Datadog, which records Datadog specific events into the JDK Flight Recorder.

Some Assembly Required

These techniques, and more, can be used together to provide a best-of-all-worlds profiling environment. Just be careful, as with most things in life a balance must be found. Just like there is (trigger warning) no single energy source that will solve our energy problems in a carbon neutral way (we should use all at our disposal – including nuclear power – to have a chance to go carbon neutral in a reasonable time [2][3]), a balance must be struck between sampling and execution tracing, and a balance for how much data to capture for the various types of profiling you’re doing.

 

Continuous Profiling in Large Deployments
Or, Finding What You’re Looking For

In a way this part of the blog will be a shameless plug for the work I’ve been involved with at Datadog, but it may offer insights into what matters for a continuous profiler to be successful. Feel free to skip if you dislike me talking about a specific commercial solution.

So, you’ve managed to get all that juicy profiling down to a reasonable amount of data (for Datadog / Java, on average about 100k events per minute, with context and stacktraces, or 2MB per minute, at less than 2% CPU overhead), that you can process and store without going broke. What do you do next?

That amount of data will be overwhelming to most people, so you’ll need to offer a few different ways into the data. Here are a few that we’ve found useful at Datadog:

  • Monitoring
  • Aggregation
  • Searching
  • Association by Context
  • Analysis

Monitoring

All that detailed data that has been collected can, of course, be used to derive metrics. We differentiate between two kinds in the profiling team at Datadog:

  • Key Performance Metrics
  • High Cardinality Metrics

Key performance metrics are simple scalar metrics: you typically derive a value, periodically, per runtime. For example CPU utilization or allocation rate.

Here’s an example showing a typical key performance metric (note that all pictures are clickable for a better look):

kpm

The graph above shows the allocation rate. It’s a simple number per runtime that can change over time. In this case the chart is an aggregate over the service, but it could just as well be a simple metric plotted for an individual runtime.

High Cardinality Metrics are metrics that can have an enormous number of different buckets with which the values are associated. An example would be the CPU time per method.

We use these kinds of metrics to support many different use cases, such as allowing you to see the hottest methods in your entire datacenter. The picture below shows the hottest allocation sites across a bunch of processes.

hcm2

Here are some contended methods. Yep, one is a demo…

locks

Metrics also allow you to monitor for certain conditions, like having alerts / watchdogs when certain conditions or changes in conditions occur. That said, they aren’t worth that much unless you can, if you find something funny, go see what was going on – for example see how that contended method was reached when under contention.

 

Aggregation

Another use case is when you simply don’t care about a specific use case. You just want to look at the big picture in your datacenter. You may perhaps want to see, on average, across all your hosts and for a certain time range, what the CPU profiling information looks like. This would be a great place to start if, for example, looking for ways to lower the CPU usage for Friday nights, 7 to 10 p.m.

Here, for example, is an aggregation flame graph for the profiling data collected for a certain service (prof-analyzer), where there is some load (I set it to a range to filter out the profiles with very little load).

aggregation

A specific method can be selected to show how that specific method ended up being called:
methodselect

Searching

What if you just want to get to one of the worst examples of using a butt-load of CPU? Or if you want to find the worst example of a spike in allocation rate? Having indexed key performance metrics for the profiling data makes it possible to quickly search for profiling information matching certain criteria.

Here is an example of using the monitor enter wait time to filter out an atypically high lock contention:

atypicallock

 

Association by Context

Of course, if we can associate the profiling data with individual traces, it would be possible to see what went on for an individual long lasting span. If using information from the runtime, even things that are normally hidden from user applications (including profilers purely written in Java), like stop-the-world pauses, would be visible.

breakdown

 

Analysis

When having access to all that yummy, per thread and time, detailed, profiling data, it would be a shame to not go looking for some interesting patterns to highlight. The result of that analysis can provide a means to focus on the most important parts of the profiling data.

analysis1

So, nothing terribly interesting going on in our services right now. The one below is from a silly demo app.
analysis2

That said, if you’re interested in the kind of patterns we can detect, check out the JDK Mission Control rules. The ones at Datadog are a superset, and work similarly.

 

Summary

Profiling these days is no longer limited to high overhead development profilers. The capabilities of the production time profilers are steadily increasing and their value is becoming less controversial, some preferring them for complex applications even during development. Today, having a continuous production profiler enabled in production will offer unparalleled performance insights into your production environment, at an impressively low performance overhead. Data will always be at your fingertips when you need it.

Additional Reading

https://www.datadoghq.com/blog/datadog-continuous-profiler/
https://www.datadoghq.com/blog/engineering/how-we-wrote-a-python-profiler/

Many thanks to Alex Ciminian, Matt Perpick and Dan Benamy for feedback on this blog.


[1]: Deep Distributed Tracing blog: http://hirt.se/blog/?p=1081

Unrelated links regarding the very interesting and important de-carbonization debate:

[2]: https://theness.com/neurologicablog/index.php/there-is-no-one-energy-solution/

[3]: https://mediasite.engr.wisc.edu/Mediasite/Play/f77cfe80cdea45079cee72ac7e04469f1d

 

 

 

A Closer Look at JFR Streaming

By Marcus Hirt and JP Bempel

Since JDK 14 there is a new JFR kid on the block – JFR streaming. 🙂 This blog post will discuss some of the things that you can do with JFR streaming, as well as some of the things you may want to avoid.

An Introduction to JFR Streaming

In the most recent version of the JDK a new JFR-related feature was introduced – JFR streaming. It is a feature allowing a developer to subscribe to select JFR data and to decide what to do with that data in the host process. JFR events can also be consumed from a separate process by pointing to the file repo of a separate JVM process – the mechanism is the same.

JFR streaming works by allowing the JFR file to be read whilst it is being written. The emissions to disk happen more frequently (usually every second, or when the in-memory buffers are full) than during normal flight recordings, where the data is emitted when the in-memory buffers are full. It does not support streaming directly from in-memory buffers, and the events are not delivered synchronously when they occur in the JVM.

The new functionality mostly resides in jdk.jfr.consumer. This is how you would open an event stream and start consuming the CPU load at 1 second intervals, and the monitor class whenever a thread is blocked for at least 10 ms trying to enter a monitor:

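// Needs java.time.Duration and jdk.jfr.consumer.RecordingStream
try (var rs = new RecordingStream()) {
  // Sample the CPU load once per second
  rs.enable("jdk.CPULoad").withPeriod(Duration.ofSeconds(1));
  // Only record monitor enters that block for at least 10 ms
  rs.enable("jdk.JavaMonitorEnter").withThreshold(Duration.ofMillis(10));
  rs.onEvent("jdk.CPULoad", event -> {
    System.out.println(event.getFloat("machineTotal"));
  });
  rs.onEvent("jdk.JavaMonitorEnter", event -> {
    System.out.println(event.getClass("monitorClass"));
  });
  // Handlers run in the current thread; stop by closing the stream
  rs.start();
}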

The RecordingStream is what you would use to control what is gathered from within the Java process, effectively also controlling the recorder.

Here is another example using the default recording template, and printing out the information for garbage collection events, cpu load and the JVM information:

// Note: getConfiguration can throw IOException and ParseException
Configuration c = Configuration.getConfiguration("default");
try (var rs = new RecordingStream(c)) {
  rs.onEvent("jdk.GarbageCollection", System.out::println);
  rs.onEvent("jdk.CPULoad", System.out::println);
  rs.onEvent("jdk.JVMInformation", System.out::println);
  rs.start();
}

The EventStream class can be used together with the standard flight recorder mechanisms to gather information from ongoing recordings, even ones being done in separate processes or an already recorded file. Here is an example using the EventStream to get some other attributes of the CPU load and information from garbage collections from within the Java process (needs an ongoing recording):

try (var es = EventStream.openRepository()) {
   es.onEvent("jdk.CPULoad", event -> {
     System.out.println("CPU Load " + event.getEndTime());
     System.out.println(" Machine total: " + 100 * event.getFloat("machineTotal") + "%");
     System.out.println(" JVM User: " + 100 * event.getFloat("jvmUser") + "%");
     System.out.println(" JVM System: " + 100 * event.getFloat("jvmSystem") + "%");
     System.out.println();
   });
   es.onEvent("jdk.GarbageCollection", event -> {
     System.out.println("Garbage collection: " + event.getLong("gcId"));
     System.out.println(" Cause: " + event.getString("cause"));
     System.out.println(" Total pause: " + event.getDuration("sumOfPauses"));
     System.out.println(" Longest pause: " + event.getDuration("longestPause"));
     System.out.println();
   });
   es.start();
 }

This is the EventStream interface used to consume and filter an event stream:

public interface EventStream extends AutoCloseable {
  public static EventStream openRepository();
  public static EventStream openRepository(Path directory);
  public static EventStream openFile(Path file);

  void setStartTime(Instant startTime);
  void setEndTime(Instant endTime);
  void setOrdered(boolean ordered);
  void setReuse(boolean reuse);

  void onEvent(Consumer<RecordedEvent> handler);
  void onEvent(String eventName, Consumer<RecordedEvent> handler);
  void onClose(Runnable handler);
  void onError(Consumer<Throwable> handler);
  boolean remove(Object handler);
  void start();
  void startAsync();
  void awaitTermination();
  void awaitTermination(Duration duration);
  void close();
}

The open* methods allow you to open a specific file or a specific file repository (for example from a different process). The set* methods allow you to filter on time and to select if you want to enforce that the events are delivered in time order. You can also allow the reuse of the event object that gets delivered, to get the memory pressure down a bit.

The onEvent methods allow you to register a consumer for handling the events, either all of the events or by event name (type). The start method kicks off the processing in the current thread; startAsync is a convenience method for kicking off the processing in a single separate thread.
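To make the difference between start and startAsync a bit more concrete, here is a minimal sketch that streams a single event asynchronously and hands it back to the calling thread. The Heartbeat event, its name demo.Heartbeat and the AsyncStreamSketch class are made up for this illustration; only the RecordingStream calls are part of the JDK API (JDK 14 or later).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.consumer.RecordingStream;

public class AsyncStreamSketch {

    // Hypothetical custom event, only here so the sketch has something to stream.
    @Name("demo.Heartbeat")
    @Label("Heartbeat")
    static class Heartbeat extends Event {
        @Label("Message")
        String message;
    }

    /** Streams one Heartbeat event back to the caller; returns null on timeout. */
    public static String roundTrip(long timeoutMillis) {
        CountDownLatch seen = new CountDownLatch(1);
        AtomicReference<String> message = new AtomicReference<>();
        try (RecordingStream rs = new RecordingStream()) {
            rs.enable("demo.Heartbeat");
            rs.setOrdered(true);                    // deliver in time order (the default)
            rs.onError(Throwable::printStackTrace); // exceptions from the stream end up here
            rs.onEvent("demo.Heartbeat", event -> {
                message.set(event.getString("message"));
                seen.countDown();
            });
            rs.startAsync();                        // returns immediately

            long deadline = System.currentTimeMillis() + timeoutMillis;
            // Keep emitting until one event has been delivered. Events only become
            // visible to the stream after the recorder has flushed them to disk.
            while (seen.getCount() > 0 && System.currentTimeMillis() < deadline) {
                Heartbeat hb = new Heartbeat();
                hb.message = "hello";
                hb.commit();
                try {
                    seen.await(200, TimeUnit.MILLISECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        } // close() stops both the stream and the underlying recording
        return message.get();
    }

    public static void main(String[] args) {
        System.out.println("Received: " + roundTrip(10_000));
    }
}
```

Expect a delay of up to about a second before the handler fires, since the data has to be flushed to the file repository first. With start instead of startAsync, the calling thread would block until the stream is closed.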

Where to use streaming

There are several advantages to JFR event streaming. It is a great way to access JFR data for monitoring purposes. You get access to detailed information that was previously unavailable to you, even from different processes, should you want to.

Here are some examples:

  • Directly send monitoring data to your favourite monitoring service
    For example streaming select metrics over to Datadog. 😉 Not that you would need to – we already derive interesting performance metrics from the (full) flight recordings we capture. We even track complex metrics like top hottest methods, or top allocations sites, over time, using what we internally call high cardinality metrics.
  • Pre-aggregating data before sending it off
    For example, you could get the CPU load every second, and then every five minutes roll it up to an average, median, min, max and a standard deviation, not having to send every single entry.
  • Act on profiling data in-process
    You could, for example, make decisions for controlling the normal flight recordings given some statistics you track, like enabling certain events when it looks like it could be interesting.
  • Expose JFR data through other management APIs
    For example, adding an MBean exposing select JFR data over JMX.
    That said, there might be an API to connect directly to an MBeanServerConnection in the future[1]:

    MBeanServerConnection conn = connect(host, port);
    try (EventStream es = new RemoteRecordingStream(conn)) {
      es.onEvent("jdk.GarbageCollection", e -> ... );
      es.onEvent("jdk.ExceptionThrown", e -> ...);
      es.onEvent("jdk.JavaMonitorBlocked", e -> ...);
      es.start();
    }
    

It also allows you to skip the metadata part of a normal flight recording. The metadata in JFR contains the information about what was recorded, so that you can parse and view data that you may not even know about beforehand. In the case of monitoring a few well known data points, this is redundant information to keep sending over and over again.

Erik Gahlin has a neat example for producing health reports using JFR streaming, displaying pre-aggregation of the top frames for execution samples and allocation, as well as doing stats for some common data points, like GC metrics and CPU load.
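A much smaller sketch of the pre-aggregation idea mentioned above could look like the following. The CpuLoadRollup class and its Summary type are hypothetical names invented for this example. The idea is to collect samples from an event handler and periodically roll them up into a count, min, max, mean, median and (population) standard deviation, so that only the summary needs to be sent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CpuLoadRollup {

    /** Summary of one aggregation window. */
    public static final class Summary {
        public final int count;
        public final double min, max, mean, median, stdDev;

        Summary(int count, double min, double max, double mean, double median, double stdDev) {
            this.count = count;
            this.min = min;
            this.max = max;
            this.mean = mean;
            this.median = median;
            this.stdDev = stdDev;
        }
    }

    private final List<Double> samples = new ArrayList<>();

    /** Called from the event handler, e.g. once per jdk.CPULoad event. */
    public synchronized void add(double value) {
        samples.add(value);
    }

    /** Summarizes and clears the current window; returns null if it is empty. */
    public synchronized Summary rollUp() {
        if (samples.isEmpty()) {
            return null;
        }
        List<Double> sorted = new ArrayList<>(samples);
        samples.clear();
        Collections.sort(sorted);
        int n = sorted.size();
        double sum = 0;
        for (double v : sorted) {
            sum += v;
        }
        double mean = sum / n;
        double squares = 0;
        for (double v : sorted) {
            squares += (v - mean) * (v - mean);
        }
        double median = (n % 2 == 1)
                ? sorted.get(n / 2)
                : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
        return new Summary(n, sorted.get(0), sorted.get(n - 1), mean, median, Math.sqrt(squares / n));
    }

    public static void main(String[] args) {
        CpuLoadRollup rollup = new CpuLoadRollup();
        // In a real setup the samples would come from a stream handler, for example:
        //   rs.onEvent("jdk.CPULoad", e -> rollup.add(e.getFloat("machineTotal")));
        for (double v : new double[] {0.10, 0.20, 0.30, 0.40}) {
            rollup.add(v);
        }
        Summary s = rollup.rollUp();
        System.out.println(s.count + " samples, mean " + s.mean + ", median " + s.median);
    }
}
```

A scheduled task could call rollUp every five minutes and forward the Summary to whatever monitoring backend is in use.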

Where Not to Use Streaming

This is from JEP-349[2], the JEP where JFR streaming was introduced:

To consume the data today, a user must start a recording, stop it, dump the contents to disk and then parse the recording file. This works well for application profiling, where typically at least a minute of data is being recorded at a time, but not for monitoring purposes.

Let’s explore why the JEP differentiates between monitoring and profiling. Some events in JFR are simple data points in time. Some are more complex, containing plenty of constants, for example stack traces. JFR takes great care to record these complex data structures in a binary format that doesn’t take a lot of processing time to produce, and which is still compact.

Some of the JFR events occur quite frequently – for example, a typical one minute recording of data can contain hundreds of thousands of events. The file size for such a recording is typically only a couple of MB. There is a mix of techniques used to keep the size down, such as using constant pools to ensure that information like method names is not repeated, LEB128 encoding of integers etc.
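To give a feel for why LEB128 helps, here is a small sketch of plain unsigned LEB128: each byte carries 7 bits of payload, and the high bit signals that more bytes follow, so small numbers take a single byte instead of eight. This is the textbook encoding; the exact variant and byte layout that JFR uses internally may differ, and the Leb128Sketch class is just an illustration.

```java
import java.io.ByteArrayOutputStream;

public class Leb128Sketch {

    /** Encodes the value as unsigned LEB128, least significant 7-bit group first. */
    public static byte[] encodeUnsigned(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        do {
            int b = (int) (value & 0x7F); // take the low 7 bits
            value >>>= 7;
            if (value != 0) {
                b |= 0x80;                // continuation bit: more bytes follow
            }
            out.write(b);
        } while (value != 0);
        return out.toByteArray();
    }

    /** Decodes a single unsigned LEB128 value from the start of the array. */
    public static long decodeUnsigned(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;                    // last byte of this value
            }
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        for (long v : new long[] {0, 127, 128, 300, 624485}) {
            System.out.println(v + " -> " + encodeUnsigned(v).length + " byte(s)");
        }
    }
}
```

Most values in a recording (ids, indexes, short durations) are small, which is why a variable-length encoding pays off.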

For profiling you typically want quite a few of these events enabled. JFR was built to emit this data at a very low overhead, and the data is eminently useful to get detailed information about things like why your thread is halting. For example, the stack trace to a place where your code had to wait to enter a monitor, complete with the class of the monitor waited on, the exact duration of the wait, which thread was holding on to the monitor (making you not able to enter), the monitor address and more. Not only that, there may have been other events providing context about what was going on in that thread at the time of the monitor enter, shining further light on what was going on. Events that you may not even know about.

If you use JFR streaming for profiling, you would either naively send constant information over and over again in an inefficient way (say, serializing all of it to JSON), or spend a lot of effort reproducing the JFR format (introducing your own constant pools etc).
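As a rough illustration of what "introducing your own constant pools" would mean, here is a minimal sketch of a string pool: each distinct value (say, a method name) is stored once and referred to by a small integer id afterwards. The ConstantPoolSketch class is hypothetical and much simpler than what JFR does, but it shows the kind of bookkeeping you would have to rebuild on top of a stream of fully materialized events.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConstantPoolSketch {

    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> values = new ArrayList<>();

    /** Returns the id for the value, adding it to the pool on first use. */
    public int idFor(String value) {
        Integer existing = ids.get(value);
        if (existing != null) {
            return existing;
        }
        int id = values.size();
        ids.put(value, id);
        values.add(value);
        return id;
    }

    /** Looks up the value for an id previously returned by idFor. */
    public String resolve(int id) {
        return values.get(id);
    }

    /** Number of distinct values stored in the pool. */
    public int size() {
        return values.size();
    }

    public static void main(String[] args) {
        ConstantPoolSketch pool = new ConstantPoolSketch();
        String[] frames = {"Foo.bar()", "Foo.baz()", "Foo.bar()", "Foo.bar()"};
        StringBuilder encoded = new StringBuilder();
        for (String frame : frames) {
            encoded.append(pool.idFor(frame)).append(' ');
        }
        // Four frame references, but only two strings stored in the pool.
        System.out.println("ids: " + encoded.toString().trim() + ", pool size: " + pool.size());
    }
}
```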

For example, the RecordedEvent class contains a method to get the RecordedStackTrace, which holds a List of RecordedFrame. For each event you walk through, this in-memory object model would be created.

You can externalize some of that cost, i.e. how the process you are monitoring is affected, by using another process to read the data as described above. That would, for example, lessen the allocation pressure in the process you’re monitoring. That is great, for example if you have a very latency sensitive process. That said, you have now created another Java process and put the costs over there (including the cpu overhead of dealing with the memory pressure as well as the memory overhead of running another JVM), typically on the same host. If you can afford to dedicate the memory and pin the event stream reader process to its own processor (cpu affinity), this can be a good solution though. Note that the same can be done for normal flight recordings, i.e. you can stream the recorded data directly from the file repository from a separate process.

Benchmarks of Using JFR Streaming Wrongly

For laughs and giggles, here are some benchmarks using just standard JFR to get all the data in the profiling template, compared to getting the equivalent information and serializing it to a JSON-like format using JFR Streaming. In other words, abusing JFR Streaming as a streaming replacement for getting the full JFR dataset. This is of course not what you should be using JFR Streaming for, but it exemplifies how wrong you can land if you use the technology in a way it was never intended. We’ll look at the latency of HTTP requests, the CPU time spent and the allocation pressure. We’ll also look at the size of the payload of information extracted. The benchmark is admittedly a bit extra mean as well, to explore edge cases.

Note that this is a simplified example – we’re not even bothering with extracting the full stack trace information and re-encoding it for streaming, whilst the JFRs in comparison already contain the full stack traces for all events efficiently encoded in constant pools. We could trade (even more) CPU for trying to bring the streamed data back to a JFR style format with constant pools again before storing/sending it. For serialization we’re simply doing toString() on the event objects, which gives us the events in a JSON-like format with only the top five frames of the stack trace. Including the full stack traces would add quite a bit more overhead to the streaming example.

The benchmark is available in this GitHub repository. It is based on the standard PetClinic application with some modifications to make it relevant to measure overhead in general.

Note: We are using an early access version of OpenJDK 15 in the benchmarks, since we discovered a bug whilst building this benchmark. See JDK-8245120.

The first chart shows the impact on http request latencies:

Note: Y-axis is in log scale to magnify the difference.

The next chart shows the CPU consumption. We measure in ticks from /proc/<pid>/stat at the end of the benchmark. This gives us a single number that is easy to compare with other runs. It represents the CPU consumed when threads were scheduled on the CPU.
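For reference, a measurement like this can be sketched in a few lines of Java. I am assuming here that "ticks" means the sum of the utime and stime fields (fields 14 and 15 in the Linux proc(5) layout); the benchmark repository may well do it differently, and the ProcStatTicks class is just for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ProcStatTicks {

    /** Returns utime + stime (fields 14 and 15 of a /proc/<pid>/stat line), in clock ticks. */
    public static long cpuTicks(String statLine) {
        // Field 2 (comm) is wrapped in parentheses and may itself contain spaces,
        // so only split what comes after the last ')'.
        int end = statLine.lastIndexOf(')');
        String[] rest = statLine.substring(end + 1).trim().split("\\s+");
        // rest[0] is field 3 (state), so field 14 is rest[11] and field 15 is rest[12].
        return Long.parseLong(rest[11]) + Long.parseLong(rest[12]);
    }

    public static void main(String[] args) throws Exception {
        // Linux only: pass a pid as the first argument, or default to this JVM.
        String pid = args.length > 0 ? args[0] : "self";
        byte[] raw = Files.readAllBytes(Paths.get("/proc", pid, "stat"));
        System.out.println(cpuTicks(new String(raw, StandardCharsets.UTF_8)));
    }
}
```

The value is in clock ticks, so divide by the system's ticks-per-second setting (getconf CLK_TCK, commonly 100) to get seconds of CPU time.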

 

The following chart is the total amount of heap allocated during the benchmark. Numbers were extracted from GC logs.

The final chart represents the size of extracted information, as described above:

Note that the JFR file contains the full stacktraces, and that the jfr-streaming one is limited to the top 5 frames.

So what is the conclusion of these benchmarks? Not much, except for: “don’t use technology for things it was never intended for”. 😉

Summary

  • JFR streaming is a great new way to expose JFR data for monitoring purposes.
  • JFR streaming is available from JDK 14 and above.
  • JFR streaming is complementary to the already existing JFR APIs, which remain the go-to way to use JFR for low-overhead detailed information (i.e. profiling / execution tracing).
  • Knowing where and how to use JFR streaming is key to avoiding sad pandas.

[1]:https://www.reddit.com/r/java/comments/e97vos/jfr_event_streaming_with_jdk_14_in_outprocess/faiapm8/

[2]:https://openjdk.java.net/jeps/349

Fantastic JVMs and Where to Find Them

Since you’re reading this blog, chances are that you’re writing software which will eventually run on a JVM. Most of you are using the Java language. Many of you are using a variety of other languages that target the JVM, such as Scala, Kotlin, Clojure, Groovy, (J)Ruby etc. Eventually you’ll need to decide on which JDK/JRE to deploy your software on in production. This is much easier said than done. There are quite a few different vendors out there, providing support and taking responsibility for the binaries they produce. They can have different support lengths for specific versions, and whereas you can sometimes find a vendor providing extended support for a version that has been officially end-of-lifed at Oracle, you may not find builds with the latest fixes in them publicly available. You’ll need to get those directly from the vendor.

After trying to figure out what’s what, I thought I’d simply write a blog post on the various JDKs available out there. This is especially important, since you might be consuming your JDK from a container provided by a third party, e.g. Docker Hub, and you may not know exactly what you’re getting[1].

Release Version Chicken Race

Typically most companies will require that you keep your dependencies up-to-date. For example, if you’ve written something with a dependency on Tomcat, you are pretty likely to keep your dependencies up-to-date. GitHub may even warn you if you’re running with a version that has known security implications. However, not everyone is keeping their JDKs/JVMs up-to-date. Which is funny, in a way, since everything you’ll be running could be affected.

Let’s take the Oracle JDK as an example. JDK 7 was GA in July 2011. Publicly available updates and fixes ceased in April 2015. Oracle’s Premier Support ended in July 2019, and even the Extended Support ends in 2022.

Let’s say you’re running on JDK 7. If you got your JDKs from Oracle, without a support contract chances are that the latest version of JDK 7 you got was built in 2015. You are now five (5!) years behind on critical security patches.

In other words, if you’re still running your software on JDK 7, you may want to at least begin upgrading to 8. JDK 7 is dying and support is being dropped left and right. If you aren’t buying support and have someone provide you with (security) patches, you might want to accelerate the effort. Also, this particular upgrade (7->8) should be relatively painless – in most cases it will be a drop in replacement. Now, if you’re not running a JDK 7 with the latest patches (sanity check – was the JDK at least built this year?), you may not only be missing out on bug fixes, but you may also be missing out on security patches[2]!

The same arguments could be made for JDK 8 as well, on a slightly pushed out time-line. The good news is that there are still public (and free) updates coming from the OpenJDK 8 maintenance project. That said, there are plenty of advantages for upgrading to JDK 11+, better performance being one of them.

Now, when the new, faster, release schedule was announced, Oracle announced that every 3 years, there would be an LTS (Long Term Support) version of Java. The releases in between the LTS releases would only be supported until the next release came out. Most vendors have adopted the same support scheme, which means that, at the time of writing, you should not be running ANYTHING on JDK 9, 10, 12 and 13 (unless you’re using Azul distributions, see [3]). They are not supported. Running them will only mean that you are lacking bug and security fixes. To take a somewhat arbitrary example – if you stopped upgrading JDK 8 after 8u74, you are literally lacking thousands of fixes.

At the time of writing this blog, the new CPU (Critical Patch Update) releases have just been published, and these are the releases you should be running in July 2020 (sooner rather than later):

  • JDK 8u262
  • JDK 11.0.8
  • JDK 13.0.4 [3]
  • JDK 14.0.2

If you’re running anything else in production, without a support contract, it could be argued you’re not doing things quite right.

What’s what?

OpenJDK, being open sourced, has builds provided by plenty of vendors. Here is a non-exhaustive list of some vendors shipping supported versions of OpenJDK (in alphabetical order, distribution(s) in parenthesis):

These providers usually ship distributions with pretty much the same bits from the OpenJDK repository, sometimes differing by what features are enabled, for example like a GC (Shenandoah / Red Hat), or by adding proprietary features like a new compiler (Falcon / Azul (Zing)). Some vendors have a free distribution (e.g. Oracle OpenJDK, Azul Zulu) and one that requires a commercial license (Oracle JDK, Azul Zing). Which vendor and distribution you should select depends on your demands – e.g. which vendor can provide reliable support to you (Oracle is one of the biggest contributors to OpenJDK), or which one provides the feature you need at a price point you can afford (e.g. JDK Flight Recorder on JDK 8 without the need for a commercial license, or support for a specific GC or compiler).

There are also upstream builds, not supported by anyone, built on Red Hat infrastructure and hosted by AdoptOpenJDK. For example, if you get a JDK 8 from Docker Hub (openjdk/jdk8u252, openjdk/jdk8), that is what you would get.
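If you are unsure which build a given container image actually gives you, one quick sanity check is to ask the runtime itself. The sketch below only reads standard system properties; the WhichJdk class name is made up for this example. Running java -version inside the container gives you much the same information.

```java
public class WhichJdk {

    /** Returns a short, human readable description of the running JDK/JVM. */
    public static String describe() {
        String[] keys = {
            "java.version", "java.vendor", "java.vm.name",
            "java.vm.vendor", "java.vm.version", "java.home"
        };
        StringBuilder sb = new StringBuilder();
        for (String key : keys) {
            sb.append(key).append('=').append(System.getProperty(key)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```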

Where to get JFR – Public Service Announcement

As you probably know, JDK Flight Recorder, a technology close to my heart, has been backported to JDK 8. Since we’re talking about where to get your JVMs and versions, I thought I’d include a small table for which provider will be including JFR in what version of their JDK 8 builds.

Vendor               | First JDK 8 Version with JFR | Release Date | Docker Image
---------------------|------------------------------|--------------|------------------------------
Azul (Zulu)          | u212* (u262+ recommended)    | 2019-04-16   | azul/zulu-openjdk/8
AdoptOpenJDK         | u262                         | 2020-07-16   | adoptopenjdk/8u262
Red Hat              | u262                         | 2020-07-15   | In Fedora and RHEL
Amazon (Corretto)    | u262                         | 2020-07-14   | amazoncorretto:8u262
Bell-Soft (Liberica) | u262 (separate binary)       | 2020-07-14   | N/A
Upstream builds [4]  | u272                         | 2020-10-20   | openjdk/jdk8u272, openjdk/jdk8

Summary

  • Use the latest version of an LTS release which is still supported, or the latest JDK release
  • Use a supported build in production (even if you haven’t bought support)

Thanks to Mario Torre, JP Bempel and Gil Tene for feedback!

[1]: Mystery meat OpenJDK builds strike again: https://mail.openjdk.java.net/pipermail/jdk8u-dev/2019-May/009330.html

[2]: To check the vulnerabilities you may be exposed to, see e.g. https://www.cvedetails.com/version-list/93/19116/1/Oracle-JDK.html?sha=b856721542b66953c859bd95be067255dd4c6098&order=1&trc=188

[3]: Upstream JDK 13u is being supported, and Azul has announced 13 to be “Medium Term“ supported – you can keep getting updates for JDK 13 for Azul distributions.

[4]: These are built by Red Hat and hosted by AdoptOpenJDK, and are different from Red Hat’s and AdoptOpenJDK’s supported builds.

The “Best of the JDK” Tournament

Over the last few weeks, there has been a knock-out tournament raging on Twitter, where various Java technologies have battled out which JDK technology is the best. It’s all part of the activities taking place around the celebration of Java turning 25 years. And boy, have those years been interesting.

Like many languages in use today, Java started out with a simple interpreter. That is, by the way, how Java got a reputation for being slow. Today, Java peak performance can surpass that of statically compiled languages, owing to optimizations only possible when runtime information is available. But I digress.

As many of you know, I started out co-founding a company named Appeal – the company that created the JRockit JVM. We did quite a few cool things during that time; some of them relevant to the knock-out competition. We built the world’s first JVM management console, mostly since the application to become a Java licensee (so that JRockit could become a Sun certified JVM) required us to state a value-add. Our original application stated “better performance”, and was summarily turned down. 😉 With the work on the management console we eventually consolidated an API to monitor and manage the JVM – JMAPI (the JRockit Management API), which later inspired – and was superseded by – JSR-174 (java.lang.management)[1].

We also built a tool we called JRA (JRockit Runtime Analyzer). It really started out as a tool for finding out how the JVM was performing at customer installations – we needed information to better understand how to improve the JVM for real world usage. Customers, quite understandably, refused to let us borrow their applications to run them in our labs. To make it easy for them to understand and verify the data they were sharing, it was all emitted as text (XML). It didn’t take long for customers to see us use the tool and the (accidental) value it brought for optimizing their applications – was the tool perhaps for sale? As a startup, we of course said yes, and made it into a product. When we later introduced the JRockit DetGC (deterministic GC), there was a need to be able to prove that the GC was keeping the latency contract, and show where in the customer code any thread halts were caused (e.g. due to bad synchronization). So the JRockit Runtime Analyzer was extended with LAT (the Latency Analysis Tool), which now introduced a binary artifact for the latency data for better data density and less serialization cost. In the end JRA and LAT were unified into a single model – JFR (JRockit Flight Recorder, later Java Flight Recorder, and finally re-dubbed JDK Flight Recorder when it was open sourced). We also created an impossibly cool on-line memory analysis tool (which was sadly never ported to HotSpot), together with a slew of other little tools and utilities.

The good old JMC memleak tool

Some of these tools converged into Java Mission Control, which became the hub for the cool tools we were developing.

JMC Logo

I was happily surprised to see JDK Mission Control included in the “Best of the JDK” feature face off. I was doing little dad-dances (to the embarrassment of my kids) in total astonishment when JDK Mission Control got up against the runtime and language features and ultimately won the whole thing.

Competition Results

Tech Poetry Throw-Down

One of the best parts of this whole competition was when Erik Costlow wrote some poetry in support of JDK Mission Control. This sparked an epic tech-poetry throw-down with little poems in favour of various Java technologies.

Here are a few of my favourite entries for JMC & JFR (in no particular order):

Of JDK Mission Control

whose benefits I will extol:

It watches performance

while still in conformance

So therefore it should win this poll.

  – @costlow

(The one which started it all)

2 am in the morning, my mobile chimed,

The war room conf call had to be primed.

JVM’s are down, the helpdesk said,

Touch troubleshooting road ahead.

CPU? GC? Bad Code?, the questions abound,

The root cause was far from being found.

Tumultuous voiced from Dev to Ops, each one declaring the were clean

No path to the solution was to be seen.

With a prayer, I fired up the Java Flight Recorder,

Hoping this would restore some war room order.

Lo! And behold, the histogram revealed

‘Twas a code deadlock, the system could yet be healed!

Helpful NullPointer messages, I hear you say,

Who will alert you whilst you are away?

  – @perfclarity

To see or not to see (perf data)

That is the question (mission control answers).

Whether ‘tis nobler in the code

To suffer the zings and harrows of outrageous finger pointing

Or to stream events and by analyzing, end it

  – @costlow

I have never

had to deal

with NullPointer

Exceptions

and which

many people want

to have

better messages

Forgive me

but my vote goes to JMC

it is so sweet

and so cold

  – @stuartmarks

To think that I could ever see

A tool so lovely: JMC

A tool that streams events all day

Yet still performs without delay.

  – @costlow

If you need to control a mission

OpenJDK had an omission

And then JMC

Was suddenly free

Without even rights of rescission

  – @stuartmarks

So much value inside JMC

Yet usage was low, tis it wasn’t free

But low and behold

Oracle open sourced it in whole

And now productivity is as easy can be

  – @Sharat_Chander

As I stream through the events of my workload perf pain

I take a look post 8 life and realize this tool should reign

‘cause that’s just perfect for a coder like me

You know we love fancy things like JDK MC

Been spendin’ most our lives livin’ in a coder’s paradise

  – @costlow

Here are a few of my favourites for the other technologies:

Null pointer exception

Is a old familiar friend

And she wants to be

more helpful again

With deep information

I can only begin to extol

Love for NPE

For she should win

this Java poll

  – @manicode

There was a NullPointerException

Whose message needs amplification

To the VM some hacks

Add the relevant facts

And no longer is it an obsession

  – @stuartmarks

As I try to decipher my NPE in grails

The Greater Sage-Grouse wanders the sage brush

The grouse and I are one

For I can’t decipher less helpful NPE’s in grails

Any more than the sage-grouse knows why it wanders the sagebrush

  – @manicode

I’m on a boat motherf$%^r take a look at me

Straight floatin’ on a boat debugging NPE

Busting five knots, wind whipping out my coat

You can’t stop me motherf$%^r cause I’m debugging on a boat

  – @manicode

The usability of NullPointerExceptions

have historically been an issue

by adding static code to dynamic exceptions

our problems we can diffuse

Let go of your stack trace debugging hate

And vote for JEP Three Fifty Eight!

  – @manicode

Many thanks to @costlow, @manicode, @stuartmarks, @perfclarity and @Sharat_Chander for all the laughs! 🙂

Thanks!

Yes, I know this is a silly little Twitter competition. But, if nothing else, this silly little competition provides an excellent opportunity for me to give some overdue thanks:

  • Plenty of thanks and love to all of the users of JMC out there, using JMC to solve tricky problems in production systems on a daily basis.
  • Many thanks to everyone who voted for JMC. I didn’t think a tool would stand a chance against language and runtime features.
  • Huge thanks to all the developers on the JDK Mission Control team, and to all the developers on the JDK serviceability team. You’re a really awesome bunch, and it’s a privilege for me to be working with you.
  • Major kudos to Oracle for open sourcing JDK Mission Control and JDK Flight Recorder.
  • Many thanks to the main sponsors of the development of JDK Mission Control:

JRockit and Duke hanging

[1]: Sadly, not all of the features in JMAPI got rolled into the standardized API. JMAPI could, for example, change the CPU affinity of the JVM process on the fly, dynamically change the heap size target, and independently (and dynamically) switch the GC to use a nursery or not as well as switch between concurrent and parallel mark and/or sweep phases. Of course differences in GC capabilities etc required the standardized API to be limited to what made sense to most runtimes. That said, I’m still kinda bummed that it became a JMX API (java.lang.management depending on the javax.management specification), instead of a pure local Java API, which could also have been exposed through JMX. See, for example, the JFR APIs, where there is a local API and also a JMX API.

Oracle Releases JDK Mission Control 7 GA

Oracle just released their GA build of JDK Mission Control 7.0.0. I, of course, had to download it to give it a spin.

Here are my main takeaways:

  1. Compared to the early access builds, it no longer comes with an embedded JDK. This is actually nice, since you can run it on whichever JDK you’d like. That said, it does require you to have a JDK already installed. Since auto-discovery of locally running JVMs will not work unless running on a JDK (it does not work on a JRE), it also makes it a little bit easier to get things wrong.

    You may want to configure the jmc.ini file to point to a JDK manually. Simply add a -vm entry just before the -vmargs, like so:

    ...
    --launcher.appendVmargs
    -vm
    C:\Java\JDKs\jdk-11.0.5\bin
    -vmargs
    -XX:+UnlockDiagnosticVMOptions
    ...
  2. Oracle has put up a properly configured update site. This means that in Oracle’s builds of Mission Control, there are additional plug-ins that can be installed by going to Help | Install New Software…
    updatesite
  3. Everything, except for the Oracle specific optional plug-ins from the update site, is released under the very permissive UPL license. The Oracle ones are under a separate group named Mission Control (Oracle) on the update site, so they are easy to spot.
  4. Working my way back from the updatesite.properties file in the application, I found an Eclipse update site available here:
    https://download.oracle.com/technology/products/missioncontrol/updatesites/openjdk/7.0.0/ide/
    (Edit: After posting this blog, I noticed that reading the release notes would have been easier. ;))

TL;DR

Oracle has published a solid first (though a bit delayed) release of JMC 7. A notable difference from Oracle’s early access builds is that there is no longer an embedded JDK. A notable difference from other JMC releases is that there are published update sites – both for the stand-alone application, and for installing it all into the Eclipse IDE.

So, in short, yay!