JDK Mission Control 9.1.0 Released!

2025-02-12 - By Marcus

The source release of JMC 9.1.0 was tagged 2025-01-31. As per usual it may take some time until vendors have binary builds of JDK Mission Control available.

Here are the release notes:

Mission Control 9.1 – New and Noteworthy

General

JMC 9.1 – New Release!
This is the latest (January 2025) release of JDK Mission Control. JMC 9.1 requires JDK 21+ to run and introduces several new features, enhancements, and bug fixes. This version continues to support connecting to, and parsing JFR recordings from, OpenJDK 8u272+ and Oracle JDK 7u40+, and can open and visualize flight recordings from JDK 7 and 8. JDK Mission Control is available for Windows (x86_64), Mac OS X (ARM and x86_64), and Linux (ARM and x86_64).

Eclipse 4.34
The Mission Control client is now built to run optimally on Eclipse 2024-12 and later. To install JDK Mission Control into Eclipse, go to the update site (Help | Install New Software…). The URL to the update site will be vendor specific, and some vendors will instead provide an archive with the update site.

Support for Jolokia JMX Connection and Discovery
JMC now supports connecting to JVMs using Jolokia, and also supports Jolokia’s auto discovery mechanism.

Minor bugfixes and improvements
There are 64 fixes and improvements in this release. Check out the JMC 9.1 Result Dashboard (https://bugs.openjdk.org/secure/Dashboard.jspa?selectPageId=23411) for more information.

Core

API to easily write annotated Java JFR events
The JFR Writer API has been extended to support custom JFR event types (i.e. classes extending jdk.jfr.Event). New writer types can be registered for them, and instances of those types can be passed directly to the writer to be written to the recording.
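
As a rough illustration of what such a custom event type looks like, here is a minimal annotated class extending jdk.jfr.Event. The event name, the fields, and the OrderProcessedEvent class itself are made up for this sketch, and the writer-side registration call is deliberately left out, since its exact signature is not described here.

```java
import jdk.jfr.Category;
import jdk.jfr.Description;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

// Hypothetical event type, for illustration only.
@Name("com.example.OrderProcessed")
@Label("Order Processed")
@Description("Emitted when an order has been processed")
@Category({"Example", "Orders"})
class OrderProcessedEvent extends Event {
    @Label("Order Id")
    String orderId;

    @Label("Item Count")
    int itemCount;

    // Small factory so the fields are populated in one place.
    static OrderProcessedEvent of(String orderId, int itemCount) {
        OrderProcessedEvent event = new OrderProcessedEvent();
        event.orderId = orderId;
        event.itemCount = itemCount;
        return event;
    }

    public static void main(String[] args) {
        OrderProcessedEvent event = of("A-1001", 3);
        // With the regular JFR runtime the event is committed like this;
        // nothing is recorded unless a recording has enabled the event.
        event.commit();
        System.out.println(event.orderId + " / " + event.itemCount);
    }
}
```

An instance like the one created in main is the kind of object the extended writer API is described as accepting directly.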

Allow primitive types in converters
Previously a converter could not be used to convert from a primitive type like long. This capability can for example be useful to convert a timestamp (type long) into a human readable string.
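
A minimal sketch of the timestamp case, assuming a converter is simply a class with a static method that takes the primitive value. The class and method names are hypothetical and not taken from the JMC sources.

```java
import java.time.Instant;

// Hypothetical converter; the class and method names are illustrative only.
final class EpochMillisConverter {
    private EpochMillisConverter() {
    }

    // Takes the primitive long directly, no boxing into java.lang.Long needed.
    static String convert(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    public static void main(String[] args) {
        System.out.println(convert(0L)); // prints 1970-01-01T00:00:00Z
    }
}
```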

Rule for detecting Java process with PID 1
PID 1 is treated specially in Linux, and is assumed to be used by the init process. The init process has some additional responsibilities, such as assuming the responsibility for orphaned processes. The init process is assumed to never quit, and e.g. any signal handler registered for SIGSEGV will not be run. This can be problematic, even when running in containerized environments. The rule will detect if this is the case, and propose a path to fixing it (e.g. using tini).
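
For a quick check from inside the application itself, something like the following sketch can be used. It only looks at the PID of the current process and is not how the JMC rule is implemented (the rule works on recorded data); the class name is made up.

```java
// Hypothetical helper, for illustration only.
final class InitProcessCheck {
    private InitProcessCheck() {
    }

    // PID 1 is the slot normally occupied by the init process.
    static boolean isInitPid(long pid) {
        return pid == 1L;
    }

    // Checks the JVM this code is running in (ProcessHandle requires Java 9+).
    static boolean runningAsInit() {
        return isInitPid(ProcessHandle.current().pid());
    }

    public static void main(String[] args) {
        if (runningAsInit()) {
            System.out.println("This JVM is running as PID 1, consider an init wrapper such as tini.");
        } else {
            System.out.println("PID is " + ProcessHandle.current().pid());
        }
    }
}
```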

The halt rule result has been improved
The halt rule will now show a table of the top 5 thread halting VM operations.

Better descriptions for the code cache rule
The descriptions for the code cache rule have been improved, highlighting what the effects of a full code cache can be and what actions to take to increase the size of the code cache.

Add support for the new allocation profiler in rules
The following rules have been updated to be able to use the new ObjectAllocationSample events: AllocationByClassRule, AllocationByThreadRule, AutoBoxingRule.
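
To give an idea of the kind of aggregation an allocation-by-class analysis performs on sampled data, here is a small sketch. It assumes each sample carries the class that was allocated and a weight in bytes (the memory allocated since the previous sample); the Sample record and its field names are stand-ins invented for this example, not the JMC or JFR types.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class AllocationByClass {
    // Hypothetical stand-in for a parsed sample: class name and weight in bytes.
    record Sample(String className, long weightBytes) {
    }

    // Sums the weights per class to estimate how much each class contributed.
    static Map<String, Long> estimatedBytesByClass(List<Sample> samples) {
        Map<String, Long> totals = new TreeMap<>();
        for (Sample sample : samples) {
            totals.merge(sample.className(), sample.weightBytes(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Sample> samples = List.of(
                new Sample("java.lang.String", 4096),
                new Sample("byte[]", 8192),
                new Sample("java.lang.String", 2048));
        System.out.println(estimatedBytesByClass(samples));
    }
}
```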

Java Flight Recorder (JFR)

JMC displaying long value in scientific notation
The TLSHandshakeEvent.java records “Certificate Id” as a long value, and JMC was showing it in scientific notation. This was also true for process identifiers. This has now been fixed.
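
A small sketch of how a long can end up in scientific notation in Java in general: once the value is widened to double, the default string conversion switches to exponent form for large magnitudes. Whether this is what happened inside JMC is an assumption; the class below is purely illustrative.

```java
// Illustrative only; not taken from the JMC sources.
final class LongFormatting {
    private LongFormatting() {
    }

    // Widening to double first makes large values render with an exponent.
    static String viaDouble(long value) {
        return String.valueOf((double) value);
    }

    // Formatting the long directly keeps every digit.
    static String asLong(long value) {
        return Long.toString(value);
    }

    public static void main(String[] args) {
        long certificateId = 12345678901L; // made-up identifier
        System.out.println("via double: " + viaDouble(certificateId));
        System.out.println("as long:    " + asLong(certificateId));
    }
}
```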

Showing RSS
JMC will now show the resident set size (RSS), both on the memory page as well as the java application page. The RSS can for example be helpful when trying to determine if there is a native memory leak or heavy native memory fragmentation occurring. The RSS graphs can be toggled on and off using the check box legends to the right of the graphs.

Showing thread counts
JMC will now show thread counts on the java application page. This can for example be useful when trying to determine if there is a thread leak. The thread count graphs can be toggled on and off using the check box legends to the right of the graphs.

Thread id on hover
On the threads page, hovering over a thread name will show the thread id in the tooltip.

Bug Fixes

Area: Platform
Issue: 8306
Synopsis: Missing plug-ins while installing JMC in an Eclipse IDE

Mission Control would fail to install because some third party libraries were not included in the update site. This has now been fixed.

Area: Core
Issue: 8295
Synopsis: Shutdown event type id was not properly translated for Oracle JDK 8

The shutdown event type id was not properly translated for legacy Oracle JDK 8, which led to the shutdown time and shutdown reason not being properly displayed on the JVM Internals page.

Area: Core
Issue: 8287
Synopsis: Fix the JMX protocol extenders

The JMC protocol extension mechanism was broken in JMC 9.0.0 when some code was migrated from application to core. This has now been fixed.

Area: Core
Issue: 8303
Synopsis: NPE when running jfr rules reports

The rules report could throw NPEs when an accessor for an attribute could not be found. That has been fixed, and the faulty query responsible for the reported problem has been fixed as well.

Area: JFR
Issue: 8248
Synopsis: Low contrast for the stacktrace view when running in windows high contrast mode

With high contrast mode enabled in Windows 7 and above the contrast on the Stacktrace View was quite low, with the values being barely visible. This has now been fixed.

Known Issues

Area: JFR
Issue: 7071
Synopsis: JMC can’t attach to jlinked JVMs

This one is still under investigation, but it seems JMC can’t attach to certain jlinked images.

Area: JFR
Issue: 7003
Synopsis: The graph view, heatmap view and dependency view does not work on Windows

This is due to a problem with the Windows based browser component in SWT. We’re hoping for a fix in the component for a future version of the Eclipse platform.

AI Assistant for OpenJDK Contributors

2024-08-27 - By Marcus

Subtitle: ”Soon You’ll Know What I Did This Summer”

This summer my family and I decided to stay in Küssnacht during my summer vacation. The summers here are lovely, and for a family of six going for a vacation abroad it typically means a lot of time is spent preparing, packing and unpacking, for both directions of the journey. A staycation seemed like a good choice. Most days were spent in the water with the family, and on one occasion, we were joined by a team member and his family. It can get rather hot here, and the lake is the perfect place to cool down.

Me on a SUP overlooking part of my family on another SUP on Lake Lucerne

That said, I did end up in front of my computer a bit in the evenings after everyone else had fallen asleep. There were a few things I really wanted to get done this vacation. One was upgrading my server. Another one was to put some leftover Raspberry Pi 4s and a few new Raspberry Pi 5s to good use in my 19” rack, creating my very own little Kubernetes cluster.

Recently I was also playing with Custom GPTs, which is OpenAI’s way of letting you build little RAG (Retrieval Augmented Generation) systems without writing code. I first played a bit with Custom GPTs on a train ride back from the Datadog office in Paris a little while ago, and it was quite a lot of fun, but for some of the things I wanted it to do, it performed rather badly, and I couldn’t find any fitting APIs available to help my Custom GPT do better.

Since I was setting up a Kubernetes cluster anyway, I thought I might as well hack something together to make my OpenJDK Project Assistant a little bit better.

Setting Up the Cluster

Ah, the pain and the anguish. I ended up using my Apache server as a proxy (since I still want https://hirt.se to serve my old homepage and blog) for an Nginx that is acting as reverse proxy and load balancer to my Kubernetes cluster.

If anyone wants to know how I set up the cluster, that is an entirely different blog that probably already exists in a thousand versions, written by people with more patience to write about configuration than me, so I will not dwell on that. Suffice to say that I consulted many blogs and Claude 3.5 Sonnet to get it all up and running. Here are the steps in short:

Format and set up the Raspberries with Raspberry Pi OS (64-bit) (I used the Raspberry Pi Imager).
I added my Pis to the hosts file of my server and to my dhcpd config by MAC address, so that I can refer to the Pis by name instead of IP (I’m using dnsmasq).
I installed kubernetes on them, which involved properly enabling cgroups in /boot/firmware/cmdline.txt, setting up containerd properly, getting the debian package for k8s, installing kubelet, kubeadm, kubectl, turning off swap, and finally using kubeadm init to create the cluster and then applying flannel. Then using the join command you get from performing the init to have the other Pis join the cluster.

There was a bunch of trial and error, and there is probably a much easier way. Anyway, I now have a cluster. I can easily deploy. I can easily scale. What more could you ever wish for?

The API exposed to the CustomGPT is available to the world at https://api.hirt.se and described here, should anyone be interested. The implementation is available on GitHub – feel free to contribute if the custom GPT isn’t doing what you think it should be doing or if you simply want to make it more capable.

What Can the OpenJDK Project Assistant Do?

It can do lots of things. It can get information about open PRs and repos without overwhelming GPT with information that will invariably make it fail (this is what typically happened before the api.hirt.se API). For example, here’s getting the oldest open PR for the JDK project:

It can summarize the information in the PRs:

You can also ask for information about the related bug:

It can answer questions around people involved in OpenJDK:

For people like me, who have a really hard time remembering names, it can help answer questions about people involved in OpenJDK:

And about projects and groups:

Please let me know if you find it useful!

Summary

I recently had a stay-at-home vacation in Küssnacht.
Aside from spending a lot of the time in Lake Lucerne with my family, I did have some fun building a Kubernetes cluster out of my Raspberry Pis.
I put the cluster to good use to support my OpenJDK Project Assistant Custom GPT.
If you’re one of the few OpenJDK committers that also have access to Custom GPTs, or you simply like the idea, feel free to put a star on my API repo, or even better – contribute things you want it to be able to do, or do better!
Silly bonus: I also noticed that the silly old slogan image generator I was using on my homepage isn’t available under https (after my various upgrades, I now have permanent redirects to always use https). I couldn’t find any replacement that worked as a drop-in replacement, so I made my own variant that is now also running in my little cluster.

JDK Mission Control 9.0.0 Released!

2024-03-22 - By Marcus

The 9.0.0 GA release of JDK Mission Control was just tagged in the JMC repo at GitHub! Since this is the source release, it may still take a bit of time until the downstream vendors release binary builds of JDK Mission Control 9.0.0. I will try to remember to tweet or say something on the JMC Facebook page once the binaries start showing up.

Mission Control 9.0 – New and Noteworthy

General

JMC 9 – New Release!
This is the latest (2024) major release of JDK Mission Control. JMC 9 requires JDK 17+ to run and introduces several new features, enhancements, and bug fixes. This version continues to support connecting to, and parsing JFR recordings from, OpenJDK 8u272+ and Oracle JDK 7u40+, and can open and visualize flight recordings from JDK 7 and 8. JDK Mission Control is available for Windows (x86_64), Mac OS X (ARM and x86_64), and Linux (ARM and x86_64).

Eclipse 4.30 support
The Mission Control client is now built to run optimally on Eclipse 2023-12 and later. To install JDK Mission Control into Eclipse, go to the update site (Help | Install New Software…). The URL to the update site will be vendor specific, and some vendors will instead provide an archive with the update site.

Support for Linux/aarch64
JMC 9 is now built for Linux aarch64.

Support for dark mode
JMC 9 now supports dark mode. Go to Preferences, General | Appearance, and select the Dark theme to enable.

Minor bugfixes and improvements
There are 118 fixes and improvements in this release. Check out the JMC 9.0 Result Dashboard for more information.

Add user configuration for local JVM refresh interval
Previously the JVM Browser checked every 5000 ms for new JVMs. This can now be configured.

Core

Better JFR parser performance
Multiple efforts have been made to reduce allocations in the JMC parser, including: reduced allocation of Doubles, reduced allocation rate in ParserStats. Also, when duration events aren’t ordered by their end time (e.g. events which stack so that the last event finishes first, or file reads with overlaps) `DisjointBuilder.add` can be slow because of the linear search for the lane, and then a linear time reordering. This has been improved with a binary search.
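
To illustrate why a logarithmic lookup helps when placing duration events into non-overlapping lanes, here is a sketch that uses a sorted map (TreeMap) keyed on each lane's latest end time. This is a swapped-in technique with the same logarithmic cost as a binary search, not the actual DisjointBuilder code, and it assumes events arrive ordered by start time. The LaneAssigner name is made up.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical lane assigner; assumes add() is called in start-time order.
final class LaneAssigner {
    // Lane ids grouped by the end time of the last event placed in the lane.
    private final TreeMap<Long, Deque<Integer>> lanesByEnd = new TreeMap<>();
    private int laneCount;

    // Returns the lane the event [start, end) was placed in.
    int add(long start, long end) {
        // Logarithmic lookup of a lane whose last event ended at or before start.
        Map.Entry<Long, Deque<Integer>> fit = lanesByEnd.floorEntry(start);
        int lane;
        if (fit == null) {
            lane = laneCount++; // every lane is still busy, open a new one
        } else {
            lane = fit.getValue().pop();
            if (fit.getValue().isEmpty()) {
                lanesByEnd.remove(fit.getKey());
            }
        }
        lanesByEnd.computeIfAbsent(end, key -> new ArrayDeque<>()).push(lane);
        return lane;
    }

    int laneCount() {
        return laneCount;
    }

    public static void main(String[] args) {
        LaneAssigner lanes = new LaneAssigner();
        // The second event starts later but finishes first, so it needs its own lane.
        System.out.println(lanes.add(0, 10));
        System.out.println(lanes.add(5, 7));
        System.out.println(lanes.add(8, 12));
        System.out.println("lanes used: " + lanes.laneCount());
    }
}
```

A linear scan over the lanes would give the same placement here; the difference only shows up when there are many lanes.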

Support checkpoint event sizes beyond u4 limit
The JMC JFR parser now supports checkpoint event sizes beyond the u4 limit.

Move non-Eclipse dependent classes from org.openjdk.jmc.ui.common to org.openjdk.jmc.common
There were a number of classes previously in jmc.ui.common that would be a great asset to the core distribution (and the third-party applications that consume jmc-core), and these classes now live in jmc.common. Please see JMC-7308 for further information.

Move rjmx bundle from application to core
The rjmx classes and related services (FlightRecorderService) are now exposed for third-party application usage. Please see JMC-7069 for further information.

Move org.openjdk.jmc.flightrecorder.configuration bundle from application to core
The org.openjdk.jmc.flightrecorder.configuration bundle contains many classes useful for working with JFR, which are now available in core. Please see JMC-7307 for further information.

Java Flight Recorder (JFR)

The Event Browser now supports searching and showing event type ids
Searching in the search bar now also searches event type IDs, and there is also a (by default hidden) column that makes it easy to show the event type IDs for the shown events.

Add support for enabling jfr on native images
Previously JMC was unable to start Flight Recorder on a GraalVM native image, even when the image had built-in JFR support. This has now been fixed.

Java based flamegraph visualization
The previous flame graph visualization was rendered in an embedded browser component (provided by the Eclipse platform). Unfortunately this approach had some drawbacks, the first being that it was a bit slow. The view now uses a Java (Swing) based flame graph library. Also, the performance of the flame graph model creation has been improved.

Visualization and Rule for FileChannel.force()
The File I/O page has been updated to show force related information. There are two new columns added – Force Count and Update Metadata. Both are hidden by default and can be enabled by right clicking the table. The chart will also include a File Force row. There is a preference setting for the associated file force rule, where the peak duration warning limit can be set. See JMC PR#533 for more information.
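
For reference, this is the kind of application code that produces force activity. The sketch writes a small temp file and calls FileChannel.force(boolean); the boolean argument presumably corresponds to the Update Metadata column, which is an assumption on my part. The class and method names are made up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ForceExample {
    private ForceExample() {
    }

    // Writes the text and forces it to the storage device; returns bytes written.
    static int writeAndForce(Path file, String text, boolean updateMetadata) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
            int written = 0;
            while (buffer.hasRemaining()) {
                written += channel.write(buffer);
            }
            // true also forces file metadata, false forces content only.
            channel.force(updateMetadata);
            return written;
        }
    }

    // Runs the example against a temp file that is removed afterwards.
    static int demo() {
        try {
            Path tmp = Files.createTempFile("force-example", ".txt");
            try {
                return writeAndForce(tmp, "hello", true);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes written and forced: " + demo());
    }
}
```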

Rule that checks on G1 pause time target compliance
New rule that looks at the pause time target and compares it to the actual pauses.
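
The general idea can be sketched as below: take the observed pause durations and check how many stayed within the target (for G1 the target is set with -XX:MaxGCPauseMillis, 200 ms by default). How the actual JMC rule scores the result is not described here, so the ratio and the class name are assumptions made for illustration.

```java
import java.util.List;

// Hypothetical check, not the JMC rule implementation.
final class PauseTargetCheck {
    private PauseTargetCheck() {
    }

    // Fraction (0..1) of the pauses that stayed within the pause time target.
    static double complianceRatio(List<Double> pausesMillis, double targetMillis) {
        if (pausesMillis.isEmpty()) {
            return 1.0; // no pauses, nothing violated the target
        }
        long within = pausesMillis.stream().filter(pause -> pause <= targetMillis).count();
        return (double) within / pausesMillis.size();
    }

    public static void main(String[] args) {
        List<Double> pauses = List.of(50.0, 150.0, 250.0, 400.0); // made-up pause times
        System.out.println("compliance: " + complianceRatio(pauses, 200.0));
    }
}
```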

Rule that looks at finalization statistics
JDK 18 comes with a FinalizationStatistics event that helps users find where in their application finalizers are run. This is important as finalization has been deprecated for removal in a future release. For more information about finalization and its flaws, see https://openjdk.java.net/jeps/421. Even if an application doesn’t implement any finalize() methods, it may rely on third-party libraries that do. Static analysis of third-party libraries using “jdeprscan --for-removal” can be used to list those classes, but it will not tell if they are being used. For example, an application may be missing a call to a close() method, so the resource is cleaned up by the finalizer, which is sub-optimal.
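
The missing close() case is usually fixed with try-with-resources, so that cleanup no longer depends on a finalizer. A minimal sketch, with a made-up Handle class standing in for a real resource:

```java
final class CloseExample {
    private CloseExample() {
    }

    // Hypothetical resource; real code would release a file, socket or similar.
    static final class Handle implements AutoCloseable {
        private boolean closed;

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // close() runs deterministically when the try block exits.
    static Handle useAndClose() {
        Handle handle = new Handle();
        try (handle) { // try-with-resources on an existing variable needs Java 9+
            // ... use the handle here ...
        }
        return handle;
    }

    public static void main(String[] args) {
        System.out.println("closed: " + useAndClose().isClosed());
    }
}
```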

Rule that detects GC Inverted Parallelism
Rule inspired by the “Inverted Parallelism” analysis in Garbagecat. See JMC-8144 for more information.
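
A rough sketch of the idea, assuming the analysis compares the CPU time spent by the GC threads (user + sys) with the wall-clock time of the collection (real): if a parallel collection uses less CPU time than wall-clock time, it did worse than a single thread would have. This reading of the Garbagecat analysis, and the names below, are assumptions rather than the JMC rule itself.

```java
// Hypothetical check, for illustration only.
final class InvertedParallelism {
    private InvertedParallelism() {
    }

    // True when the combined CPU time is lower than the elapsed wall-clock time.
    static boolean isInverted(double userSeconds, double sysSeconds, double realSeconds) {
        return realSeconds > 0 && (userSeconds + sysSeconds) < realSeconds;
    }

    public static void main(String[] args) {
        // Made-up timings: 60 ms of CPU time spread over a 200 ms pause.
        System.out.println(isInverted(0.05, 0.01, 0.20));
        // Made-up timings: 850 ms of CPU time during a 200 ms pause.
        System.out.println(isInverted(0.80, 0.05, 0.20));
    }
}
```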

Support for the new JPLIS agent events
There is now a new page and rule for loaded JPLIS agents. See JMC-8054 for more information.

Twitter plug-in removed
Due to changes in APIs and cost of maintenance, the Twitter plug-in has been removed.

Bug Fixes

Area: Agent
Issue: 8045
Synopsis: retransformClasses() doesn’t re-transform all needed classes

The retransformClasses() methods in Agent and AgentController use Class.forName() to try to get the class objects of classes needed to re-transform. This obviously doesn’t work for classes loaded by classloaders different from the one which loads the agent. Those classes would be instrumented if they were loaded after their event probes were defined in the AgentController. But when loaded earlier they would not be instrumented. This has been fixed.

Area: Agent
Issue: 8048
Synopsis: Agent throws exceptions on missing or empty descriptions

When the description of an event or value is empty or missing, the agent fails with exceptions. This has now been fixed.

Area: Console
Issue: 8154
Synopsis: Some JMX attributes are missing unit specifications in the Console

The missing unit specifications have now been added.

Area: Core
Issue: 8063
Synopsis: IMCFrame Type cache not synchronized

The type cache used in the IMCFrame Type inner class wasn’t synchronized and could cause a concurrent modification exception during e.g. JFR parsing. This has been fixed.

Area: Core
Issue: 8156
Synopsis: JfrRulesReport.printReport does not respect verbosity for text and json

The verbosity flag for text and json reports didn’t work. This has been fixed.

Area: Core
Issue: 8041
Synopsis: JfrRulesReport json reports produce incomplete results

While generating JFR Rules Reports in json format, the results were incomplete. The components “message” and “detailedMessage” were not populated. This has been fixed.

Area: JFR
Issue: 7885
Synopsis: Graphical rendering of dependency view fails due to heap memory drain

Also JMC-7496. The dependency view drains the heap memory and causes out-of-memory exceptions and performance delays. This has been improved.

Known Issues

Area: JFR
Issue: 7071
Synopsis: JMC can’t attach to jlinked JVMs

This one is still under investigation, but it seems JMC can’t attach to certain jlinked images.

Area: JFR
Issue: 7003
Synopsis: The graph view, heatmap view and dependency view does not work on Windows

This is due to a problem with the Windows based browser component in SWT. We’re hoping for a fix in the component for a future version of the Eclipse platform.

What is this thing called Profiling?

2023-08-25 - By Marcus

If you’re reading this blog (I’m originally posting this on hirt.se), you probably already know who I am and my background. As some of you may know, one of my current responsibilities at Datadog is the Continuous Profiler product. After some discussions with profiling team members, I found it interesting that there are many subtly different ideas about what profiling is – often influenced by what a particular ecosystem is calling profiling.

So, here are my (unsolicited) thoughts around what software profiling is. 😉

What’s in a Word

Profiling literally means trying to understand behaviour. When asked to define Profiling, Google will say this:

“the recording and analysis of a person’s psychological and behavioural characteristics, so as to assess or predict their capabilities in a certain sphere or to assist in identifying categories of people.”

This is analogous to what we typically mean when we talk about the profiling of software, which I would simply state as:

“The recording and analysis of a program’s runtime behaviour.”

We’re simply trying to understand how it behaves (recording data), and why it is behaving that way (analysis of the data), so that we can improve some aspect of the program. There are other techniques that we can use to understand the semantic behaviour of a program, such as using debuggers. In profiling though, we’re trying to understand the runtime behaviour – how the program (your code) is behaving in terms of utilization of constrained or costly resources, such as CPU, memory, locking primitives and other potentially thread latency inducing operations and so on.

As to why we want to understand the runtime behaviour, there are a lot of different reasons these days:

To support performance engineering, for example:
- Reducing the cost of running a program.
- Making the software run faster (throughput).
- Making the software run smoother (e.g. less latency outliers, less STW-interruptions, less variance).
- Understanding performance deltas between versions of a program.
- Optimizing resource utilization.
To aid in diagnostics, for example:
- Help discovering and understanding unknown-unknowns.
- To help explain deviations from normal behaviour, e.g. why there was suddenly a 40 second delay in the execution of a particular program at a particular point in time.
- Help provide a link to the source code in the context of something else the user cares about, making whatever occurred more actionable.

I would argue that there are a few additional constraints for a profiler to be truly usable these days:

The Heisenberg Observer Effect notes that we can’t observe a system without affecting it. That said, a profiler that materially changes the runtime behaviour of the software it is profiling is not very useful – it will make us bark up the wrong tree.
Because of this, profilers will be making trade-offs. If the profiler, in the quest to not affect the runtime behaviour of the software it is profiling, misrepresents the runtime behaviour too much, it is also not very useful.
Also, since it is notoriously difficult to build a test which will perfectly mirror the system behaviour on Black Friday, 8:00 p.m. in production, these days you typically want a profiler that has a low enough overhead (it will be too costly otherwise), and that is stable enough, that you can use it continuously in production. A continuous profiler is a very powerful tool for finding the unknown-unknowns, especially when you’re in a tough spot.
With the uptake of continuous integration and continuous delivery practices (CI/CD), a new version of a program can be published every few hours, or even more often than that. You will want, at least, to have production data around for every version you publish, and probably from multiple different time periods during the process lifecycle. (Of course, with continuous profiling, this point is moot – you have data for all time periods, should something happen.)

Sampling Profilers

Today, most profilers will be sampling profilers. A sampling profiler is a type of profiler that collects data about a program’s execution by periodically sampling the program’s state at specific intervals. In contrast to other profilers, which typically capture every function call or emit data at specific runtime events, sampling profilers gather information by intermittently observing the program’s execution state.

This means that the correlation with a certain runtime characteristic will depend on when the sample was taken. To do CPU profiling, simply wait for a thread to use up a certain amount of CPU time, then signal the thread and take a sample. To do allocation profiling, wait until a certain amount of memory has been allocated, then take a sample (in the allocation path of the runtime). To do lock profiling, wait until a monitor has been waited on for a certain amount of time, then take the sample (in the appropriate monitor handling path of the runtime). The reason for why one must sample, is that tracing every method/function invocation will cause too much overhead, quite possibly affecting the runtime behaviour of the application.

A sampling profiler will try to sample uniformly over some quantity, for example every 9 ms of CPU-time consumed. This gives some rather nice statistical properties. It is easy to aggregate the samples and relate them to that quantity – “this method is on average using 456.32 ms of CPU time / s”, “that method is responsible for an allocation rate of 845 MiB / s (which in turn is why your garbage collector is running hot)”.
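
As a sketch of that aggregation: if every sample stands for a fixed amount of CPU time, the per-method rate is simply the sample count times the sampling interval, divided by the length of the observation window. The method names and numbers below are made up, and attributing a sample to its top frame only is a simplification.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical aggregator, for illustration only.
final class CpuSampleAggregator {
    private CpuSampleAggregator() {
    }

    // Returns the estimated CPU time per method in ms of CPU per second of wall-clock time.
    static Map<String, Double> cpuMillisPerSecond(List<String> sampledTopFrames,
            double sampleIntervalMillis, double windowSeconds) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String method : sampledTopFrames) {
            counts.merge(method, 1, Integer::sum);
        }
        Map<String, Double> rates = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            // Each sample represents sampleIntervalMillis of consumed CPU time.
            rates.put(entry.getKey(), entry.getValue() * sampleIntervalMillis / windowSeconds);
        }
        return rates;
    }

    public static void main(String[] args) {
        List<String> samples = List.of("parse", "parse", "render", "parse", "parse", "render");
        // One sample per 9 ms of CPU time, observed over a 1 second window.
        System.out.println(cpuMillisPerSecond(samples, 9.0, 1.0));
    }
}
```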

Note that these sampling profilers do not need to pre-aggregate data to be proper sampling profilers. With the advent and adoption of the pprof format, this is sometimes assumed, but there are plenty of sampling profilers that also capture the time the sample was taken. This makes the samples much more useful for diagnostics. One recent example was a Go service where it wasn’t discovered, until the time stamps were preserved, that the service has bursts of activity for a couple of 10s of milliseconds every 10 seconds, which stood out very well in a heat map, once time stamp information was included per sample. Collecting timestamps per sample (and adding context) helps immensely with diagnostics, but more on this later.

We don’t necessarily need stack traces for this to be profiling. We capture whatever is needed to understand how we came to present the observed behaviour. That said, having no execution context at all, for example a simple performance metric, will usually not be enough to satisfyingly help with the analysis part. It’s usually understood that the stack trace will be one of the primary sets of data included in the sample, since it is indeed very useful in many cases and languages.

Also note that the data production rate can still be hard to understand even with a sampling profiler. For CPU it’s relatively easy – the upper limit will be #cpus * average sample size / sample interval. For allocation sampling, it was hard enough that we (Datadog) introduced a new rate limited allocation profiler in OpenJDK, conceptually using a discrete PID controller to control how to subsample (we can’t use reservoir sampling, since we don’t want to do the work up front, and then decide which samples to keep), and also record the amount of memory allocated since the last sample in each sample to be able to normalize.
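
The CPU upper limit above is easy to put into numbers. A small sketch, with made-up values for the number of CPUs, the average sample size and the sampling interval:

```java
// Worked example of: #cpus * average sample size / sample interval.
final class SampleRateBound {
    private SampleRateBound() {
    }

    // Upper bound in bytes per second, assuming every CPU is busy all the time.
    static double maxBytesPerSecond(int cpus, double avgSampleBytes, double sampleIntervalMillis) {
        return cpus * avgSampleBytes * 1000.0 / sampleIntervalMillis;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 8 CPUs, 1 KiB per sample, one sample per 10 ms of CPU time.
        double bound = maxBytesPerSecond(8, 1024.0, 10.0);
        System.out.println(bound + " bytes/s, about " + (bound / 1024.0) + " KiB/s");
    }
}
```

With those numbers the bound works out to 819,200 bytes per second (800 KiB/s), no matter what the application is doing.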

Execution Tracers and Event Recorders

Another kind of profiler is the so-called execution tracer. An execution tracer instruments certain operations in the runtime and typically provides events around them, often containing useful diagnostics information. For example, the monitor event in JFR will contain information about the monitor class, the thread holding on to the monitor blocking our hero thread, the address of the monitor (so that we can see if there are multiple monitor instances at play), and more. Note that sampling profilers (especially runtime specific profilers) can capture such information as well, so the difference is mostly in how the sample is taken.

Since emitting data for every invocation of a pathway in the runtime can be prohibitively expensive, tools like JFR will provide configuration options to subsample the data in different ways. This can be useful, for example, for outlier profiling. Examples of outlier profiling are all the thread latency events in JFR, such as the monitor enter event, where you can choose to only record events with a duration longer than a specified threshold.
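
With the jdk.jfr API, such a threshold can be set when enabling the event on a recording. The sketch below configures (but does not start) a recording with a threshold on jdk.JavaMonitorEnter and returns the resulting settings; the 20 ms value and the class name are arbitrary choices for the example.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import jdk.jfr.FlightRecorder;
import jdk.jfr.Recording;

final class OutlierThresholdExample {
    private OutlierThresholdExample() {
    }

    // Returns the settings of a recording that only keeps monitor enter
    // events lasting longer than the given threshold.
    static Map<String, String> settingsWithThreshold(Duration threshold) {
        if (!FlightRecorder.isAvailable()) {
            return Map.of(); // JFR is not available in this runtime
        }
        try (Recording recording = new Recording()) {
            recording.enable("jdk.JavaMonitorEnter").withThreshold(threshold);
            // recording.start() would begin collecting; omitted in this sketch.
            return new HashMap<>(recording.getSettings());
        }
    }

    public static void main(String[] args) {
        settingsWithThreshold(Duration.ofMillis(20))
                .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```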

Serious drawbacks of execution tracers, even the ones that only capture specific events, are that:

Unless there is a subsampling strategy, and even then, the amount of data emitted can be very hard to reason about.
Depending on the subsampling strategy, some aggregations will be harder to reason about. For example, if you’re looking at latency outliers, you can paint a picture of where the latency outliers are, but you will not be able to tell what the average latency is.

Performance Engineering

Performance engineering is the black art of optimizing the software to do better on some runtime characteristics. To do that, profiling data is analyzed, for example learning where CPU is spent, so that the program can be optimized to use less resources (e.g. CPU), which in turn makes it less costly to run (need less hardware). Supporting performance engineering is what people most commonly will associate with profiling, and for that use case, it’s most commonly CPU profiling that springs to mind. Many programming languages/runtimes will have the concept of a stack, and ultimately, the underlying hardware will have the concept of hardware threads, and stacks to execute. Therefore, to understand how we came to execute a particular method or function, profilers will often capture a stack trace. Depending on the programming language and environment (such as frameworks used) this can be more or less useful, as practitioners of reactive programming and async frameworks will be very well familiar with. (Project Loom is in a way a response to this problem. Make Stack Traces Great Again! 😉 )

Both sampling profilers and execution tracers can be put to good use to understand the runtime profile of a program. For example, if you demand a little bit more of your distributed tracer and the tracer integrations, and keep tabs of when threads are doing work in the context of a certain trace / span / operationName, you can start aggregating profiling information by endpoint, for example showing the amount of CPU-time spent by a specific endpoint.

Using Profiling for Diagnostics

Of course, when you have samples that contain context and time information, you can also go look at that information when something goes spectacularly wrong. For example, looking at a timeline view of the threads involved in processing a part of a distributed operation that was painfully slow, can reveal a lot. When something goes wrong, and it has not been a priori instrumented by the tracer, logging or some other instrumentation, profiling data is often the last resort for explaining what went wrong. In other words, it can help understand the unknown unknowns.

Here are some screenshots that hopefully will give you some idea of the capability:

For some examples using this feature (in Go), see Felix Geisendörfer’s YouTube video.

Note that these screenshots were from profiling timelines for specific spans in a distributed trace. It’s also possible to look at a timeline for all the threads in the runtime.

Some time ago we had a 21 second span that remained unexplained until the profiling data showed that it was a safe pointing VM operation related to dumping the heap – someone had ssh:d into the machine and used jcmd to request a full heap dump. Had it not been for recording outliers for safe pointing VM operations, this could have been hard to explain. Profilers purely written in Java and using exceptions to force stack walks, or using the Java stack walking APIs, would never know better. For them it would have been like the world would have been stopped, and the only visible effect would have been that the clock suddenly skipped ahead 21 seconds.

TL;DR

Profiling is the recording and analysis of the runtime behaviour of a program.
Profiling can not only be used for performance engineering, but it can also be a very powerful diagnostic tool.
Profiling samples often contain stack traces, but in some paradigms, stack traces will not be the most helpful thing to explain why something ended up being called.

Many thanks to my colleagues at Datadog for all the awesome work they do, and for the feedback on this post.

JDK Mission Control 8.3.0 Released!

2022-11-22 - By Marcus

The latest release of JDK Mission Control was recently released! Since I am a bit late with this blog, there are already some binary releases available, for example:

Eclipse Mission Control:
https://adoptium.net/jmc/

Zulu Mission Control:
https://www.azul.com/products/components/azul-mission-control/

Mission Control 8.3 – New and Noteworthy

General

JMC 8.3 – New Release!
This is a new minor release of JDK Mission Control. The JMC application requires JDK 11+ to run, but can still be used to connect to, and parse JFR recordings from, OpenJDK 8u272+ and Oracle JDK 7u40+. It can also still open and visualize flight recordings from JDK 7 and 8. JDK Mission Control is built for Windows (x86_64), Mac OS X (ARM and x86_64), as well as Linux (x86_64).

Eclipse 4.24 support
The Mission Control client is now built to run optimally on Eclipse 2022-06 and later. To install JDK Mission Control into Eclipse, go to the update site (Help | Install New Software…). The URL to the update site will be vendor specific, and some vendors will instead provide an archive with the update site.

Minor bugfixes and improvements
There are 51 fixes and improvements in this release. Check out the JMC 8.3 Result Dashboard (https://bugs.openjdk.org/secure/Dashboard.jspa?selectPageId=21205) for more information.

Core

Parser improvements
The performance of the FastAccessNumberMap has been improved for sparse values.

Java Flight Recorder (JFR)

Dependency View
There is a new view for visualizing call dependencies. There are two modes of operation in the view, chord diagram and edge bundling. In the edge bundling visualization, hover over packages to see dependencies highlighted in colors: green means that methods in the linked package are called by methods in the package being hovered over, yellow means that methods in the linked package are calling methods in the package being hovered over. Red means that methods in the packages are calling each other. To show the Dependency View, go to Window | Show View | Other… and select the Dependency View under the Mission Control folder.

Graph Pruning
The graph view in JMC can now be pruned to focus on the most impactful nodes. Select the target number of nodes, and the visualization will show at most that many nodes.

Selectable Attribute
It is now possible to select which attribute to use for the weights in the trace view and the flame graph view.

Parser improvement
The parser now supports parsing events with char fields.

Bug Fixes

Area: General
Issue: 7813
Synopsis: Unable to open Help page in macOS M1 when JMC started with JDK11

The help page was inaccessible, throwing an error on macOS M1 when JMC is run on JDK 11.0.16. This is now fixed.

Area: General
Issue: 7321
Synopsis: Unable to view JMC Help Contents (HTTP ERROR 500 ) when booted with JDK 17 or higher

The help page was inaccessible, throwing an error, when running on JDK 17+. This is now fixed.

Area: JFR
Issue: 7812
Synopsis: Unable to open links from Automated Result analysis page

Links in the results of the automated analysis would not open properly on Linux and Mac OS X. Now they open in situ.

Known Issues

Area: General
Issue: 4270
Synopsis: Hibernation and time

After the bugfix of https://bugs.openjdk.java.net/browse/JDK-6523160 in JDK 8, the RuntimeMXBean#getUptime() attribute was re-implemented to mean “Elapsed time of JVM process”, whilst it previously was implemented as time since start of the JVM process. The uptime attribute is used by JMC, together with RuntimeMXBean#getStartTime(), to estimate the actual server time. This means that time stamps, as well as remaining time for a flight recording, can be wrong for processes on machines that have been hibernated.
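To make the estimate concrete, here is a minimal sketch of how a start time and an uptime can be combined into an estimated server time. It only uses the standard java.lang.management API against the local JVM. The class and method names are made up for illustration, and this is not JMC's actual implementation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

// Hypothetical helper, not JMC code: estimates the current time of a JVM
// from the two RuntimeMXBean attributes mentioned above.
final class ServerTimeEstimate {

    // Start time (ms since the epoch) plus uptime (ms) approximates "now" on
    // the monitored JVM, provided uptime really is the time since JVM start.
    static long estimateServerTimeMillis(RuntimeMXBean runtime) {
        return runtime.getStartTime() + runtime.getUptime();
    }

    public static void main(String[] args) {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        long estimate = estimateServerTimeMillis(runtime);
        long drift = System.currentTimeMillis() - estimate;
        System.out.println("Estimated server time: " + estimate + " ms since the epoch");
        System.out.println("Difference to the local clock: " + drift + " ms");
    }
}
```

If the uptime attribute does not include the time the machine spent hibernated, the sum falls behind the wall clock by that amount, which is the kind of error described above.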

Area: General
Issue: 7953
Synopsis: Unable to install JMC Plugins on Eclipse 4.25

Because of updates to the naming of certain platform dependencies in Eclipse 4.25 (e.g. JUnit 5 bundles now have names of the form junit-* instead of the Orbit variants org.junit.*), it will no longer be possible to install the plug-in version of JMC into Eclipse 4.25+. This will be resolved in a later version of JMC. A workaround for now is to build JMC from the mainline (9.0 EA), and install the plug-in version from the resulting update site archive.

Area: JFR
Issue: 7947
Synopsis: JMC crashes while performing flight recording on MacOS 13.0_x64

JMC can crash when completing a recording on MacOS 13.0 on x64. It seems to be related to running JavaScript in the Browser component. Eclipse is investigating the issue here.

Area: JFR
Issue: 7071
Synopsis: JMC can’t attach to jlinked JVMs

This one is still under investigation, but it seems JMC can’t attach to certain jlinked images.

Area: JFR
Issue: 7003
Synopsis: The graph, flame graph view, heatmap view and dependency view does not work on Windows

This is due to a problem with the Windows based browser component in SWT. We’re hoping for a fix in the component for a future version of the Eclipse platform.

TL;DR

There is a new version of JMC out. Have fun! 🙂

JDK Mission Control 8.2.0 Released!

2022-03-20 - By Marcus

The latest release of JDK Mission Control was just released! Since this is the source release, it may still take a bit of time until the downstream vendors release binary builds of JDK Mission Control 8.2.0. I will try to remember to tweet or say something on the JMC Facebook page once the binaries start showing up.

Here’s what’s new:

Mission Control 8.2 – New and Noteworthy

General

JMC 8.2 – New Release!
This is a new minor release of JDK Mission Control. The JMC application requires JDK 11+ to run, but can still be used to connect to, and parse JFR recordings from, OpenJDK 8u272+ and Oracle JDK 7u40+. It can also still open and visualize flight recordings from JDK 7 and 8.

Eclipse 4.22 support
The Mission Control client is now built to run optimally on Eclipse 2021-06 and later. To install JDK Mission Control into Eclipse, go to the update site (Help | Install New Software…). The URL to the update site will be vendor specific, and some vendors will instead provide an archive with the update site.

Minor bugfixes and improvements
There are 83 fixes and improvements in this release. Check out the JMC 8.2 Result Dashboard (https://bugs.openjdk.java.net/secure/Dashboard.jspa?selectPageId=20804) for more information.

Binary build for Apple ARM
JDK Mission Control is now built for Apple ARM, allowing JMC to be run natively (without Rosetta x86 emulation) on Apple M1.

Core

Parser support for async profiler
Parser support has been added for frame types generated by async profiler, such as Native, C++ and Kernel.

System.gc() rule
There is now a new rule for explicit invocations of System.gc().

Java Flight Recorder (JFR)

Heat map view
A new heat map view has been added, which is handy for seeing when events are taking place. Use Window | Show View | Other…, and select the Heatmap View under Mission Control and click Open to open the view.

Websocket for selections
There is a new websocket API available that pushes stack trace data from selections in the JFR UI as JSON on a user defined port. This allows for programmatic control of the visualization directly in the browser. Tools like observablehq.com can be used to invent new visualizations, or to alter the visualization. To get started, simply go to the Flight Recorder preferences in JMC, and select the Websocket port to use (0 to disable). A set of example visualizations are available here: https://observablehq.com/collection/@cimi/java-mission-control.
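For those who would rather consume the selection data from Java than from the browser, here is a rough sketch using the standard java.net.http websocket client (JDK 11+). The exact URI JMC serves the data on is not spelled out above, so the sketch takes it as a command line argument. The class names are made up for this example, and the payload is simply printed as-is.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.CountDownLatch;
import java.util.function.Consumer;

// Hypothetical client: connects to the given websocket URI and prints every
// complete text message (expected to be JSON) that arrives.
final class SelectionClient {
    public static void main(String[] args) throws InterruptedException {
        if (args.length != 1) {
            System.err.println("Usage: java SelectionClient <websocket-uri>");
            return;
        }
        HttpClient.newHttpClient().newWebSocketBuilder()
                .buildAsync(URI.create(args[0]), new SelectionListener(System.out::println))
                .join();
        new CountDownLatch(1).await(); // keep the JVM alive while messages arrive
    }
}

// Reassembles text messages that may be delivered in several fragments.
final class SelectionListener implements WebSocket.Listener {
    private final StringBuilder buffer = new StringBuilder();
    private final Consumer<String> onMessage;

    SelectionListener(Consumer<String> onMessage) {
        this.onMessage = onMessage;
    }

    // Appends a fragment; returns the whole message once the last fragment
    // has arrived, and null while the message is still incomplete.
    String append(CharSequence data, boolean last) {
        buffer.append(data);
        if (!last) {
            return null;
        }
        String message = buffer.toString();
        buffer.setLength(0);
        return message;
    }

    @Override
    public CompletionStage<?> onText(WebSocket webSocket, CharSequence data, boolean last) {
        String message = append(data, last);
        if (message != null) {
            onMessage.accept(message);
        }
        webSocket.request(1); // signal that we are ready for the next message
        return null;          // the CharSequence may be reclaimed immediately
    }
}
```

Pass the websocket URI matching the port you configured in the Flight Recorder preferences, then make a selection in the JFR UI to see the JSON arrive.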

Bug Fixes

Area: JFR
Issue: 7403
Synopsis: JFR parser struct types hashcode problem

Some JFR parser struct types were using lazily initialized attributes which happen to be a part of hashCode/equals computations.
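To illustrate why this kind of thing is a problem, here is a small, self-contained sketch. The LazyStruct class is made up for the example and has nothing to do with the actual parser types. It just shows what happens to hash based collections when a lazily initialized field takes part in hashCode/equals.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class LazyHashDemo {
    // Adds a struct to a set, triggers the lazy initialization, and then
    // checks whether the very same instance can still be found in the set.
    static boolean foundAfterLazyInit() {
        Set<LazyStruct> set = new HashSet<>();
        LazyStruct struct = new LazyStruct("frame");
        set.add(struct);          // stored under the hash of (name, null)
        struct.getDescription();  // lazy init silently changes the hash code
        return set.contains(struct);
    }

    public static void main(String[] args) {
        System.out.println("Found after lazy init: " + foundAfterLazyInit());
    }
}

// Hypothetical struct type with a lazily initialized attribute that is
// (mistakenly) part of the hashCode/equals computations.
final class LazyStruct {
    private final String name;
    private String description; // null until first requested

    LazyStruct(String name) {
        this.name = name;
    }

    String getDescription() {
        if (description == null) {
            description = "desc:" + name;
        }
        return description;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof LazyStruct)) {
            return false;
        }
        LazyStruct other = (LazyStruct) o;
        return name.equals(other.name) && Objects.equals(description, other.description);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, description);
    }
}
```

Once the hash code changes after insertion, the instance is effectively lost in the set. A common remedy is to base hashCode/equals only on fields that are set at construction time.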

Area: JFR
Issue: 7532
Synopsis: Delays in rendering of JMX console graphs

Sometimes the updates of the JMX console graphs would be severely delayed on MacOS. This is now fixed.

Area: JFR
Issue: 7068
Synopsis: JfrRecordingTest (uitest) hangs on the automated analysis page

Trying to run uitests on Fedora hangs on JfrRecordingTest. This was fixed after the Eclipse platform update.

Known Issues

Area: General
Issue: 4270
Synopsis: Hibernation and time

Area: JFR
Issue: 7071
Synopsis: JMC can’t attach to jlinked JVMs

This one is still under investigation, but it seems JMC can’t attach to certain jlinked images.

Area: JFR
Issue: 7003
Synopsis: The graph and flame graph view does not work on Windows

This is due to a problem with the Windows based browser component in SWT. We’re hoping for a fix in the component for a future version of the Eclipse platform.

Contributing to OpenJDK Mission Control

2021-10-14 - By Marcus

Since this month is Hacktoberfest, I thought it would be a good idea to talk a bit about how to contribute to the OpenJDK Mission Control project. Some of the content of this blog post will be applicable to any of the OpenJDK projects, especially the Skara (OpenJDK on Git) bits.

The OpenJDK Mission Control Project

The OpenJDK Mission Control project is the observability tools suite for OpenJDK. It contains a JMX Console, a JFR visualizer and analyzer, a heap waste analysis tool, and many other little useful tools and utilities. Since it is all open source, pretty much anyone can contribute to the project.

The project is on GitHub:
https://github.com/openjdk/jmc

The first step to contribute to JDK Mission Control is to simply fork the repository on GitHub. This establishes a copy of the repository where you can freely make changes as you please. Whilst it is technically possible to make the changes in the master branch, it will save you time and effort to make them in a separate branch, should you later want to contribute them:
git checkout -b my-jmc-test

Building JMC

First of all, ensure that you have JDK 11 active in your shell, and verify that this is the case using:
java -version

There are multiple ways to build JMC. The easiest way is to simply use the build script (don’t do this just yet):
./build.sh --packageJmc

There is also a way to build JMC using Docker (don’t do this just yet either):
docker-compose -f docker/docker-compose.yml run jmc

These are however not the best ways when you’re developing JMC using an IDE. The third party dependencies for JMC need to be available through a p2 repository, and you want to install a build of the JMC core libraries into your maven cache.

So, to set things properly up for development, it is better to first install the core libraries:

cd $JMC_ROOT/core
mvn install

Next, build the p2 site and start jetty to expose it on a well known port:

cd $JMC_ROOT/releng/third-party
mvn p2:site
mvn jetty:run

Then leave jetty running for as long as you are developing JMC. You will need it up and running so that it can be found both when building from the command line, as well as when compiling JMC from within the Eclipse development environment.

To build the JMC application, next do the following in a separate shell (since you have jetty with the p2 site for the third-party dependencies up and running in the previous one):

cd $JMC_ROOT
mvn package

After this, you can use the build script to run the built JMC product:
./build.sh --run

For alternative ways of launching JMC, see the platform specific documentation in the README.md.

Developing JMC

Many that I’ve talked to, especially when JMC was shipped with the Oracle JDK, believed that JMC is a native application. If you’ve browsed the repo, you’ve already seen that it is a Java application, more specifically an Eclipse RCP application. Since it is an Eclipse RCP application, it’s easiest to develop JMC using Eclipse.

First set up your development environment, following the Developer Guide. It is slightly involved, but luckily does not need to happen very often.

Next, in your branch in your fork, commit the changes you want to contribute, and create a pull request, just like you would for any other open source project on GitHub.

Now, if this is your first OpenJDK PR, the OpenJDK bot will likely complain about a few different things, for example:

You need to have your GitHub account associated with a company that has a signed Oracle Contributor Agreement (OCA), or you must have signed an OCA yourself.
The PR needs to have an associated issue in the Java Bug System.
There is some problem with the testing or formatting of your code.

Let’s take a quick look at these three problems.

The Oracle Contributor Agreement

Like all open source projects, there needs to be a Contributor Agreement in place. This is to protect everyone backing the project, as well as the customers depending on the project. For example, the contributor agreement ensures that the source code you’re contributing isn’t violating any patent rights, and that the source code you’re contributing is yours to contribute.

Many larger companies already have an OCA signed, so the first step might be to check with your company if one is already signed. In my case, I both have a personal OCA signed (since I was contributing before Datadog signed an OCA), and one signed by my employer, Datadog.

You will know that the OCA status is not properly set up for your GitHub account when the OCA label is set in the PR, and the following text can be found in the PR:
⚠️ OCA signatory status must be verified

The OpenJDK bot will write helpful messages in the PR to help guide you through getting your OCA status verified.

The Java Bug System

Once you have a few commits under your belt, and become an OpenJDK author, you have access to the Java Bug System (JBS): https://bugs.openjdk.java.net/. So, what do you do before then? If the PR passes a first cursory check by the reviewers, a reviewer will simply create an Issue in JBS for you.

Fixing Issues

If you end up having an issue, the details of the test run in the PR will hopefully be enough to sort it out. If not, you can run mvn verify locally and look at the test logs. If it is formatting, then check if the formatting problem was in core or not, and either run mvn spotless:apply in core or in the root of the project.

Skara – the OpenJDK Git Tooling

Skara is the project name for the tooling around developing OpenJDK on Git(Hub). It actually insulates a lot of the GitHub specifics, making it possible, should the need ever arise, to move the development and development process somewhere else. The project also contains the aforementioned bot that helps, for example, to verify that there is a related JBS issue, and that there is a signed OCA. Skara also contains some useful git extensions which make working with OpenJDK on GitHub smoother.

To set things up, do the following:

Clone Skara:
git clone https://github.com/openjdk/skara

Build it:
gradlew (win) or sh gradlew (mac/linux)

Install it:
git config --global include.path "%CD%/skara.gitconfig" (win), or
git config --global include.path "$PWD/skara.gitconfig" (mac/linux)

Set where to sync your forks from:
git config --global sync.from upstream

Here are some examples:

To sync your fork with upstream and pull the changes:
git sync --pull

Note: if the sync fails with the error message “No remote provided to fetch from, please set the --from flag” or “error: upstream is not a known git remote, nor a proper git URI”, remember to set the remote for your repo, e.g.

git remote add upstream https://github.com/openjdk/jmc

To list the open PRs:
git pr list

To create a PR:
git pr create

To push your committed changes in your branch to your fork, creating the remote branch:
git publish

So, the normal workflow when working with OpenJDK JMC using the Skara tooling becomes:

Note: First ensure that you have a fork of JMC, and that your current directory is the root of that fork. You typically just create that one fork and stick with it.

(Optional) Sync up your fork with upstream:
git sync --pull
Create a branch to work on, with a name you pick, typically related to the work you plan on doing:
git checkout -b <branchname>
Make your changes / fix your bug / add amazing stuff
(Optional) Run jcheck locally:
git jcheck local
Push your changes to the new branch on your fork:
git publish (which is pretty much git push --set-upstream origin <branchname>)
Create the PR, either on GitHub, or from the command line:
git pr create

Once the PR is created, the bot will check that everything is okay, and the PR will be reviewed.

Interacting with the Skara Bot

Getting the PR merged is handled a bit differently in OpenJDK compared to normal GitHub projects. First of all, all the prerequisites must first be fulfilled, like the OCA status of the contributor being verified, the change being properly reviewed, jcheck passing, the tests passing, the PR having a matching issue in JBS etc. Once that is all taken care of, the bot will helpfully ask, in a message in the PR, for the author of the PR to integrate the changes. This is simply done by typing /integrate in a message in the PR. The bot will automatically rebase on the latest changes in the target branch (normally master) and squash your commits. In other words, it is perfectly fine to have multiple fixes and other commits happening in the PR after the initial commit for the PR. It is actually much preferred to force-updating the PR, as it’s easier to follow along with the review.

If the PR author is not a committer on the project, the bot will inform that the PR is ready to be sponsored by a committer, which is normally done by the reviewer of the PR. This is done by writing /sponsor in a separate message in the PR.

When the PR is merged, the corresponding JBS issue is automatically closed.

Other Related Repos

There are a few additional repos that are related to the OpenJDK JMC project, but that aren’t currently OpenJDK projects. Two examples are the jmc-jshell and the jmc-tutorial repositories. The jmc-tutorial is a good resource for learning about JDK Mission Control. Even though it is not officially an OpenJDK repository, it can still be a good place to start contributing to the OpenJDK JMC community.

Summary

Contributing to OpenJDK is easier than ever before now that it’s on GitHub.
Skara makes it even easier.
It’s Hacktoberfest – commits to the JMC project (and related repos) count!
JBS is a good source for JMC starter bugs.
If you need any help, the JDK Mission Control slack is a good place for asking questions! Ping me or any of the JMC folks for an invite. 🙂
Finally, here’s a practical guide to OpenJDK projects and the roles:
OpenJDK Projects (java.net)

JDK Mission Control 8.1.0 Released!

2021-08-04 - By Marcus

Yay, the latest release of JDK Mission Control was just released! Since this is the source release, it may still take a bit of time until the downstream vendors release binary builds of JDK Mission Control 8.1.0. I will try to remember to tweet or say something on the JMC Facebook page once the binaries start showing up.

Mission Control 8.1 – New and Noteworthy

General

JMC 8.1 – New Release!
This is a new minor release of Java Mission Control. The JMC application will now require JDK 11+ to run, but can still be used with OpenJDK 8u272+ and Oracle JDK 7u40+. It can also still open and visualize flight recordings from JDK 7 and 8.

Eclipse 4.19 support
The Mission Control client is now built to run optimally on Eclipse 2021-03 and later. To install Java Mission Control into Eclipse, go to the update site (Help | Install New Software…). The URL to the update site will be vendor specific, and some vendors will instead provide an archive with the update site.

Minor bugfixes and improvements
There are more than 80 fixes and improvements in this release. Check out the JMC 8.1 Result Dashboard (https://bugs.openjdk.java.net/secure/Dashboard.jspa?selectPageId=20404) for more information.

Core

New Serializers Core Bundle
There is now a new core bundle making it easy to serialize flight recording data to DOT (Graphviz) and JSON. This bundle will be expanded upon in future versions.

Improved JFR parser performance
The performance of the JFR parser has been improved. More improvements are coming in 8.2.

Java Flight Recorder (JFR)

Support for the new JDK 16 Allocation Events
A new form of light weight allocation profiling was introduced with JDK 16 (see https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8257602). This version of JMC supports this new type of allocation profiling.

New Page for Peeking into the Constant Pools
There is a new page available for taking a look at what constants are available in the recording. This can, for example, be useful when creating custom events to see where all that storage and memory is being used.

Open Recordings with .lz4 extension
For convenience, JMC will now attempt to open files with the .lz4 extension as flight recordings, since lz4 is a common compression to use with flight recordings.

JMC Agent Plug-in

New JMC Agent Plug-in
There is now a new agent plug-in available for JMC, which allows configuring where to emit flight recording events in an already running process.

Bug Fixes

Area: JFR
Issue: 6939
Synopsis: Time range indicator update problem fixed

Sometimes the time range indicator wasn’t updated when setting the time range. This is now fixed.

Area: JFR
Issue: 7007
Synopsis: Unable to edit run configurations for eclipse project after installing JMC plugin fixed

Previously it would not be possible to edit run configurations after installing the experimental JMC launcher plug-in. This has now been resolved.

Known Issues

Area: General
Issue: 4270
Synopsis: Hibernation and time

Area: JFR
Issue: 7071
Synopsis: JMC can’t attach to jlinked JVMs

This one is still under investigation, but it seems JMC can’t attach to certain jlinked images.

Area: JFR
Issue: 7068
Synopsis: JfrRecordingTest (uitest) hangs on the automated analysis page

Trying to run uitests on Fedora hangs on JfrRecordingTest.

Area: JFR
Issue: 7003
Synopsis: The graph and flame graph view does not work on Windows

This is due to a bug with the Edge based browser component in SWT. We’ll look into it for 8.2.0.

Area: JFR
Issue: 6265
Synopsis: JMC crashes with Webkit2+GTK 4

See the issue for more information.

Area: JFR
Issue: 5412
Synopsis: Dragging and dropping a JFR file into an open analysis page does not work

The expected behaviour would be to open the recording whenever a file is dropped in the editor area, but the behaviour will be defined by the embedded browser component, and not very useful.

JMC Core Now on Maven Central!

2021-08-03 - By Marcus

Good news! The JDK Mission Control core library bundles are now available on Maven Central, making it easier than ever to use things like the JMC JDK Flight Recorder parser to transparently parse and extract information from flight recordings ranging from Oracle JDK 7 and up to the very latest versions of OpenJDK.

I have updated my JShell example so that you can see an example of them being used.

For just using the parser, you will typically need the common and flightrecorder bundles. If you also need the rules engine, you add flightrecorder.rules, and if you want the base set of heuristics for the jdk, also add flightrecorder.rules.jdk.

For example:

	<properties>
		<jmc.version>8.0.1</jmc.version>
	</properties>
	<dependencies>
		<dependency>
			<groupId>org.openjdk.jmc</groupId>
			<artifactId>common</artifactId>
			<version>${jmc.version}</version>
		</dependency>
		<dependency>
			<groupId>org.openjdk.jmc</groupId>
			<artifactId>flightrecorder</artifactId>
			<version>${jmc.version}</version>
		</dependency>
		<dependency>
			<groupId>org.openjdk.jmc</groupId>
			<artifactId>flightrecorder.rules</artifactId>
			<version>${jmc.version}</version>
		</dependency>
		<dependency>
			<groupId>org.openjdk.jmc</groupId>
			<artifactId>flightrecorder.rules.jdk</artifactId>
			<version>${jmc.version}</version>
		</dependency>
	</dependencies>

Finally! 🙂

OpenJDK and the Future of Production Profiling

2021-05-31 - By Marcus

Some thoughts on the future of continuous production profiling on the OpenJDK platform.

A long time ago, the JRockit Runtime Analyzer (JRA) was introduced into the JRockit JVM as a means of being able to figure out what was going on in the JVM. It was mainly there to find out how customers were using the JVM, so that the JVM could be optimized for actual real-world production work-loads. The JRA output the data as XML, since customers insisted on the data being human readable so that they could see exactly what they would be sending us. Later LAT (LATency Analyzer) was introduced, since after the introduction of the JRockit low latency garbage collector (a.k.a the DetGC), some customers complained about the GC not keeping its latency promises. More often than not, it turned out it was other kinds of thread stalls causing the latencies, so LAT was introduced so that the JRockit team could figure out where the problematic code was. Since there could be a considerable amount of data in LAT, a binary format was introduced for the events recorded.

Eventually the production profiling (JRA + LAT) had a model overhaul and became the JRockit Flight Recorder. The data format was binary, self describing, extensible and efficient. After Oracle acquired Sun Microsystems, and the Hotspot and JRockit JVM teams merged, it became the Java Flight Recorder (JFR), and in 2018 it was open sourced as the JDK Flight Recorder (still JFR), since calling anything related to Java something with “Java” in the name can be complicated.

The JDK Flight Recorder design philosophy is to be the one-stop-shop production profiler for OpenJDK. JFR needs to be able to do various kinds of profiling, all at the same time, at a low overhead. It also needs to be able to run continuously for as long as someone is interested in the data. Potentially always.

Now, with changes in the Java (and the computing) ecosystem, JFR has some loom-ing challenges to remain relevant for the future.

JFR has a lot of nice properties:

It does multiple kinds of profiling, normally at a low overhead (space and CPU).
(Profiling types include allocation profiling, almost CPU profiling, latency outlier profiling and much more.)
It doesn’t suffer from the same constraints as Java-only based profilers.
(Weaknesses usually include things like safe point bias, safepointing VM operations, allocation pressure, undoing scalarization optimizations, lack of STW/safepoint visibility.)
It provides context helpful to solving problems.
(For example, a monitor enter event has the monitor class, the monitor address, the thread holding the monitor, the thread and stack trace for the blocking call and more.)
It is low overhead, (mostly) designed for predictable data rates and overhead.
It is extensible – you can add your own profilers and data.
(Datadog, for example, has its own rather useful exception profiler which is publishing data into JFR.)
As long as all your data is produced in JFR, all the common constants in the runtime will be recorded into the same constant pool.
(For example, if you record some data using JFR and some data with a custom agent, common class names, method names etc will need to be repeated in both of the serialization formats.)
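To give an idea of what the extensibility looks like in practice, here is a minimal sketch of a custom event using the standard jdk.jfr API (JDK 11+). The event name, labels and fields are made up for this example. The sketch records one event, dumps the recording to a temporary file, and reads it back to count the custom events found.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import jdk.jfr.Category;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordingFile;

final class CustomEventDemo {

    // Hypothetical custom event; the name and fields are invented for the example.
    @Name("demo.WorkItem")
    @Label("Work Item")
    @Category("Demo")
    static class WorkItemEvent extends Event {
        @Label("Item Name")
        String itemName;

        @Label("Payload Size")
        long payloadSize;
    }

    // Records a single custom event and returns how many demo.WorkItem
    // events can be found when reading the recording back.
    static long recordAndCount() {
        try {
            Path file = Files.createTempFile("custom-event", ".jfr");
            try (Recording recording = new Recording()) {
                recording.start();
                WorkItemEvent event = new WorkItemEvent();
                event.begin();
                event.itemName = "resize-image";
                event.payloadSize = 4096;
                event.commit(); // ends the event and writes it to the recording
                recording.stop();
                recording.dump(file);
            }
            long count = RecordingFile.readAllEvents(file).stream()
                    .filter(e -> e.getEventType().getName().equals("demo.WorkItem"))
                    .count();
            Files.deleteIfExists(file);
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Custom events found: " + recordAndCount());
    }
}
```

Since the event goes through JFR, it shares the thread, stack trace and constant pool machinery with the built-in events, which is the point made in the list above.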

There are however a few ways in which JFR can, and probably has to, improve, to remain the ultimate production profiling platform for Java:

Data rate (and thereby overhead) needs to be more deterministic.
CPU profiling needs to be improved.
Wall clock profiling needs to be possible.
It needs to be possible to easily and cheaply associate context with events.

Let’s take these in reverse order.

Recording Context

A very common problem is to try to associate some contextual information with the data produced by JFR. For example, in the good old Oracle days, there were events produced by WebLogic Server (WLS) containing what was called the Execution Context IDs (ECIDs). Since JFR events are thread local, it was possible to use them to find out what other, lower level, information was captured during that time, for example where the allocation pressure was, or why a call to a logger leading to a blocking call trying to enter a contended monitor happened. The instrumented code, and thereby the events, were usually emitted where something took a bit of time, such as database IO and similar, so the overhead was not bad.

Fast-forward to today. We now have distributed tracers doing pretty much what WLS was doing back in the day. We now have a trace, which is a directed acyclic graph of spans, which we track. The difference being that with the microservices and async frameworks that are commonly in use these days, these spans can now last a very short time. The tracers do keep track of when the context is propagated to other threads; in other words, the tracer can know where work is carried out on specific threads for a specific trace/span. In the end, the tracer doesn’t really care though – it only keeps track of this to know when a span can be closed. No matter how many related little work items were scheduled on various threads, the tracer will simply note when the span was started, and when the span was closed. For the profiler, however, the information about which thread did work related to a specific span is crucial to be able to figure out what traces were captured in the context of what spans/traces. Since a thread can switch between doing work for different spans at an incredible rate, the overhead of using additional events to keep track of them can be quite costly. Especially if the switches happen way more often than a sample is being captured.

This will only be compounded once Loom becomes part of the Java platform, and we can have millions of Virtual Threads (fibers, p-threads, MxN threads, green threads, whatever you want to call them) running.

A long time ago, the JFR team was discussing something we called thread coloring – the ability to tell a thread what the current context is. This is exactly what Go does today with goroutines. In Go it is called profiling labels. Go pre-aggregates the CPU profiling data, but allows setting labels to group the profiling data into buckets. Sadly, in Go it is currently limited to CPU-profiling only.

One solution to the context problem could be to allow setting the context for a thread – conceptually a Map<String, long>, and to provide some new settings for an event, for example record-context, to decide if the context should be recorded into the event or if we don’t care, and record-only-on-context, to decide to record an event only if there is some context present.

Having such a capability would not only make it possible to associate events with a context even where such a context changes rapidly. It could also be used to only record certain events when they occur in a context for which we care, possibly bringing down the overhead of capturing data and stack-traces, and saving memory in the buffers for the things we find to be the most relevant. A concrete example would be to only record certain information if a thread is currently associated with a trace which will be sampled.
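As a thought experiment, the semantics of the two proposed settings could be sketched like this. Note that nothing here exists in JFR today: the per-thread context is emulated with a plain ThreadLocal, and all class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical decision logic for an event that is about to be committed.
final class ContextAwareRecorder {

    // Returns the context to attach to the event, or an empty Optional if
    // the event should be dropped because no context is present.
    static Optional<Map<String, Long>> decide(ContextSettings settings) {
        Map<String, Long> context = ThreadContext.snapshot();
        if (settings.recordOnlyOnContext && context.isEmpty()) {
            return Optional.empty(); // record-only-on-context: skip the event
        }
        Map<String, Long> attached = settings.recordContext ? context : Map.of();
        return Optional.of(attached);
    }

    public static void main(String[] args) {
        ContextSettings settings = new ContextSettings(true, true);
        System.out.println("Without context: " + decide(settings));
        ThreadContext.put("spanId", 42L);
        System.out.println("With context: " + decide(settings));
        ThreadContext.clear();
    }
}

// Emulates the proposed per-thread context (conceptually a map from String to long).
final class ThreadContext {
    private static final ThreadLocal<Map<String, Long>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, long value) {
        CONTEXT.get().put(key, value);
    }

    static void clear() {
        CONTEXT.get().clear();
    }

    static Map<String, Long> snapshot() {
        return Map.copyOf(CONTEXT.get());
    }
}

// The two proposed per-event settings, modelled as plain booleans.
final class ContextSettings {
    final boolean recordContext;
    final boolean recordOnlyOnContext;

    ContextSettings(boolean recordContext, boolean recordOnlyOnContext) {
        this.recordContext = recordContext;
        this.recordOnlyOnContext = recordOnlyOnContext;
    }
}
```

A real implementation would of course live inside the runtime, where setting and reading the context can be made cheap enough to survive very rapid context switches.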

See https://bugs.openjdk.java.net/browse/JDK-8264516.

Wall-Clock Profiling Must Be Possible

JFR has an interesting mix of sampling profiling, thresholded execution tracing and metrics. For example, it has a sampling execution profiler, a rate-limited sampling allocation profiler etc. It also has events to locate thread latency outliers. These events are not sampled, but give you exact information about a thread halt lasting longer than a configurable threshold.

Of course, one problem with thresholding is that you can theoretically have a lot of thread latencies just below the threshold. It is easy to construct an artificial benchmark where the data indicates that you have no thread latencies whilst having nothing but thread latencies.

One way to get unbiased information, at a reasonable cost (i.e. not setting the threshold to 0 for the latency outlier events), is to introduce a wall clock profiler. A wall clock profiler basically periodically dumps the thread stack for a thread, no matter what it is up to. A limitation is that the wall clock profiler has no idea of what the thread was actually up to – we can’t get all the juicy, custom, actionable, information at that point – we usually only get the thread dump.
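To show the basic idea, here is a deliberately naive wall clock sampler in plain Java. It relies on Thread.getStackTrace(), so it inherits the limitations of Java-only profilers discussed earlier, and it is only meant to illustrate the concept of sampling a thread no matter what it is up to. All names are made up for the example.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.LockSupport;

final class WallClockSampler {

    // Takes one stack sample of every live thread in the collection,
    // regardless of whether the thread is running, blocked or parked.
    static Map<Thread, StackTraceElement[]> sampleOnce(Collection<Thread> threads) {
        Map<Thread, StackTraceElement[]> samples = new LinkedHashMap<>();
        for (Thread thread : threads) {
            if (thread.isAlive()) {
                samples.put(thread, thread.getStackTrace());
            }
        }
        return samples;
    }

    // Starts a daemon thread that stays parked until interrupted, and
    // returns once the thread is known to be running.
    static Thread startParkedThread(String name) {
        CountDownLatch started = new CountDownLatch(1);
        Thread thread = new Thread(() -> {
            started.countDown();
            while (!Thread.currentThread().isInterrupted()) {
                LockSupport.park();
            }
        }, name);
        thread.setDaemon(true);
        thread.start();
        try {
            started.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return thread;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = startParkedThread("idle-worker");
        for (int i = 0; i < 5; i++) {
            sampleOnce(List.of(worker)).forEach((thread, stack) ->
                    System.out.println(thread.getName() + " @ "
                            + (stack.length > 0 ? stack[0] : "<no frames>")));
            Thread.sleep(100); // the sampling interval
        }
        worker.interrupt();
    }
}
```

Running it against a parked thread also demonstrates the limitation mentioned above: all you get is the stack, with no information about what the thread was waiting for.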

Also, given that you might have millions of (virtual) threads, you probably don’t want to sample them all at any given time, and you certainly don’t want to stop the world whilst doing the sampling. As a matter of fact, for most applications, looking at wall clock profiling information from all threads will be a very boring exercise – most of them will be halted waiting for something to do, for example in thread pools, or waiting in some I/O. For example, for your average Java recording, you will have plenty of threads in thread pools, being parked. To see that you have thousands of threads waiting on a park call in a thread pool will not help you resolve a great many problems. If you, on the other hand, have a really long park causing a trace to take a really long time to complete, then you’ll likely be interested in what is going on.

This means that different profilers probably want these wall-clock samples to be picked differently. One way to accomplish this would be to allow users of JFR to commit events on a separate thread.

Event#commit(Thread)

This would allow someone to build a profiler by simply periodically committing events on the threads it currently cares about. It would also allow building other kinds of profilers and sampling behaviours.
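Since Event#commit(Thread) does not exist, the closest approximation with the current API is to commit the event on the sampler thread and carry the sampled thread along as an ordinary field. The sketch below does that. The event name sketch.WallClockSample and the classes are my own inventions for illustration, while jdk.jfr.Event, the annotations, and Thread#getStackTrace are existing APIs.

```java
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.StackTrace;

// Hypothetical event type, not part of the JDK.
@Name("sketch.WallClockSample")
@Label("Wall Clock Sample")
@StackTrace(false) // the sampler thread's own stack is of no interest
class WallClockSampleEvent extends Event {
    @Label("Sampled Thread")
    Thread sampledThread;

    @Label("Top Frame")
    String topFrame;

    @Label("Thread State")
    String state;
}

final class WallClockSampler {
    private WallClockSampler() {
    }

    // Samples the target thread and commits an event about it
    // from whatever thread calls this method.
    static WallClockSampleEvent sample(Thread target) {
        StackTraceElement[] frames = target.getStackTrace();
        WallClockSampleEvent event = new WallClockSampleEvent();
        event.sampledThread = target;
        event.topFrame = frames.length > 0 ? frames[0].toString() : "<no frames>";
        event.state = target.getState().name();
        event.commit(); // does nothing unless a recording has the event enabled
        return event;
    }
}
```

The workaround shows why first-class support would be better. The event’s own thread and stack trace fields still describe the sampler thread, so the stack of the sampled thread has to be squeezed into a string (here just the top frame), and tools will not treat it as a real stack trace.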

Another way of accomplishing this would be to add an annotation that overrides the default data captured by JFR in an event.

class MyEvent extends Event {
    // Hypothetical annotation: JFR would attribute the event to this thread
    @overridejfr
    Thread overrideThread;
}

An advantage with this variant is that you could potentially add more override behaviours that are handled by JFR over time.

A third variant might be to introduce a configurable global wall clock event, perhaps one that picks one thread at a time in a round robin fashion, at a certain frequency. Together with the context feature mentioned above, it would be possible to only emit events where there is useful context available.
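The picking part of such a round robin scheme, combined with a context check, could be sketched like this. RoundRobinPicker and the hasContext predicate are hypothetical names for illustration. A real implementation would live in the JVM and would iterate over live threads rather than a fixed list.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of picking one candidate at a time in a
// round robin fashion, skipping candidates without useful context.
final class RoundRobinPicker<T> {
    private final List<T> candidates;
    private int next;

    RoundRobinPicker(List<T> candidates) {
        this.candidates = List.copyOf(candidates);
    }

    // Returns the next candidate accepted by the predicate, or null
    // if a full lap over all candidates found none.
    synchronized T pickNext(Predicate<T> hasContext) {
        for (int i = 0; i < candidates.size(); i++) {
            T candidate = candidates.get(next);
            next = (next + 1) % candidates.size();
            if (hasContext.test(candidate)) {
                return candidate;
            }
        }
        return null;
    }
}
```

A sampler would call pickNext at the configured frequency and emit one wall clock event for the returned thread. When no thread carries context, nothing is emitted, which is what keeps the cost down when you have a huge number of parked threads.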

Yet another variant may be to add a native function (extern "C") to emit an event. Something similar to AsyncGetCallTrace (probably implemented using AsyncGetCallTrace), but which emits an event. If we have the context feature described above, it could be used to decide where events are emitted. If different kinds of profiling are implemented, you can simply provide context information about the kind of profiling being done to avoid the profilers interfering with each other.

There is JDK-8237206 for this, but there isn’t much information recorded in the issue.

CPU Profiling Improved

The current way of doing CPU profiling with JFR is to use the Execution Sample Event in JFR. It gives you a good idea of where the JVM is spending the most CPU time executing Java code. It is cheap, has low overhead, and satisfies the needs of many users.

That said, there are some problems. One is that it does not cover the full Java process. If there is a native library running native threads, the CPU time spent there won’t be accounted for. Also, some other native and intrinsified code, for example the JVM native threads, will not be covered. It is easy to write a benchmark where JFR will capture almost no samples at all.

To get a better understanding of where CPU time is being spent, we need a profiler that actually samples on CPU time and that can, at least optionally, capture the full thread stacks including native frames. One approach to this is to do what async profiler does: use kernel profiling data and line it up with information from AsyncGetCallTrace.

At Datadog we’ve built a pretty awesome CPU profiler for Linux, based around perf_event_open – one that does really well, and which handles symbol resolution in containerized environments better than most. Sadly, we don’t have it for Windows and Mac, so, if we contribute it to OpenJDK, we would need help implementing it on other platforms.

There is JDK-8234854, but I haven’t added much context yet.

Deterministic Overhead and Memory Use

One problem that JFR used to have, before JDK 16, was that allocation profiling could get very expensive, especially for allocation heavy applications running on plenty of CPUs. It could get very expensive both in terms of overhead and in terms of recording size. For example, running a well parallelized, allocation heavy application on a 96 core machine could produce millions of events per minute, resulting in recording sizes of hundreds of megabytes per minute.

In JDK 16 Datadog submitted a new kind of allocation profiler, which was rate limited. It drew inspiration from PID controllers (yes, I still love robotics, even though there is literally no time for it these days) to let the user specify a budget of the maximum number of events per second, and then the controller will attempt to keep the number of events emitted to that budget, subsampling if necessary.
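To illustrate the general idea of keeping to a budget by subsampling, here is a deliberately simplified sketch. It is not the algorithm used in the JDK. It only adapts a stride once per time window, based on how many candidate events the previous window saw. The BudgetedSampler class and its method names are made up for this example.

```java
// Simplified illustration of budget based subsampling.
// This is NOT the JDK implementation.
final class BudgetedSampler {
    private final long budgetPerWindow;
    private long seenInWindow;
    private long stride = 1; // keep every stride-th candidate

    BudgetedSampler(long budgetPerWindow) {
        if (budgetPerWindow <= 0) {
            throw new IllegalArgumentException("budget must be positive");
        }
        this.budgetPerWindow = budgetPerWindow;
    }

    // Called for every candidate event. Returns true if the
    // candidate should be recorded.
    synchronized boolean offer() {
        seenInWindow++;
        return seenInWindow % stride == 0;
    }

    // Called once per window (for example once per second). Picks
    // the stride that would have kept the window just ended within
    // budget, and uses it for the next window.
    synchronized void endWindow() {
        long needed = (seenInWindow + budgetPerWindow - 1) / budgetPerWindow;
        stride = Math.max(1, needed);
        seenInWindow = 0;
    }
}
```

With a budget of 100 and a steady 10000 candidates per window, the first window keeps everything because there is no history yet, and every following window keeps 100. A real controller would react within a window and smooth over several windows, which is where the control theory comes in.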

We had already used the same technique successfully for our exception profiler, since it turns out that some people use not only exceptions, but even subclasses of Error (!), for flow control.

Now, it may be useful to use the same kind, or some similar kind, of safety valve to limit both the overhead and the data production rate for other events that are prone to the same edge cases. For example, the thread latency events are currently thresholded. This means that, given the number of threads, you might be able to estimate the maximum number of events you could produce. But that is not very helpful, since you might be running very few threads or, in the future, with Loom, millions of them.

Instead you might want to subsample them, at a max rate that you can set yourself, per event type. You could even allow for thresholding and rate limited sampling at the same time, accepting a certain bias, e.g. if you want to focus on longer lasting thread halts.
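For reference, this is roughly how the two existing knobs are set programmatically today. The rate limit is the throttle setting on jdk.ObjectAllocationSample (JDK 16 and later), and the threshold is the usual threshold setting on a thread latency event such as jdk.ThreadPark. The values 100/s and 20 ms, and the ThrottleSettingsDemo class, are just example choices of mine.

```java
import java.time.Duration;
import jdk.jfr.Recording;

final class ThrottleSettingsDemo {
    private ThrottleSettingsDemo() {
    }

    // Creates a recording (not yet started) with one rate limited
    // event type and one thresholded event type.
    static Recording configure() {
        Recording recording = new Recording();
        // Rate limited: at most roughly 100 allocation samples per second.
        recording.enable("jdk.ObjectAllocationSample").with("throttle", "100/s");
        // Thresholded: only parks lasting 20 ms or more are recorded.
        recording.enable("jdk.ThreadPark").withThreshold(Duration.ofMillis(20));
        return recording;
    }
}
```

What this section asks for is being able to put both kinds of setting on the same event type, so that a thread latency event could have a threshold and a per event type rate limit at the same time.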

Summary

TL;DR, JFR has some interesting challenges ahead. Some of them will be compounded by the introduction of Virtual Threads. For JFR to remain the premium, best in class, production profiling platform, some investment will be needed. Datadog will try to help, but some of these problems will require updates to the JFR file format and updating the API. The OpenJDK community will need to be involved.