JDK Mission Control 9.0.0 Released!

The 9.0.0 GA release of JDK Mission Control was just tagged in the JMC repo at GitHub! Since this is the source release, it may still take a bit of time until the downstream vendors release binary builds of JDK Mission Control 9.0.0. I will try to remember to tweet or say something on the JMC Facebook page once the binaries start showing up.


Mission Control 9.0 – New and Noteworthy


General


JMC 9 – New Release!
This is the latest (2024) major release of JDK Mission Control. JMC 9 requires JDK 17+ to run and introduces several new features, enhancements, and bug fixes. This version continues to support connecting to, and parsing JFR recordings from, OpenJDK 8u272+ and Oracle JDK 7u40+, and can open and visualize flight recordings from JDK 7 and 8. JDK Mission Control is available for Windows (x86_64), Mac OS X (ARM and x86_64), and Linux (ARM and x86_64).

jmc


Eclipse 4.30 support
The Mission Control client is now built to run optimally on Eclipse 2023-12 and later. To install JDK Mission Control into Eclipse, go to the update site (Help | Install New Software…). The URL to the update site will be vendor specific, and some vendors will instead provide an archive with the update site.

eclipse


Support for Linux/aarch64
JMC 9 is now built for Linux aarch64.

linuxaarch64


Support for dark mode
JMC 9 now supports dark mode. Go to Preferences, General | Appearance, and select the Dark theme to enable.

darkmode


Minor bugfixes and improvements
There are 118 fixes and improvements in this release. Check out the JMC 9.0 Result Dashboard for more information.

issues


Add user configuration for local JVM refresh interval
Previously the JVM Browser checked every 5000 ms for new JVMs. This can now be configured.

refreshinterval


Core


Better JFR parser performance
Multiple efforts have been made to reduce allocations in the JMC parser, including: reduced allocation of Doubles, reduced allocation rate in ParserStats. Also, when duration events aren’t ordered by their end time (e.g. events which stack so that the last event finishes first, or file reads with overlaps) `DisjointBuilder.add` can be slow because of the linear search for the lane, and then a linear time reordering. This has been improved with a binary search.

jfrperformance


Support checkpoint event sizes beyond u4 limit
The JMC JFR parser now support checkpoint event sizes beyond the u4 limit.

checkpointsize


Move non-Eclipse dependent classes from org.openjdk.jmc.ui.common to org.openjdk.jmc.common
There were a number of classes previously in jmc.ui.common that would be a great asset to the core distribution (and the third-party applications that consume jmc-core), and these classes now live in jmc.common. Please see JMC-7308 for further information.

reorganize


Move rjmx bundle from application to core
The rjmx classes and related services (FlightRecorderService) are now exposed for third-party application usage. Please see JMC-7069 for further information.

reorganize


Move org.openjdk.jmc.flightrecorder.configuration bundle from application to core
The org.openjdk.jmc.flightrecorder.configuration bundle contains many classes useful for working with jfr, and are now available in core. Please see JMC-7307 for further information.

reorganize


Java Flight Recorder (JFR)


The Event Browser now supports searching and showing event type ids
Searching in the search bar now also searches event type IDs, and there is also a (by default hidden) column that makes it easy to show the event type IDs for the shown events.

eventtypeid


Add support for enabling jfr on native images
Previously JMC was unable to start flightrecorder on a graalvm native image, even if there is built-in jfr support. This has now been fixed.

native-image-alt


Java based flamegraph visualization
The previous flamegraph visualization takes place in an embedded browser component (provided by the Eclipse platform), unfortunately this approach has some drawbacks, the first being a bit slow. This view is now using a Java (Swing) based flamegraph library. Also, the flame graph model creation performance have been improved.

flamegraph


Visualization and Rule for FileChannel.force()
The File I/O page has been updated to show force related information. There are two new columns added – Force Count and Update Metadata. Both are hidden by default and can be enabled by right clicking the table. The chart will also include a File Force row. There is a preference setting for the associated file force rule, where the peak duration warning limit can be set. See JMC PR#533 for more information.

fileforce


Rule that checks on G1 pause time target compliance
New rule that looks at the pause time target and compares it to the actual pauses.

rule


Rule that looks at finalization statistics
JDK 18 comes with a FinalizationStatistics event that helps users find where in their application finalizers are run. This is important as finalization has been deprecated for removal in a future release. For more information about finalization and its flaws, see https://openjdk.java.net/jeps/421. Even if an application doesn’t implement any finalize() methods, it may rely on third-party libraries that does. Static analysis of third-party libraries using “jdeprscan –for-removal” can be used to list those classes, but it will not tell if they are being used. For example, an application may be missing a call to a close() method, so the resource is cleaned up by the finalizer, which is sub-optimal.

rule


Rule that detects GC Inverted Parallelism
Rule inspired by the “Inverted Parallelism” analysis in Garbagecat. See JMC-8144 for more information.

rule


Support for the new JPLIS agent events
There is now a new page and rule for loaded JPLIS agents. See JMC-8054 for more information.

agent


Twitter plug-in removed
Due to changes in APIs and cost of maintenance, the Twitter plug-in has been removed.

twitterplugin


Bug Fixes


Area: Agent
Issue: 8045
Synopsis: retransformClasses() doesn’t re-transform all needed classes

The retransformClasses() methods in Agent and AgentController use Class.forName() to try to get the class objects of classes needed to re-transform. This obviously doesn’t work for classes loaded by classloaders different from the one which loads the agent. Those classes would be instrumented if they were loaded after their event probes were defined the AgentController. But when loaded earlier they would not be instrumented. This has been fixed.

Area: Agent
Issue: 8048
Synopsis: Agent throws exceptions on missing or empty descriptions

When the description of an event or value is empty or missing, the agent fails with exceptions. This has now been fixed.

Area: Console
Issue: 8154
Synopsis: Some JMX attributes are missing unit specifications in the Console

The missing unit specifications have now been added.

Area: Core
Issue: 8063
Synopsis: IMCFrame Type cache not synchronized

The type cache used in the IMCFrame Type inner class wasn’t synchronized and could cause a concurrent modification exception during e.g. JFR parsing. This has been fixed.

Area: Core
Issue: 8156
Synopsis: JfrRulesReport.printReport does not respect verbosity for text and json

The verbosity flag for text and json reports didn’t work. This has been fixed.

Area: Core
Issue: 8041
Synopsis: JfrRulesReport json reports produce incomplete results

While generating JFR Rules Reports in json format, the results were incomplete. The components “message” and “detailedMessage” were not populated. This has been fixed.

Area: JFR
Issue: 7885
Synopsis: Graphical rendering of dependency view fails due to heap memory drain

Also JMC-7496. The dependency view drains the heap memory and causes out-of-memory exceptions and performance delays. This has been improved.


Known Issues


Area: JFR
Issue: 7071
Synopsis: JMC can’t attach to jlinked JVMs

This one is still under investigation, but it seems JMC can’t attach to certain jlinked images.

Area: JFR
Issue: 7003
Synopsis: The graph view, heatmap view and dependency view does not work on Windows

This is due to a problem with the Windows based browser component in SWT. We’re hoping for a fix in the component for a future version of the Eclipse platform.

JDK Mission Control 8.3.0 Released!

The latest release of JDK Mission Control was recently released! Since I am a bit late with this blog, there are already some binary releases available, for example:

Eclipse Mission Control:
https://adoptium.net/jmc/

Zulu Mission Control:
https://www.azul.com/products/components/azul-mission-control/

 

Mission Control 8.3 – New and Noteworthy


General


JMC 8.3 – New Release!
This is a new minor release of JDK Mission Control. The JMC application requires JDK 11+ to run, but can still be used to connect to, and parse JFR recordings from, OpenJDK 8u272+ and Oracle JDK 7u40+. It can also still open and visualize flight recordings from JDK 7 and 8. JDK Mission Control is built for Windows (x86_64), Mac OS X (ARM and x86_64), as well as Linux (x86_64).

jmc[1]


Eclipse 4.24 support
The Mission Control client is now built to run optimally on Eclipse 2022-06 and later. To install JDK Mission Control into Eclipse, go to the update site (Help | Install New Software…). The URL to the update site will be vendor specific, and some vendors will instead provide an archive with the update site.

eclipse[1]


Minor bugfixes and improvements
There are 51 fixes and improvements in this release. Check out the JMC 8.3 Result Dashboard (https://bugs.openjdk.org/secure/Dashboard.jspa?selectPageId=21205) for more information.

issues


Core


Parser improvements
The performance of the FastAccessNumberMap has been improved for sparse values.

noimage[1]


Java Flight Recorder (JFR)


Dependency View
There is a new view for visualizing call dependencies. There are two modes of operation in the view, chord diagram and edge bundling. In the edge bundling visualization, hover over packages to see dependencies highlighted in colors: green means that methods in the linked package is called by methods in the package being hovered over, yellow means that methods in the linked package are calling mathods in the package being hovered over. Red means that methods in the packages are calling each other. To show the Dependency View, go to Window | Show View | Other… and select the Dependency View under the Mission Control folder.

dependency


Graph Pruning
The graph view in JMC can now be pruned to focus on the most impactful nodes. Select the target number of nodes, and the visualization will show at most that many number of nodes.

graphpruning


Selectable Attribute
It is now possible to select which attribute to use for the weights in the trace view and the flame graph view.

attribute


Parser improvement
The parser now supports parsing events with char fields.


Bug Fixes


Area: General
Issue: 7813
Synopsis: Unable to open Help page in macOS M1 when JMC started with JDK11

The help page was inaccessible, throwing an error on macOS M1 when JMC is run on JDK 11.0.16. This is now fixed.

Area: General
Issue: 7321
Synopsis: Unable to view JMC Help Contents (HTTP ERROR 500 ) when booted with JDK 17 or higher

The help page was inaccessible, throwing an error, when running on JDK 17+. This is now fixed.

Area: JFR
Issue: 7812
Synopsis: Unable to open links from Automated Result analysis page

Links in the results of the automated analysis results would not open properly on Linux and Mac OS X. Now they do open in situ.


Known Issues


Area: General
Issue: 4270
Synopsis: Hibernation and time

After the bugfix of https://bugs.openjdk.java.net/browse/JDK-6523160 in JDK 8, the RuntimeMXBean#getUptime() attribute was re-implemented to mean “Elapsed time of JVM process”, whilst it previously was implemented as time since start of the JVM process. The uptime attribute is used by JMC, together with RuntimeMXBean#getStartTime(), to estimate the actual server time. This means that time stamps, as well as remaining time for a flight recording, can be wrong for processes on machines that have been hibernated.

Area: General
Issue: 7953
Synopsis: Unable to install JMC Plugins on Eclipse 4.25

Because of updates to the naming of certain platform dependencies in Eclipse 4.25 (e.g. JUnit 5 bundles now have name of the form junit-* instead of Orbit variants org.junit.*), it will no longer be possible to install the plug-in version of JMC into Eclipse 4.25+. This will be resolved in a later version of JMC. A workaround for now is to build JMC from the mainline (9.0 EA), and installing the plug-in version from the resulting update site archive.

Area: JFR
Issue: 7947
Synopsis: JMC crashes while performing flight recording on MacOS 13.0_x64

JMC can crash when completing a recording on MacOS 13.0 on x64. It seems to be related to running JavaScript in the Browser component. Eclipse is investigating the issue here.

Area: JFR
Issue: 7071
Synopsis: JMC can’t attach to jlinked JVMs

This one is still under investigation, but it seems JMC can’t attach to certain jlinked images.

Area: JFR
Issue: 7003
Synopsis: The graph, flame graph view, heatmap view and dependency view does not work on Windows

This is due to a problem with the Windows based browser component in SWT. We’re hoping for a fix in the component for a future version of the Eclipse platform.


TL;DR

There is a new version of JMC out. Have fun! Smile

JDK Mission Control 8.1.0 Released!

Yay, the latest release of JDK Mission Control was just released! Since this is the source release, it may still take a bit of time until the downstream vendors release binary builds of JDK Mission Control 8.1.0. I will try to remember to tweet or say something on the JMC Facebook page once the binaries start showing up.

Mission Control 8.1 – New and Noteworthy


General


JMC 8.1 – New Release!
This is a new minor release of Java Mission Control. The JMC application will now require JDK 11+ to run, but can still be used with OpenJDK 8u272+ and Oracle JDK 7u40+. It can also still open and visualize flight recordings from JDK 7 and 8.

jmc


Eclipse 4.19 support
The Mission Control client is now built to run optimally on Eclipse 2021-03 and later. To install Java Mission Control into Eclipse, go to the update site (Help | Install New Software…). The URL to the update site will be vendor specific, and some vendors will instead provide an archive with the update site.

eclipse


Minor bugfixes and improvements
There are more than 80 fixes and improvements in this release. Check out the JMC 8.1 Result Dashboard (https://bugs.openjdk.java.net/secure/Dashboard.jspa?selectPageId=20404) for more information.

issues


Core


New Serializers Core Bundle
There is now a new core bundle making it easy to serialize flight recording data to DOT (Graphviz) and JSon. This bundle will be expanded upon in future versions.

serializers


Improved JFR parser performance
The performance of the JFR parser has been improved. More improvements are coming in 8.2.

parserperf


Java Flight Recorder (JFR)


Support for the new JDK 16 Allocation Events
A new form of light weight allocation profiling was introduced with JDK 16 (see https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8257602). This version of JMC supports this new type of allocation profiling.

allocationprof


New Page for Peeking into the Constant Pools
There is a new page available for taking a look at what constants are available in the recording. This can, for example, be useful when creating custom events to see where all that storage and memory is being used.

constantpool


Open Recordings with .lz4 extension
For convenience, files with the .lz4 extension will now be attempted to be opened as flight recordings. This is since lz4 is a common compression to use with flight recordings.

lz4


JMC Agent Plug-in


New JMC Agent Plug-in
There is now a new agent plug-in available for JMC, which allows configuring where to emit flight recording events in an already running process.

agent


Bug Fixes


Area: JFR
Issue: 6939
Synopsis: Time range indicator update problem fixed

Sometimes the time range indicator wasn’t updated when setting the time range. This is now fixed.

Area: JFR
Issue: 7007
Synopsis: Unable to edit run configurations for eclipse project after installing JMC plugin fixed

Previously it would not be possible to edit run configuration after installing the experimental JMC launcher plug-in. This has now been resolve.


Known Issues


Area: General
Issue: 4270
Synopsis: Hibernation and time

After the bugfix of https://bugs.openjdk.java.net/browse/JDK-6523160 in JDK 8, the RuntimeMXBean#getUptime() attribute was re-implemented to mean “Elapsed time of JVM process”, whilst it previously was implemented as time since start of the JVM process. The uptime attribute is used by JMC, together with RuntimeMXBean#getStartTime(), to estimate the actual server time. This means that time stamps, as well as remaining time for a flight recording, can be wrong for processes on machines that have been hibernated.

Area: JFR
Issue: 7071
Synopsis: JMC can’t attach to jlinked JVMs

This one is still under investigation, but it seems JMC can’t attach to certain jlinked images.

Area: JFR
Issue: 7068
Synopsis: JfrRecordingTest (uitest) hangs on the automated analysis page

Trying to run uitests on Fedora hangs on JfrRecordingTest.

Area: JFR
Issue: 7003
Synopsis: The graph and flame graph view does not work on Windows

This is due to a bug with the Edge based browser component in SWT. We’ll look into it for 8.2.0.

Area: JFR
Issue: 6265
Synopsis: JMC crashes with Webkit2+GTK 4

See the issue for more information.

Area: JFR
Issue: 5412
Synopsis: Dragging and dropping a JFR file into an open analysis page does not work

The expected behaviour would be to open the recording whenever a file is dropped in the editor area, but the behaviour will be defined by the embedded browser component, and not very useful.

JMC Core Now on Maven Central!

Good news! The JDK Mission Control core library bundles are now available on Maven Central, making it easier than ever to use things like the the JMC JDK Flight Recorder parser to transparently parse and extract information from flight recordings ranging from Oracle JDK 7 and up to the very latest versions of OpenJDK.

I have updated my JShell example so that you can see an example of them being used.

For just using the parser, you will typically need the common and flightrecorder bundles. If you also need the rules engine, you add flightrecorder.rules, and if you want the base set of heuristics for the jdk, also add flightrecorder.rules.jdk.

For example:

	<properties>
		<jmc.version>8.0.1</jmc.version>
	</properties>
	<dependencies>
		<dependency>
			<groupid>org.openjdk.jmc</groupid>
			<artifactid>common</artifactid>
			<version>${jmc.version}</version>
		</dependency>
		<dependency>
			<groupid>org.openjdk.jmc</groupid>
			<artifactid>flightrecorder</artifactid>
			<version>${jmc.version}</version>
		</dependency>
		<dependency>
			<groupid>org.openjdk.jmc</groupid>
			<artifactid>flightrecorder.rules</artifactid>
			<version>${jmc.version}</version>
		</dependency>
		<dependency>
			<groupid>org.openjdk.jmc</groupid>
			<artifactid>flightrecorder.rules.jdk</artifactid>
			<version>${jmc.version}</version>
		</dependency>
	</dependencies>

Finally! 🙂

Continuous Production Profiling and Diagnostics

I’ve gotten a lot of questions about continuous production profiling lately. Why would anyone want to profile in production, or, if production profiling seems reasonable, why the heck leave it on continuously? I thought I’d take a few moments and share my take on the problem and the success I’ve seen the past years applying continuous production profiling in systems in the real world.

Trigger warning: this blog will not contain code samples. 😉

Profiling?

So what is software profiling then? It’s the ancient black magic art of trying to figure out how something is performing, for some aspect of performing. In American TV-series, the profiler is usually some federal agent who is adept at understanding the psychology of the criminal mind. The profiler attempts to understand key aspects of the criminal to make it easier for the law enforcement agents to catch him. In software profiling we’re kind of doing the same thing, but for software – your code as well as all the third party code you might be depending on.

We’re trying to build an accurate profile of what is going on in the software when it is being run, but in this case to find ways to improve a program. And to understand what is going on in your program, the profiler has to collect call traces and usually some additional context to make sense of it all.

In comparison to other observability tools, like metrics and logs, profilers will provide you with a holistic view of a running program, no matter the origin of the code and requiring no application specific instrumentation. Profilers will provide you with detailed information about where in the actual code, down to the line and byte code index, things are going down. A concrete example would be learning which line in a function/method is using most of the CPU, and how it was being called.

It used to take painting a red pentagram on the floor, and a healthy stock of black wax candles, to do profiling right. Especially in production. Overhead of early profilers weren’t really a design criteria; it was assumed you’d run the process locally, and in development. And, since it was assumed you’d be running the profiling frontend on the same machine, profiling remote processes were somewhat tricky and not necessarily secure. Production profilers, like JFR/JMC came along, but they usually focus on a single process, and since security is a bit tricky to set up properly, most people sidestep the problem altogether and run (yep, in production) with authentication and encryption off.

Different Kinds of Profiling

Profiling means different things to different people. There are various types of resources that you may be interested in knowing more about, such as CPU or locks, and there are different ways of profiling them.

Most people will implicitly assume that when talking about profiling, one means CPU-profiling – the ancient art of collecting data about where in the code the most CPU-time is spent. It’s a great place to start when you’re trying to figure out how to make your application consume less CPU. If you can optimize your application to do the same work with less resources, this of course directly translates into lowering the bill to your cloud provider, or being able to put off buying those extra servers for a while.

Any self-respecting modern profiling tool will be able to show more than just the CPU aspect of your application, for example allocation profiling or profiling thread halts. Profiling no longer implies just grabbing stack-traces, and assigning meaning to the stack trace depending on how it was sampled; some profilers collaborate closely with the runtime to provide more information than that. Some profilers even provide execution tracing capabilities.

Execution tracing is the capability to produce very specific events when something interesting happens. Execution tracing is available on different levels. Operating systems usually provide frameworks allowing you to listen on various operating system events, some even allowing you to write probe definitions to decide what data to get. Examples include ETW, DTrace and eBPF. Some runtimes, like the OpenJDK Java VM, provide support for integrating with these event systems, and/or have their own event system altogether. Java, being portable across operating systems, and wanting to provide context from the runtime itself, has a high performance event recorder built in, called the JDK Flight Recorder. Benefits include cheap access to information and emission of data and state already tracked by the runtime, not to mention an extensible and coherent data model.

Here are a few of my favourite kinds of profiling information:

  • CPU profiling
  • Wall-clock profiling
  • Allocation profiling
  • Lock / Thread halt / Stop-the-World profiling
  • Heap profiling

Let’s go through a few of them…

CPU Profiling

CPU profiling attempts to answer the question about which methods/functions are eating up all that CPU. If you can properly answer that question, and if you can do something about it (like optimizing the function or calling it less often) you will use less resources. If you want to reduce your cloud provider bill, this is a great place to start. Also, if you can scope the analysis down to a context that you care about, let’s say part of a distributed trace, you can target improving the performance of an individual API endpoint.

Wall-Clock Profiling

Wall-clock profiling attempts to answer the question about which method/function is taking all that time, no matter if on CPU or not. For runtimes supporting massively multithreaded applications, this information is much less useful without some context.

For example, let’s say you have a Java application with various thread pools running various kinds of operations. You may have hundreds of threads, all of them mostly parked, awaiting some work to do. Unless you have some context, all the wall-clock profiling will tell you is that most threads were parked. But if you do have some context, let’s say context around which span in a distributed trace is running when samples are taken, your wall-clock profiling data can tell you in which methods most of the time was spent during a particularly long lasting span. [1]

As a general rule of thumb, wall-clock profiling is useful for finding and optimizing away latencies, whereas CPU profiling is more suited for optimizing throughput. Also, execution tracing is a great complement to wall-clock profiling.

If you can tell where the wall-clock time is spent, you can help remove performance obstacles by seeing which method calls take time and optimize them, or reduce the number of calls to them.

Allocation Profiling

Allocation profiling is trying to answer where all that allocation pressure is coming from, and from allocating what. This is important, since all that allocated memory will usually have to be reclaimed at some point in time, and that uses both CPU and possibly causes stop-the-world pauses from GC (though modern GC technologies, for example ZGC for the Java platform, is making this less of an issue for some types of services).

If you can properly answer where the allocation pressure comes from, you can bring down GC activity by optimizing the offending methods, or have your application call them less.

Lock / Thread Halt / Stop-the-World (STW) Profiling

This kind of profiling tries to answer the question about why my thread didn’t get to run right there and right then. This is typically what you would use the wall-clock-profiler for, but the wall-clock-profiler usually has some serious limitations, making it necessary to collaborate with the runtime to get some additional context. The wall-clock profiler typically only gets sampled stack traces showing you which method you spent time in, but without context it may be hard to know why.

Here are some examples:

  • Your thread is waiting on a monitor
    Context should probably include which thread is currently holding the monitor, which address the monitor has, the time you had to wait etc.
  • Your runtime is doing something runtimey requiring stopping the world, showing your method taking its own sweet time, but not offering any clues as to why
    • STW phase due to GC happening in the middle of running your method.
    • STW phase due to a heap dump
    • STW phase due to full thread stack dump
    • STW phase due to bad behaving framework, or your well meaning colleague(s), forcing full GCs all the time, since they “know that a GC really improves performance if done right there”, not quite realizing that it’s just a small part of a much bigger system.
  • Your thread is waiting for an I/O operation to complete
    Context should probably include the IP address (socket I/O) or file (file I/O), the bytes read/written etc.

There are plenty of more examples, wait, sleep, park etc. To learn more, open JDK Mission Control and take a look at individual event types in the event browser.

Heap Profiling

This kind of profiling attempts to answer questions about what’s on your heap and, sometimes, why. This information can be used to reduce the amount of heap required to run your application, or help you solve memory leaks. Information may range from heap histograms showing you the number of instances of each type on the heap, to leak candidates, their allocation times and allocation stack traces, together with the reference chains still holding on to them.

Continuous Production Profiling

Assuming that your application always has the same performance profile, which implies always having exactly the same load and never being updated, with no edge cases or failure modes, and assuming perfectly random sampling, your profiler could simply take a few samples (let’s say 100 to get a nice distribution) over whichever time period you are interested in (let’s say 24 hours), and call it a day. You would have a very cheap breakdown over whatever profiling information you’re tracking.

These days, however, new versions of an application are deployed several times a day, evolving to meet new requirements at a break-neck speed. They are also subjected to rapidly changing load profiles. Sometimes there may be an edge case we didn’t foresee when writing the program. Being able to use profiling data to not only do high level performance profiling, but detailed problem resolution, is becoming more and more common, not to mention useful.

At Datadog, we’ve used continuous production profiling for our own services for many months now. The net result is that we’ve managed to lower the cost of running our services all over the company by quite large amounts of money. We’ve even used the profiler to improve our other components, like the tracer. I had the same experience at Oracle, where dedicated continuous profiling analysis was used to a great extent for problem resolution in production systems.

Aside from being incredibly convenient, there are many different reasons why you might want to have the profiler running continuously.

Change Analysis

These days new versions are deployed several times a day. This is certainly true for my team at Datadog. There is great value in being able to compare the performance profile, down to the line of code. This is true across new releases, specific time intervals, over other attributes like high vs low CPU load, and countless other facets.

Fine Grained Profiling

Some production profiling environments allow you to add context, for example custom events, providing the means to look at the profiling data in the light of something else happening in a thread at a certain time. This can be used for doing breakdowns of the profiling data for any context you put there, any time, anywhere.

Adding some contextual information can be quite powerful. For example, if we were able to extend the profiling data with information about what was actually going on in that thread, at that time, any other profiling data captured could be seen in the light of that context. For example, WebLogic Server produced Flight Recorder Events for things like SQL calls, servlet invocations etc, making it much easier to attribute the low-level information provided by the profiler to higher level constructs. These events were also associated with an Execution Context ID which spanned processes, making it possible to follow along in distributed transactions.

With the advent of distributed tracing, this can be done in a fairly general way, so that profiling data can be associated with thread local activations of spans in a distributed trace (so called scopes). [1]

That said, with a general recording framework, there is no limit to the kinds of contexts you can invent and associate your profiling data with.

Diagnostics

It’s 2:03 a.m., all of a sudden some spans in your distributed trace end up taking a really long time. Looking at the spans, there is nothing indicating something is actually going wrong, or that the data is bad. From what is present in the tag data, nothing seems to be related between the spans. You decide to open up the profile.
The automated analysis informs you that a third-party library has initiated safe pointing VM operations from a certain thread, in this case for doing full heap dumps. The analysis text points you to more documentation about what a safe point is. You read up on safe pointing VM operations, and the library, and find out that under certain conditions, the library can initiate an emergency heap dump, but that the feature can be turned off. You turn it off, redeploy and go back to sleep.

Or, perhaps the automated analysis informs you that there is heavy lock contention on the apache logger, and links you to the lock profiling information. Looking at the lock profiling information, it seems most of the contention is being caused by the logging done on one particular line. You decide that the logging there is not essential, remove it, commit, redeploy and go back to sleep.
When something happens in production, you will always have data at hand with a continuous profiler. There is no need to try to reproduce the exact environment and conditions under which the problem occurs. You will always have actionable data readily available.

Of course, the cure must not be worse than the ailment. If the performance overhead you pay for the information costs you too much, it will not be worth it. Therefore this rather detailed information must be collected quite inexpensively for a continuous production profiler.

Low-overhead Production Profiling

So, how can one go about producing this information at a reasonable cost? Also, we can’t introduce too much observer effect, as this will skew the data, and not truly represent the application behaviour without the instrumentation.

There are plenty of different methods and techniques we can use. Let’s dig into a few.

Using Already Available Information

If the runtime is already collecting the data, exporting it can usually be done quite cheaply. For example, if the runtime is already collecting information about the various garbage collection phases, perhaps to drive decisions like when to start initiating the next concurrent GC-cycle, that information is already readily available. There is usually quite a bit of information that an adaptively optimizing runtime keeps track of, and some of that information can be quite useful for application developers.

Sampling

One technique we can use is to not take every single possible value, but do statistical sampling instead. In many cases this is the only way which makes sense. Let’s take CPU profiling for example. In most cases, we will be able to select an upper boundary for how much data we produce by either selecting the CPU quanta between samples, or by selecting a fixed number of threads to look at any given time and the sampling period. There are also more advanced techniques for getting a fixed data rate.

An interesting example from Java is the new upcoming allocation profiling event. Allocation in Java is most of the time approximately the cost of bumping a pointer. The allocation takes place in thread local area buffers (TLABs). There is no way to do anything in that code path without introducing unacceptable overhead. There are however two “slow” paths in the allocator. One for when the TLAB is full. The other one for when the object is too large to fit in a TLAB (usually by allocating an enormous array) leading to the object being allocated directly on heap. By sampling our allocations at these points, we get relatively cheap allocation events that are proportional to the allocation pressure. If we were able to configure how often to subsample over the average amount of memory allocated between samples, we would be able to regulate the acceptable overhead. That said, what we’re really looking for is a constant data production rate, so regulating that is better left to a PID-style controller, giving us a relatively constant data production.

Of course, the less sample points we have, the less we can say about the behaviour over very short periods of time.

Thresholding

One sort of sampling is to simply only collect outliers. For some situations, we really would like to get more information. One example might be thread halts that take longer than, say, 10ms. Setting a threshold allows us to do a little bit of more work, when it’s very much warranted. For example, I might only be interested in tracking blocking I/O reads/writes lasting longer than a certain threshold, but for them I’d like to know the amount of bytes read/written, the IP address read from/written to etc.

Of course, the higher the threshold, the more data we will miss (unless we have other means to account for that time). Also, thresholds make it harder to reason about the actual data production rate.

Protect Against Edge Cases

Edge cases which make it hard to reason about their potential overhead should be avoided, or at least handled. For example, when calculating reference chains, you may provide a time budget for which you can scan, and then only do it when absolutely needed. Or, since the cost of walking a stack trace can be proportional to the number of frames on the stack, you can set an upper limit to how many frames to walk, so that recursion gone wild won’t kill your performance. Be careful to identify these edge cases, and protect against them.

One recent example is the Exception event available in the Flight Recorder (Java), which can be configured to only capture Errors. The Java Language Specification defines an Error like this:

“Error is the superclass of all the exceptions from which ordinary programs are not ordinarily expected to recover.”

You would be excused for believing that Errors would happen very rarely, and that recording all of them would not be a problem. Well, a very popular Java framework, which will remain unnamed, subclassed Error in an exception class named LookAheadSuccess. That error was used in a parser and used for control flow, resulting in the error being thrown about a gazillion times per minute. We ended up developing our own solution for exception profiling at DD, which records Datadog specific events into the JDK Flight Recorder.

Some Assembly Required

These techniques, and more, can be used together to provide a best-of-all-worlds profiling environment. Just be careful, as with most things in life a balance must be found. Just like there is (trigger warning) no single energy source that will solve our energy problems in a carbon neutral way (we should use all at our disposal – including nuclear power – to have a chance to go carbon neutral in a reasonable time [2][3]), a balance must be struck between sampling and execution tracing, and a balance for how much data to capture for the various types of profiling you’re doing.

Continuous Profiling in Large Deployments
Or, Finding What You’re Looking For

In a way this part of the blog will be a shameless plug for the work I’ve been involved with at Datadog, but it may offer insights into what matters for a continuous profiler to be successful. Feel free to skip if you dislike me talking about a specific commercial solution.

So, you’ve managed to get all that juicy profiling down to a reasonable amount of data (for Datadog / Java, on average about 100k events per minute, with context and stacktraces, or 2MB per minute, at less than 2% CPU overhead), that you can process and store without going broke. What do you do next?

That amount of data will be overwhelming to most people, so you’ll need to offer a few different ways into the data. Here are a few that we’ve found useful at Datadog:

  • Monitoring
  • Aggregation
  • Searching
  • Association by Context
  • Analysis

Monitoring

All that detailed data that has been collected can, of course, be used to derive metrics. We differentiate between two kinds in the profiling team at Datadog:

  • Key Performance Metrics
  • High Cardinality Metrics

Key performance metrics are simple scalar metrics, you typically derive a value, periodically, per runtime. For example CPU utilization or allocation rate.

Here’s an example showing a typical key performance metric (note that all pictures are clickable for a better look):

kpm

The graph above shows the allocation rate. It’s a simple number per runtime that can change over time. In this case the chart is an aggregate over the service, but it could just as well be a simple metric plotted for an individual runtime.

High Cardinality Metrics are metrics that can have an enormous amount of different buckets with which the values are associated with. An example would be the cpu time per method.

We use these kinds of metrics to support many different use cases, such as allowing you to see the hottest methods in your entire datacenter. The picture below shows the hottest allocation sites across a bunch of processes.

hcm2

Here are some contended methods. Yep, one is a demo…

locks

Metrics also allow you to monitor for certain conditions, like having alerts / watchdogs when certain conditions or changes in conditions occur. That said, they aren’t worth that much unless you can, if you find something funny, go see what was going on – for example see how that contended method was reached when under contention.

Aggregation

Another use case is when you simply don’t care about a specific use case. You just want to look at the big picture in your datacenter. You may perhaps want to see, on average, across all your hosts and for a certain time range, what the CPU profiling information looks like? This would be a great place to start if, for example, looking for ways to lower the CPU usage for Friday nights, 7 to 10 p.m.

Here, for example, is an aggregation flame graph for the profiling data collected for a certain service (prof-analyzer), where there is some load (I set it to a range to filter out the profiles with very little load).

aggregation

A specific method can be selected to show how that specific method ended up being called:
methodselect

Searching

What if you just want to get to an example of the worst possible examples of using a butt-load of CPU? Or if you want to find the worst example of a spike in allocation rate? Having indexed key performance metrics for the profiling data makes it possible to quickly search for profiling information matching certain criteria.

Here is an example of using the monitor enter wait time to filter out an atypically high lock contention:

atypicallock

Association by Context

Of course, if we can associate the profiling data with individual traces, it would be possible to see what went on for an individual long lasting span. If using information from the runtime, even things that are normally hidden from user applications (including profilers purely written in Java), like stop-the-world pauses, would be visible.

breakdown

Analysis

When having access to all that yummy, per thread and time, detailed, profiling data, it would be a shame to not go looking for some interesting patterns to highlight. The result of that analysis can provide a means to focus on the most important parts of the profiling data.

analysis1

So, nothing terribly interesting going on in our services right now. The one below is from a silly demo app.
analysis2

That said, if you’re interested in the kind of patterns we can detect, check out the JDK Mission Control rules. The ones at Datadog are a superset, and work similarly.

Summary

Profiling these days is no longer limited to high overhead development profilers. The capabilities of the production time profilers are steadily increasing and their value is becoming less controversial, some preferring them for complex applications even during development. Today, having a continuous production profiler enabled in production will offer unparalleled performance insights into your production environment, at an impressively low performance overhead. Data will always be at your fingertips when you need it.

Additional Reading

https://www.datadoghq.com/blog/datadog-continuous-profiler/
https://www.datadoghq.com/blog/engineering/how-we-wrote-a-python-profiler/

Many thanks to Alex Ciminian, Matt Perpick and Dan Benamy for feedback on this blog.


[1]: Deep Distributed Tracing blog: https://hirt.se/blog/?p=1081

Unrelated links regarding the very interesting and important de-carbonization debate:

[2]: https://theness.com/neurologicablog/index.php/there-is-no-one-energy-solution/

[3]: https://mediasite.engr.wisc.edu/Mediasite/Play/f77cfe80cdea45079cee72ac7e04469f1d
(No longer available, but this youtube clip is related, and also presented by Dr. Jesse Jenkins):
https://youtu.be/ZYfD1Z_zkfc

The “Best of the JDK” Tournament

Over the last few weeks, there has been a knock-out tournament raging on Twitter, where various Java technologies have battled out which JDK technology is the best. It’s all part of the activities taking place around the celebration of Java turning 25 years. And boy, have those years been interesting.

Like many languages in use today, Java started out with a simple interpreter. That is, by the way, how Java got a reputation for being slow. Today, Java peak performance can surpass that of statically compiled languages, owing to optimizations only possible when runtime information is available. But I digress

As many of you know, I started out co-founding a company named Appeal – the company that created the JRockit JVM. We did quite a few cool things during that time; some of them relevant to the knock-out competition. We built the world’s first JVM management console, mostly since the application to become a Java licensee (so that JRockit could become a Sun certified JVM) required us to state a value-add. Our original application stated “better performance”, and was summarily turned down. 😉 With the work on the management console we eventually consolidated an API to monitor and manage the JVM – JMAPI (the JRockit Management API), which later inspired – and was superseded by – JSR-174 (java.lang.management)[1].

We also built a tool we called JRA (JRockit Runtime Analyzer). It really started out as a tool for finding out how the JVM was performing at customer installations – we needed information to better understand how to improve the JVM for real world usage. Customers, quite understandably, refused to let us borrow their applications to run them in our labs. To make it easy for them to understand and verify the data they were sharing, it was all emitted as text (XML). It didn’t take long for customers to see us use the tool and the (accidental) value it brought for optimizing their applications – was the tool perhaps for sale? As a startup, we of course said yes, and made it into a product. When we later introduced the JRockit DetGC (deterministic GC), there was a need to be able to prove that the GC was keeping the latency contract, and show where in the customer code any thread halts were caused (e.g. due to bad synchronization). So the JRockit Runtime Analyzer was extended with LAT (the Latency Analysis Tool), which now introduced a binary artifact for the latency data for better data density and less serialization cost. In the end the JRA and LAT was unified into a single model – JFR (JRockit Flight Recorder, later Java Flight Recorder, and finally re-dubbed into JDK Flight Recorder when it was open sourced). We also created an impossibly cool on-line memory analysis tool (which was sadly never ported to hotspot), together with a slew of other little tools and utilities.

The good old JMC memleak tool

Some of these tools converged into Java Mission Control, which became the hub for the cool tools we were developing.

JMC Logo

I was happily surprised to see JDK Mission Control included in the “Best of the JDK” feature face off. I was doing little dad-dances (to the embarrassment of my kids) in total astonishment when JDK Mission Control got up against the runtime and language features and ultimately won the whole thing.

Competition Results

Tech Poetry Throw-Down

One of the best parts of this whole competition was when Erik Costlow wrote some poetry in support of JDK Mission Control. This sparked an epic tech-poetry throw-down with little poems in favour of various Java technologies.

Here are a few of my favourites entries for JMC & JFR (in no particular order):

Of JDK Mission Control

whose benefits I will extol:

It watches performance

while still in conformance

So therefore it should win this poll.

  – @costlow

(The one which started the it all)

2 am in the morning, my mobile chimed,

The war room conf call had to be primed.

JVM’s are down, the helpdesk said,

Touch troubleshooting road ahead.

CPU? GC? Bad Code?, the questions abound,

The root cause was far from being found.

Tumultuous voiced from Dev to Ops, each one declaring the were clean

No path to the solution was to be seen.

With a prayer, I fired up the Java Flight Recorder,

Hoping this would restore some war room order.

Lo! And behold, the histogram revealed

‘Twas a code deadlock, the system could yet be healed!

Helpful NullPointer messages, I hear you say,

Who will alert you whilst you are away?

  – @perfclarity

To see or not to see (perf data)

That is the question (mission control answers).

Whether ‘tis nobler in the code

To suffer the zings and harrows of outrageous finger pointing

Or to stream events and by analyzing, end it

  – @costlow

I have never

had to deal

with NullPointer

Exceptions

and which

many people want

to have

better messages

Forgive me

but my vote goes to JMC

it is so sweet

and so cold

  – @stuartmarks

To think that I could ever see

A tool so lovely: JMC

A tool that streams events all day

Yet still performs without delay.

  – @costlow

If you need to control a mission

OpenJDK had an omission

And then JMC

Was suddenly free

Without even rights of rescission

  – @stuartmarks

So much value inside JMC

Yet usage was low, tis it wasn’t free

But low and behold

Oracle open sourced it in whole

And now productivity is as easy can be

  – @Sharat_Chander

As I stream through the events of my workload perf pain

I take a look post 8 life and realize this tool should reign

‘cause that’s just perfect for a coder like me

You know we love fancy things like JDK MC

Been spendin’ most our lives livin’ in a coder’s paradise

@costlow

Here are a few of my favourites for the other technologies:

Null pointer exception

Is a old familiar friend

And she wants to be

more helpful again

With deep information

I can only begin to extol

Love for NPE

For she should win

this Java poll

  – @manicode

There was a NullPointerException

Whose message needs amplification

To the VM some hacks

Add the relevant facts

And no longer is it an obsession

  – @stuartmarks

As I try to decipher my NPE in grails

The Greater Sage-Grouse wanders the sage brush

The grouse and I are one

For I can’t decipher less helpful NPE’s in grails

Any more than the sage-grouse knows why it wanders the sagebrush

  – @manicode

I’m on a boat motherf$%^r take a look at me

Straight floatin’ on a boat debugging NPE

Busting five knots, wind whipping out my coat

You can’t stop me motherf$%^r cause I’m debugging on a boat

  – @manicode

The usability of NullPointerExceptions

have historically been an issue

by adding static code to dynamic exceptions

our problems we can diffuse

Let go of your stack trace debugging hate

And vote for JEP Three Fifty Eight!

  – @manicode

Many thanks to @costlow, @manicode, @stuartmarks, @perfclarity and @Sharat_Chander for all the laughs! 🙂

Thanks!

Yes, I know this is a silly little Twitter competition. But, if nothing else, this silly little competition provides an excellent opportunity for me to give some overdue thanks:

  • Plenty of thanks and love to all of the users of JMC out there, using JMC to solve tricky problems in production systems on a daily basis.
  • Many thanks to everyone who voted for JMC. I didn’t think a tool would stand a chance against language and runtime features.
  • Huge thanks to all the developers on the JDK Mission Control team, and to all the developers on the JDK serviceability team. You’re a really awesome bunch, and it’s a privilege for me to be working with you.
  • Major kudos to Oracle for open sourcing JDK Mission Control and JDK Flight Recorder.
  • Many thanks to the main sponsors of the development of JDK Mission Control:

JRockit and Duke hanging

[1]: Sadly, not all of the features in JMAPI got rolled into the standardized API. JMAPI could, for example, change the CPU affinity of the JVM process on the fly, dynamically change the heap size target, and independently (and dynamically) switch the GC to use a nursery or not as well as switch between concurrent and parallel mark and/or sweep phases. Of course differences in GC capabilities etc required the standardized API to be limited to what made sense to most runtimes. That said, I’m still kinda bummed that it became a JMX API (java.lang.management depending on the javax.management specification), instead of a pure local Java API, which could also have been exposed through JMX. See, for example, the JFR APIs, where there is a local API and also a JMX API.

Fetching and Building Mission Control 8+

As described in a previous post, Mission Control is now on GitHub. Since this alters how to fetch and build OpenJDK Mission Control, this is an updated version of my old post on how to fetch and build JMC from version 8 and up.

Getting Git

First step is to get Git, the SCM used for OpenJDK Mission Control. Installing Git is different for different platforms, but here is a link to how to get started:

https://git-scm.com/book/en/v2/Getting-Started-Installing-Git

Installing the Skara Tooling (Optional)

This is an optional step, making it easier if you want to contribute to Mission Control:

https://hirt.se/blog/?p=1186

Cloning the Source

Once Git is installed properly, getting the source is as easy as cloning the jmc repo. First change into the directory where you want to check out jmc. Then run:

git clone https://github.com/openjdk/jmc.git

Getting Maven

Since you probably have some Java experience, you probably already have Maven installed on your system. If you do not, you now need to install it. Simply follow the instructions here:

https://maven.apache.org/install.html

Building Mission Control

First we need to ensure that Java 8 is on our path. Some of the build components still use JDK 8, so this is important.

java –version

This will show the Java version in use. If this is not a Java 8 JDK, change your path. Once done, we are now ready to build Mission Control. Open up two terminals. Yep, two!

In the first one, go to where your cloned JMC resides and type in and execute the following commands (for Windows, replace the dash (/) with a backslash (\)):

cd releng/third-party
mvn p2:site
mvn jetty:run

Now, leave that terminal open with the jetty running. Do not touch.

In the second terminal, go to your cloned jmc directory. First we will need to build and install the core libraries:

cd core
mvn install

Next run maven in the jmc root:

mvn clean package

JMC should now be building. The first time you build Maven will download all of the third party dependencies. This will take some time. Subsequent builds will be less painful. On my system, the first build took 6:01 min. The subsequent clean package build took 2:38.

Running Mission Control

To start your recently built Mission Control, run:

Windows

target\products\org.openjdk.jmc\win32\win32\x86_64\jmc.exe -vm %JAVA_HOME%\bin

Mac OS X

target/products/org.openjdk.jmc/macosx/cocoa/x86_64/JDK\ Mission\ Control.app/Contents/MacOS/jmc -vm $JAVA_HOME/bin

Contributing to JDK Mission Control

To contribute to JDK Mission Control, you need to have signed an Oracle Contributor Agreement. More information can be found here:

http://openjdk.java.net/contribute/

Don’t forget to join the dev list:

http://mail.openjdk.java.net/mailman/listinfo/jmc-dev

We also have a Slack (for contributors), which you can join here:

https://join.slack.com/t/jdkmissioncontrol/signup

More Info

For more information on how to run tests, use APIs etc, there is a README.md file in the root of the repo. Let me know in the comments section if there is something you think I should add to this blog post and/or the README!

Mission Control is Now Officially on GitHub!

Since this morning, the JDK Mission Control (JMC) project has gone full Skara! mc_512x512This means that the next version (JMC 8.0) will be developed over at GitHub.

To contribute to JDK Mission Control, you (or the company you work for) need to have signed an OCA, like for any other OpenJDK-project. If you already have an OpenJDK username, you can associate your GitHub account with it.

Just after we open sourced JMC, I created a temporary mirror on GitHub to experiment with working with JMC at GitHub. That mirror is now closed for business. Please use the official OpenJDK one from now on:

https://github.com/openjdk/jmc

If you forked or stared the old repo, please feel free to fork and/or star the new one!

Compressing Flight Recordings

Flight recordings are nifty binary recordings of what is going on in the runtime and the application running on it. A flight recording contains a wide variety of information, such as various kinds of profiling information, threat stall information and a whole host of other information. All adhering to a common event model and with the ability to dynamically add new event types.

In the versions of JFR since JDK 9, some care was taken to reduce the memory footprint by LEB 128 encoding integers, noting that many things, like constant pool indices, usually occupy relatively low numbers. The memory footprint was cut in about half, compared to previous versions of JFR.

Now, sometimes you may want to compress the JFR data even further. The question then is – how much can you save if you compress the recordings further, and what algorithms would be best suited for doing the compression? What if you want the compression activity to use as little CPU as possible?

My friend and colleague at Datadog, Jaroslav Bachorik, set out to answer that question for some typical recording shapes that we see at Datadog, using a set of compression algorithms from Apache Commons Compress (bzip2, LZMA, LZ4), the built in GZip, a dedicated LZ4 library, XZ, and Snappy.

Below is a table of his findings for “small” (~1.5 MiB) and “large” (~5 MiB) recordings from one of our services. The benchmark was run on a MacBook Pro 2019. Now, you’d have to test on your own recordings to truly know, but I suspect that these results will hold up pretty well with other kinds of loads as well.

Algorithm Recording Size Throughput Compression Ratio Utility
Gzip small 24.299 3.28 79.647
Gzip large 5.762 3.54 20.436
BZip2 small 6.616 3.51 23.198
BZip2 large 1.518 3.84 5.826
LZ4 small 133.115 2.40 319.394
LZ4 large 38.901 2.57 100.009
LZ4 (Apache) small 0.055 2.74 0.152
LZ4 (Apache) large 0.013 3.00 0.039
LZMA small 1.828 4.31 7.882
LZMA large 0.351 4.37 1.533
Snappy small 134.598 2.27 305.494
Snappy large 35.986 2.49 89.692
XZ small 1.847 4.31 7.964
XZ large 0.349 4.37 1.523

Throughput is recordings/s. Utility is throughput * compression ratio, and meant to capture the combination of compression strength and performance. Note that the numbers are not normalized – only compare numbers in the same size category.

Summary / TL;DR

  • The built-in GZip is doing a fairly good/balanced job of compressing flight recordings
  • You can get the best utility out of LZ4, closely followed by Snappy, but you sacrifice some compression
  • If you’re prepared to pay for it, LZMA and XZ give a good compression ratio
  • All credz to Jaroslav for his JMH-benchmark and all the data

JFR is Coming to OpenJDK 8!

I recently realized that this isn’t common knowledge, so I thought I’d take the opportunity to talk about the JDK Flight Recorder coming to OpenJDK 8! The backport is a collaboration between Red Hat, Alibaba, Azul and Datadog. These are exciting times for production time profiling nerds like me. Smile

The repository for the backport is available here:

http://hg.openjdk.java.net/jdk8u/jdk8u-jfr-incubator/

The proposed CSR is available here:

https://bugs.openjdk.java.net/browse/JDK-8230764

The backport is keeping the same interfaces and pretty much the same implementation as is available in OpenJDK 11, and is fully compatible. There were a few security fixes, due to there not being any module system to rely upon for isolation of the internals, also, some events will not be available (e.g. the Module related events) but other than that the API and tools work exactly the same.

JDK Mission Control will, of course, be updated to work flawlessly with the OpenJDK 8 version of JFR as well. The changes will be minute and are only necessary since Mission Control has some built-in assumptions that no longer hold true.

You can already build and try out OpenJDK 8 with JFR simply by building the JDK available in the repository mentioned above. Also, Aleksey Shipilev provide binaries – see here for details.

Have fun! Smile