Flight recordings are nifty binary recordings of what is going on in the runtime and the application running on it. A flight recording contains a wide variety of information, such as various kinds of profiling information, thread stall information and a whole host of other information, all adhering to a common event model that allows new event types to be added dynamically.
In the versions of JFR since JDK 9, some care was taken to reduce the memory footprint by LEB128-encoding integers, since many values, such as constant pool indices, are usually relatively small numbers. This cut the memory footprint roughly in half compared to previous versions of JFR.
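To make the idea concrete, here is a minimal sketch of textbook unsigned LEB128 in Java. It is not JFR's actual writer (the class and method names are made up for illustration, and JFR's own integer encoding may differ in details); it only shows why small numbers end up taking few bytes.

```java
import java.io.ByteArrayOutputStream;

public class Leb128Sketch {
    // Encodes a value as unsigned LEB128: 7 payload bits per byte,
    // with the high bit set on every byte except the last one.
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        do {
            int b = (int) (value & 0x7F);
            value >>>= 7;
            if (value != 0) {
                b |= 0x80; // continuation bit: more bytes follow
            }
            out.write(b);
        } while (value != 0);
        return out.toByteArray();
    }

    // Decodes a single unsigned LEB128 value from the start of the array.
    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // last byte of this value
            }
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        for (long v : new long[] {5, 300, 100_000, Integer.MAX_VALUE}) {
            byte[] encoded = encode(v);
            System.out.println(v + " -> " + encoded.length
                    + " byte(s), decodes back to " + decode(encoded));
        }
    }
}
```

A small constant pool index such as 5 fits in a single byte instead of the four or eight bytes a fixed-width int or long would need, which is where the savings come from.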
Now, sometimes you may want to compress the JFR data even further. The question then is: how much can you save, and which algorithms are best suited for the job? What if you want the compression to use as little CPU as possible?
My friend and colleague at Datadog, Jaroslav Bachorik, set out to answer that question for some typical recording shapes that we see at Datadog, using a set of compression algorithms from Apache Commons Compress (bzip2, LZMA, LZ4), the built-in GZip, a dedicated LZ4 library, XZ, and Snappy.
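If you want a quick feel for your own recordings before setting up a proper benchmark, something along these lines could work. This is only a sketch using the JDK's built-in GZip, not Jaroslav's JMH benchmark: the class name and the default file name `recording.jfr` are hypothetical, compression ratio is assumed to mean uncompressed size divided by compressed size, and a single timed run like this is far too noisy to stand in for real measurements.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

public class GzipRecordingSketch {
    // Compresses the given bytes in memory with the JDK's built-in GZip.
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Assumption: ratio = uncompressed size / compressed size (higher is better).
    static double ratio(byte[] original, byte[] compressed) {
        return (double) original.length / compressed.length;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass your own recording as the first argument.
        Path recording = Path.of(args.length > 0 ? args[0] : "recording.jfr");
        byte[] original = Files.readAllBytes(recording);

        long start = System.nanoTime();
        byte[] compressed = gzip(original);
        long elapsedNanos = System.nanoTime() - start;

        System.out.printf("%d -> %d bytes, ratio %.2f, took %.1f ms%n",
                original.length, compressed.length,
                ratio(original, compressed), elapsedNanos / 1e6);
    }
}
```

Swapping in another algorithm should mostly be a matter of replacing the output stream with the corresponding one from the library you want to try.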
Below is a table of his findings for “small” (~1.5 MiB) and “large” (~5 MiB) recordings from one of our services. The benchmark was run on a 2019 MacBook Pro. Now, you’d have to test on your own recordings to truly know, but I suspect that these results will hold up pretty well for other kinds of loads too.
| Algorithm | Recording Size | Throughput | Compression Ratio | Utility |
Throughput is measured in recordings/s. Utility is throughput * compression ratio, and is meant to capture the combination of compression strength and speed in a single number. Note that the numbers are not normalized, so only compare numbers within the same size category.
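As a tiny worked example of the utility metric, consider the sketch below. The input numbers are invented purely for illustration and are not taken from the benchmark, and the class and method names are hypothetical.

```java
public class UtilitySketch {
    // Utility as defined above: throughput (recordings/s) times compression ratio.
    static double utility(double recordingsPerSecond, double compressionRatio) {
        return recordingsPerSecond * compressionRatio;
    }

    public static void main(String[] args) {
        // Invented numbers, NOT benchmark results.
        double fastButWeak = utility(100.0, 2.0);   // quick codec, modest ratio
        double slowButStrong = utility(10.0, 4.0);  // slow codec, strong ratio

        System.out.println("fast but weak:   " + fastButWeak);
        System.out.println("slow but strong: " + slowButStrong);
    }
}
```

With these made-up inputs the fast codec scores 200 and the slow one 40, which shows how the metric rewards speed: doubling the compression ratio does not make up for a tenfold drop in throughput.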
Summary / TL;DR
- The built-in GZip is doing a fairly good/balanced job of compressing flight recordings
- You can get the best utility out of LZ4, closely followed by Snappy, but you sacrifice some compression
- If you’re prepared to pay for it, LZMA and XZ give a good compression ratio
- All credz to Jaroslav for his JMH benchmark and all the data
Could you add 7zip, pigz and pbzip2, too?
Hi Götz – for what we’re doing, we’re actually pretty happy with LZ4 at this point. For our particular scenarios we don’t want to run the compression/decompression on multiple threads. 7zip could potentially be interesting. We’ll see.