20.2.2015

Cross-language benchmarking made easy?

First of all, in most cases it is not the source code you wrote that actually gets executed. Second, please define “faster” upfront. Third, as a general rule of experiments: do not draw conclusions based on a benchmark – see it more as a hint that some things seem to make a difference.

In this article series, I will not discuss numbers or the reasons why code X or language Y supersedes the other. There are many people out there who understand the background much better than I do – here a short pointer to a very good article about Java vs. Scala benchmarking by Aleksey Shipilëv. There will be no machine code and no performance tweaks which make your code perform 100 times better. I want to present ideas on how you can set up such micro benchmarks in a simple, automated and user-friendly way. In detail, we will come across these topics:

  • How to set up one build that fits all requirements?
  • Gradle in action, building across different languages
  • Benchmarking with JMH (Java) and Hayai (C/C++) to prove the concept
  • How to store the results?

1.) Background

Some introductory words about the background of this article. Years back at university, I was a member of the computer graphics group there. I spent much time with C/C++ programming because there was one common opinion: “If you want it really fast, don’t waste time with VM languages and other ‘esoteric’ stuff, write it in a language which compiles to native binary code.” Well, that doctrine has much truth in it, and that must be the reason why many performance-critical, number-crunching applications are implemented in languages like C and C++. For graphics-related applications there is one other obvious reason: access to the graphics card if you want to use it in a more advanced way. Finally, cache-aligned programming, direct usage of vectorization intrinsics and access to CPU-specific features are the real ‘cherry on the cake’ of native code applications. And that is what I learned in those days: native code always outperforms VM languages – and this is the opinion of the majority of software developers out there. Now, long story short, especially for those who already lose their poise due to that statement: it depends, of course! I don’t want to open yet another discussion about the endless list of benefits and pitfalls of the different languages. But the fact is that measuring code performance and comparing the results is hard, because there is no common approach or best practice. That will hopefully be the essence of this blog post series. Interpretations and conclusions based on the numbers can be drawn by others ;).

2.) Java vs. C++ – So no war today

I thought it would be a good starting point to take up the challenge with two of the most widespread object-oriented programming languages. I don’t want to get into detail about the differences now. I think all of you know that JIT compilation takes place in HotSpot, while C++ code gets compiled directly to machine code during the build step. That means the machine code executed by the JVM will probably change over the course of execution, while the C++ compilation result won’t. I will also neglect the fact that there are different compilers and runtimes out there which will seriously influence the resulting numbers.

3.) Where this will lead – Historized Automated Cross-Language Benchmarking

Cross-language benchmarking in general cannot have the purpose of proving that one compiler or runtime environment outplays the other. See it as a challenge to identify differences that are not obvious in the first place, a starting point to dig deeper into the details. Or just play around and see what happens if you change this and that in one of the implementations.

I’ve spent some time thinking about what would be useful if you have one algorithm implemented in two different languages and want to know which implementation seems to perform better. This means the tool set needs to be Cross-Language capable. If I change the implementation in one language flavor, I want to use one common tool chain that creates a report on how the current implementations perform compared to each other. In other words, the creation of all benchmarking reports and their eventual aggregation should be accomplished in an Automated way. Measuring performance optimization progress, for example, is painful when you have no real history of reports attached to the corresponding version in your version control system, including details about the environment and the like. The final idea is to get a diagram which shows the change of benchmark results over time, which will be possible because the approach is Historized. Given that, I see two main purposes for my ideas:

  • Fun: Gamified language challenges
  • Boring but useful: builds where a reference implementation in another language serves as a quality gate within certain bounds.

I’ve put much effort into evaluating different possibilities. The following setup tries to cover all these topics for this POC scenario: let’s assume we have a Java and a C++ project which implement the same algorithms. We want to benchmark and compare the performance of hot code parts. We also want to track the changes over time as the project source code grows and changes. We will:

  1. Benchmark Java Code with JMH as part of a Gradle build
  2. Benchmark C++ code with Hayai
  3. Integrate C++ binary compilation with Gradle
  4. Integrate Hayai benchmark execution with Gradle
  5. Bring Java and C++ projects together in one cross-language Gradle build chain
  6. Aggregate JMH and Hayai results in a third composite result
  7. Split benchmarking code out of the projects into a dedicated project and dedicated SCM.
  8. Automatically push aggregated benchmarking results to a cleverly structured git repository to keep the link between current source code versions and benchmarking results.

4.) First Step: Benchmarking Java code with Gradle and JMH

Some months ago, Daniel Mitterdorfer released several articles about doing microbenchmarks in Java the right way. He outlines the different pitfalls and subtleties which need significant attention in order to get meaningful results. The presented tool which solves many of these problems is JMH (Java Microbenchmark Harness). For the sake of completeness, here is a small example of what such a benchmark looks like:

github:d06b263ee7e99d88db46
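To give a self-contained impression, here is a minimal sketch of such a benchmark. Note that this is illustrative code, not the actual Chroma benchmark – the class, the plain Möller–Trumbore intersection routine and the test data are made up for this example:

    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Benchmark)
    @BenchmarkMode(Mode.Throughput)
    @OutputTimeUnit(TimeUnit.SECONDS)
    public class TriangleIntersectionBenchmark {

        // Fixed test data held in benchmark state, so every invocation does the same work.
        private final double[] origin = { 0.0,  0.0, -1.0};
        private final double[] dir    = { 0.0,  0.0,  1.0};
        private final double[] v0     = {-1.0, -1.0,  0.0};
        private final double[] v1     = { 1.0, -1.0,  0.0};
        private final double[] v2     = { 0.0,  1.0,  0.0};

        @Benchmark
        public boolean intersectTriangle() {
            // Returning the result prevents the JIT from eliminating the call as dead code.
            return intersects(origin, dir, v0, v1, v2);
        }

        // Plain Moeller-Trumbore ray/triangle intersection test.
        private static boolean intersects(double[] o, double[] d,
                                          double[] a, double[] b, double[] c) {
            double e1x = b[0] - a[0], e1y = b[1] - a[1], e1z = b[2] - a[2];
            double e2x = c[0] - a[0], e2y = c[1] - a[1], e2z = c[2] - a[2];
            // p = d x e2
            double px = d[1] * e2z - d[2] * e2y;
            double py = d[2] * e2x - d[0] * e2z;
            double pz = d[0] * e2y - d[1] * e2x;
            double det = e1x * px + e1y * py + e1z * pz;
            if (det > -1e-9 && det < 1e-9) return false;   // ray parallel to triangle plane
            double inv = 1.0 / det;
            double tx = o[0] - a[0], ty = o[1] - a[1], tz = o[2] - a[2];
            double u = (tx * px + ty * py + tz * pz) * inv;
            if (u < 0.0 || u > 1.0) return false;
            // q = t x e1
            double qx = ty * e1z - tz * e1y;
            double qy = tz * e1x - tx * e1z;
            double qz = tx * e1y - ty * e1x;
            double v = (d[0] * qx + d[1] * qy + d[2] * qz) * inv;
            if (v < 0.0 || u + v > 1.0) return false;
            // hit only if the intersection lies in front of the ray origin
            return (e2x * qx + e2y * qy + e2z * qz) * inv > 1e-9;
        }
    }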

There is not a big difference compared to a unit test, apart from the typical annotations. All the execution magic is done by the extra code that JMH generates due to the @Benchmark annotation. Because we want JMH benchmark execution to be part of our development tool chain, we take a look at the jmh-gradle-plugin.

github:8e77bccceb06da0d57ad
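Applying the plugin takes only a few lines in your build.gradle. A minimal sketch could look like this – the version number is just an example, please check the plugin page for the current one:

    buildscript {
        repositories {
            jcenter()
        }
        dependencies {
            classpath 'me.champeau.gradle:jmh-gradle-plugin:0.2.0'
        }
    }

    apply plugin: 'java'
    apply plugin: 'me.champeau.gradle.jmh'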

The plugin also suggests using a dedicated source path for all JMH classes, which results in the following directory structure:

github:a883f73442ca0b78d5b0
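Assuming the plugin’s default conventions, the benchmark classes live in their own source set next to the regular ones:

    src/main/java    <- production code
    src/test/java    <- unit tests
    src/jmh/java     <- JMH benchmark classes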

The plugin provides different possibilities to make the test source path accessible from the JMH source path as well, if necessary. If everything is set up correctly, a single invocation of the following command is sufficient to run the jmh Gradle task:

github:04648a52c811d1cbf0d6
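In plain terms, this is nothing more than calling the task the plugin registers, e.g. from the project root (or gradle jmh without the wrapper):

    ./gradlew jmh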

Results can then be found in the output directory ‘build/reports/jmh’. Be patient: by default, JMH performs many iterations and JVM warm-up steps, which will probably take longer than expected. When finished, the result could look like this:

github:8ee909c88c737ebc0aea
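The interesting part of that output is the summary line at the end, shaped roughly like below. The benchmark name is taken from the sketch above and the score is the one discussed next; the error value depends on the concrete run and is therefore elided here:

    Benchmark                                        Mode  Cnt         Score   Error  Units
    TriangleIntersectionBenchmark.intersectTriangle thrpt  200  72503070.056 ±   ...  ops/s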

A short note on the figures: the default JMH mode is ‘Throughput’, which means JMH will call your benchmark method as often as possible within one second per iteration. JMH executes 20 independent iterations in each of 10 different JVM forks, resulting in 200 samples that are used to calculate the average score and mean error. So the result of our benchmark is that the JVM reaches an average of 72,503,070.056 triangle intersection executions per second. Can we perhaps do better? :)

5.) Hint to the source code and steps further

Here you can find a branch in my GitHub project which showcases the progress of this blog article: Chroma@github, branch: crolabefra_starting_point. Feel free to check it out and set it up. I hope you like the colorful context of computer graphics :).

If you would like to learn more about performance topics on the JVM in detail, join Daniel’s sessions at JavaLand in Cologne in March!

In this blog article I gave a general outline of what I would like to achieve in the next weeks: a Historized Automated Cross-Language Benchmarking approach based on Gradle and git. The target languages are currently Java and C++, which will be stressed with JMH and Hayai. Today I gave you a basic introduction to coupling Gradle, JMH and Java code in order to run within one tool chain. The next article will mainly be about the C++/Hayai part and how easy it is to integrate that into a Gradle build. You will be surprised :).

Thanks to Daniel Mitterdorfer, Max Raba, and Samer Al-Hunaty for the review!

So long! Cheers!


Benjamin
Software Engineer & Fellow