JMH Performance Benchmarks: Math vs. StrictMath vs. FastMath (Hipparchus) vs. FastMath (Apache)

Hello!

I’ve spent some time recently benchmarking the aforementioned (|Strict|Fast)Math classes after a coworker found that our use of FastMath::sin in a particular spot was generating a huge number of short-lived objects.

==================

TL;DR: in many cases, Math is faster than FastMath, and it seems unlikely that either OS or hardware makes a difference. Java versions other than 17 were not tested!

So, I’m wondering whether FastMath offers better numerical stability/guarantees, or whether there are other reasons to use FastMath in cases where Math is much faster.

==================

Note that I did not test all methods in these classes; I tested those that are common to all of them - sinCos being the sole exception (explained towards the end of this post) - intersected with those that, in my experience, are most commonly used.

Based on this analysis - if speed is the goal - the gist is that Math is the way to go in most cases, but Hipparchus’ FastMath should be used for the inverse and hyperbolic trig. functions (and a few others):

  • FastMath faster:
    • asin, acos, atan, atan2
    • sinh, cosh, tanh
    • expm1
    • min, max, signum
  • Math faster:
    • sinCos, sin, cos, tan
    • log1p, log10, log, exp
    • hypot
    • cbrt
    • pow
    • floor, ceil, abs

Linked below is the report. The benchmarking code using JMH can be found here.

This project - Ptolemaeus - is actually a FOSS Java mathematics project that makes use of Hipparchus. It started as a “for fun” project before being incorporated into repositories at work where, after a year or two, I had it approved for FOSS release.

There are a few changes I’d still like to make before truly introducing the community to what it’s got to offer - I’d wait until v5 is released before really digging in - but the project is used in production on several high-impact programs.

Math_Benchmarks_Java17_2025-05_v3.1.0.xlsx (788.8 KB)

==================

Full Procedure

In an effort to minimize human error and to make this as portable and backwards compatible as possible:

Two Writer classes - SingleArgMathBenchmarkWriter and TwoArgMathBenchmarkWriter - were used to generate the benchmarking classes, each of which extends one of the abstract benchmark classes SingleArgMathBenchmark or TwoArgMathBenchmark.
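For illustration, a generated single-arg benchmark looks roughly like the following (a simplified sketch, not the actual generated source; the class name, seed, and sampling interval are placeholders):

import java.util.Random;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

// Simplified sketch of a generated single-arg benchmark; the real generated
// classes extend SingleArgMathBenchmark, and the names/values here are placeholders.
@State(Scope.Benchmark)
public class SinMathBenchmarkSketch {

    /** Calls per benchmark method invocation. */
    private static final int N = 1_000;

    /** Inputs drawn from an interval on which the function is non-NaN. */
    private final double[] xs = new double[N];

    @Setup(Level.Iteration)
    public void setup() {
        final Random random = new Random(42L); // placeholder seed
        for (int i = 0; i < N; i++) {
            xs[i] = -Math.PI + 2.0 * Math.PI * random.nextDouble(); // e.g., [-pi, pi)
        }
    }

    @Benchmark
    public void sin(final Blackhole blackhole) {
        for (int i = 0; i < N; i++) {
            blackhole.consume(Math.sin(xs[i]));
        }
    }
}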

These JMH benchmarks are run using Gradle jmh plugin tasks, one per implementation:

  • jmhMath
  • jmhStrictMath
  • jmhHipparchusFastMath
  • jmhApacheFastMath

The configuration “inherited” by each task is

jmh {
    resultFormat    = 'CSV'    // Result format type (one of CSV, JSON, NONE, SCSV, TEXT)
    warmup          = '512ms'  // these args are powers of two because they're convenient
    timeOnIteration = '2048ms'
    iterations      = 8
    fork            = 4
    threads         = Math.max(1, Runtime.getRuntime().availableProcessors() - 1)
}

To run them back-to-back, one must use more than one ./gradlew command:

./gradlew jmhMath && ./gradlew jmhStrictMath && ./gradlew jmhHipparchusFastMath && ./gradlew jmhApacheFastMath

The GitLab Runner run was performed by adding a new job to the .gitlab-ci.yml

The JMH CSV results are written to [...]\ptolemaeus\ptolemaeus-math\build\reports\jmh with names as expected: math.csv, strictMath.csv, hipparchusFastMath.csv, and apacheFastMath.csv

Once the results are written, the JMHResultsReformatter program reformats them into a form more conducive to creating the Excel spreadsheets while avoiding human error. The first step writes a reformatted copy of each file; the second (and final) step combines the reformatted CSVs into a single combined_reformatted.csv file. The data from the combined file is then copied into an Excel spreadsheet, sorted, etc.
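As an outline of those two steps (a simplified sketch only; the real column rearranging lives in JMHResultsReformatter, and all file names here other than combined_reformatted.csv are placeholders):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Sketch of the two-step reformat/combine flow; the real logic is in JMHResultsReformatter.
public final class ReformatAndCombineSketch {

    public static void main(final String[] args) throws IOException {
        final List<String> combined = new ArrayList<>();

        for (final String arg : args) { // math.csv, strictMath.csv, hipparchusFastMath.csv, apacheFastMath.csv
            final Path in = Path.of(arg);

            // step 1: write a reformatted copy of each file
            final List<String> reformatted = new ArrayList<>();
            for (final String line : Files.readAllLines(in)) {
                reformatted.add(reformat(line));
            }
            Files.write(in.resolveSibling("reformatted_" + in.getFileName()), reformatted);

            combined.addAll(reformatted);
        }

        // step 2: combine everything into a single file for pasting into Excel
        Files.write(Path.of("combined_reformatted.csv"), combined);
    }

    /** Stand-in for the real column rearranging. */
    private static String reformat(final String line) {
        return line;
    }
}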

JMHResultsReformatter takes as arguments the locations of the files to reformat and combine; e.g.,

[...]\java_math_benchmarks_2025-05\RM_home\math.csv
[...]\java_math_benchmarks_2025-05\RM_home\strictMath.csv
[...]\java_math_benchmarks_2025-05\RM_home\hipparchusFastMath.csv
[...]\java_math_benchmarks_2025-05\RM_home\apacheFastMath.csv

Before running the benchmarks, each machine was restarted, and nothing aside from the CLI was opened after starting. Once the benchmarks began, the machines were left entirely undisturbed until they finished.

The machines used for benchmarking were my home and work machines (referred to as “RM”), the work machine of a coworker (referred to as “CAD”), and a GitLab Runner.

RM Work Machine Specs (DxDiag):

   Operating System: Windows 11 Enterprise 64-bit (10.0, Build 22631) (22621.ni_release.220506-1250)
       System Model: HP ZBook Fury 15.6 inch G8 Mobile Workstation PC
               BIOS: T95 Ver. 01.20.00 (type: UEFI)
          Processor: 11th Gen Intel(R) Core(TM) i9-11950H @ 2.60GHz (16 CPUs), ~2.6GHz
Available OS Memory: 32480MB RAM

RM Work JDKs:

  • OpenJDK 17.0.2
  • OracleJDK 17.0.12

RM Home Machine Specs (DxDiag):

   Operating System: Windows 11 Home 64-bit (10.0, Build 26100) (26100.ge_release.240331-1435)
       System Model: HP ENVY Laptop 16-h1xxx
               BIOS: F.22 (type: UEFI)
          Processor: 13th Gen Intel(R) Core(TM) i9-13900H (20 CPUs), ~2.6GHz
Available OS Memory: 16078MB RAM

RM Home JDKs:

  • OpenJDK 17.0.2
  • OracleJDK 17.0.12

CAD Work Machine Specs (DxDiag) (evidently, nearly the same):

   Operating System: Windows 11 Enterprise 64-bit (10.0, Build 22631) (22621.ni_release.220506-1250)
       System Model: HP ZBook Fury 15.6 inch G8 Mobile Workstation PC
               BIOS: T95 Ver. 01.20.00 (type: UEFI)
          Processor: 11th Gen Intel(R) Core(TM) i9-11950H @ 2.60GHz (16 CPUs), ~2.6GHz
Available OS Memory: 32432MB RAM

CAD Work JDK:

  • OpenJDK 17.0.2

GitLab Runner Specs:

  • Definitely a Linux machine
  • I’m still trying to figure this out

GitLab Runner JDK:

  • OpenJDK 17.0.1

==================

Important note regarding the y-axis values in the charts:

The “Benchmark” column shows the functions being benchmarked, but the benchmarks aren’t calling those methods just once; rather, the single-arg functions are called 1_000 times per benchmark iteration, and the two-arg functions are called 1_024 (== 32 * 32) times per benchmark iteration, with random values chosen from intervals over which the functions are non-NaN.

E.g., the asin benchmark calls the asin methods using 1_000 values chosen randomly from [-1, 1].
The atan2 benchmark calls the atan2 methods using 1_024 (== 32 * 32) pairs of values chosen from [-10, 10]x[-10, 10].
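As an illustrative sketch of the two-arg input generation (the exact sampling lives in TwoArgMathBenchmark; treating the 1_024 pairs as a 32 x 32 grid of random values is just one way to read the 32 * 32 note):

import java.util.Random;

// Illustrative only: one way to form 1_024 (== 32 * 32) input pairs over [-10, 10] x [-10, 10].
public final class TwoArgInputSketch {

    public static void main(final String[] args) {
        final int n = 32;
        final Random random = new Random(42L); // placeholder seed

        final double[] xs = new double[n];
        final double[] ys = new double[n];
        for (int i = 0; i < n; i++) {
            xs[i] = -10.0 + 20.0 * random.nextDouble();
            ys[i] = -10.0 + 20.0 * random.nextDouble();
        }

        // 32 * 32 == 1_024 (x, y) pairs fed to, e.g., atan2(y, x)
        double sum = 0.0;
        for (final double x : xs) {
            for (final double y : ys) {
                sum += Math.atan2(y, x);
            }
        }
        System.out.println(sum); // keep the work from being optimized away
    }
}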

So, the “score” is ~1000x smaller than the actual count of function calls. This does not impact the interpretation of results outside of comparing one-arg. function performance to two-arg. function performance.

General notes:

  • Hipparchus FastMath v3.1
  • Apache FastMath v3.6.1
  • All Excel data originates from JMH CSV data, and all of that data is available (ask me - @Ryan - for it and I’ll send it out).
    • There is a Java program (the JMHResultsReformatter mentioned above) to reformat the data in a way that makes it easier to paste into Excel. When run, it first creates a reformatted version of each file, and then writes a combined file. The combined file is what’s pasted into Excel.
  • All trends mentioned below are approx. stable across each machine, OS, and JDK tested
    • There are some differences in proportion
    • By “trends” I mean - essentially - the shape of the clusters of columns for each function; e.g.:

Accounting for different orderings, these are approximately equivalent. Maybe it’s worth studying abs further, but the uncertainties are large enough that I think it’s in the noise.

Invariants/Trends:

  • Math vs StrictMath
    • For the slowest and fastest functions, Math and StrictMath are neck-and-neck, with Math slightly ahead in general
    • Math is always at least as fast as StrictMath (within error bars)
    • Methods where Math far outperforms StrictMath are
      • sin, cos, tan, pow, log10, log, exp, ceil, floor
  • Hipparchus FastMath vs. Apache FastMath
    • These are neck-and-neck in almost all cases, the exceptions being the combined sinCos benchmark and round, where Hipparchus outperforms Apache
  • Math vs. Hipparchus FastMath
    • In general, FastMath is only faster than Math for the slowest functions, as well as min and max, strangely enough.
    • Math approx. equal FastMath
      • round
      • sqrt
        • sqrt is easily the most consistent among all functions benchmarked. All four classes tie - within uncertainty - in each case.
    • FastMath faster:
      • asin
      • acos
      • atan
      • atan2
      • sinh
      • cosh
      • tanh
      • expm1
      • min
      • max
      • signum
    • Math faster:
      • log1p
      • sinCos (I elaborate below)
      • pow
      • sin
      • cos
      • tan
      • cbrt
      • log10
      • hypot
      • log
      • exp
      • floor
      • ceil
      • abs

This screenshot is representative of results overall:

Specific cases of note:

  • sinCos
    • Only Hipparchus’ FastMath has sinCos
    • To compare, the classes without it simply compute sin and cos separately (a sketch follows below)
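A sketch of that comparison, assuming the Hipparchus SinCos API (the input value here is arbitrary):

import org.hipparchus.util.FastMath;
import org.hipparchus.util.SinCos;

// sinCos comparison: one combined Hipparchus call vs. two separate Math calls.
public final class SinCosComparisonSketch {

    public static void main(final String[] args) {
        final double x = 0.75; // arbitrary sample input

        // Hipparchus: a single combined evaluation
        final SinCos sc = FastMath.sinCos(x);
        System.out.println(sc.sin() + " " + sc.cos());

        // Classes without sinCos: two separate evaluations
        System.out.println(Math.sin(x) + " " + Math.cos(x));
    }
}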

4 Likes

Hi @Ryan
Thanks for this very interesting study.
FastMath was implemented a very long time ago, when Java was really much slower than it is now; I guess it was around Java 5 or Java 6. It was a contribution from the same guy who wrote Dfp. In fact, he wrote Dfp to validate FastMath.

There were three goals at start:

  • have fast computation (hence the name)
  • have more reproducible computation across platforms
  • ensure 0.5 ulp accuracy for almost all functions throughout their domain

A fourth goal was added later on:

  • back-port some functions from later Java versions (cbrt, hyperbolic trig…)

Note that all functions except sqrt have been reimplemented. In fact, FastMath.sqrt delegates to Math.sqrt.

Since then, a lot of things have happened in the Java world; on the other hand, our FastMath implementation did not evolve much, except for the back-porting of several functions and the addition of sinCos and a few extra signatures like pow with int and long exponents. I am therefore not surprised that by now Math outperforms FastMath (and especially for pow, which is a nightmare with its 17 special cases).

It would probably be difficult for users to know which implementation to select on a function-by-function basis, so I think we should still have a complete replacement of Math by FastMath. However, as your study shows, we are now lagging behind, so perhaps we should just delegate to Math for more functions.
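Concretely, delegating just means keeping the FastMath entry point while forwarding the work to the JDK, the same pattern FastMath.sqrt already uses; a minimal sketch for one function:

// Minimal sketch of delegation, as such a method would appear inside FastMath;
// this mirrors what FastMath.sqrt already does with Math.sqrt.
public static double cbrt(final double x) {
    return Math.cbrt(x);
}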

2 Likes

Ah, thanks for this! I’ve been wondering about the history of the FastMath class.

I agree that it would be a problem to keep track of which class to use when, so the full drop-in replacement makes perfect sense.

So, FWIW, you’ll have at least one user in support of FastMath delegating to Math more often!

Does that strike you as a likely change? Or might there be significant push-back from the community?

2 Likes

Very interesting. That analysis must have taken significant effort. One thing to watch out for is optimizing for special cases. I remember for a while pow(x, 2) was much faster in Math because it had a special case for it, but pow(x, y) for non-integer y was faster in FastMath. And FastMath’s advantage was that it used larger tables than Math; i.e., it traded more memory for faster computation.

1 Like

Yeah, it took some work, but I wanted to make sure I did it properly, ya know?

Regarding special-cases: indeed! The CodyWaite objects that FastMath::sin creates are what started this, and that’s only for x > pi / 2:


(in the case of sin, it turns out not to make a difference - Math::sin is / seems to be always faster)

I do expect the benchmarks here have sufficiently avoided special cases, but of course that means they are necessarily not capturing any special-case advantages!

Fortunately, adding new benchmarks that make use of this “framework” is ezpz :slight_smile:

1 Like

I think it would be a welcome change. To be honest, we really have a hard time maintaining FastMath (even though I love all this kind of stuff and am a devotee of the work of Jean-Michel Muller’s team: Elementary Functions: Algorithms and Implementation and the Handbook of Floating-Point Arithmetic); but clearly, we no longer have the skills and the manpower available to maintain this.

2 Likes

Thank you for the references! Just now picked up the eBooks: I love this stuff, too.

I’m going to see what happens with the Hipparchus and Orekit unit tests if FastMath defers to Math where appropriate (according to these results).

Hopefully it’s nice and clean!

O.k., I’ve got a draft PR pointed at my own fork of Hipparchus (for now) here. Not yet for review; just so I have something to reference in this thread.

I’m much more familiar with GitLab, so apologies in advance if I’m missing something.

Three FastMath tests changed such that they needed an increased tolerance. The tests for:

  • log1p,
  • hypot, and
  • cbrt

I did some testing in MATLAB using standard floating-point numerics, but also symbolic mathematics (floating-point numbers are converted to exact fractions using the built-in function sym), and it looks like there is a seriously small amount of degradation in the results of these functions (speculation: not manually carrying around extra bits of precision?), but I’m unsure whether it’s concerning.

Considering this functionality sits at the very foundation of a lot of other code, I imagine this requires some thought/discussion?

Sure, what is the amount of tolerance increase?

  • hypot
    • 1e-15 -> 4e-15 absolute tolerance
    • in this case, we’re testing hypot against sqrt(x * x + y * y), so the increase doesn’t strike me as necessarily problematic
  • log1p
    • was 0.51 ULPs (from MAX_ERROR_ULP)
    • now 0.74 ULPs (new local variable)
    • Dfp presumed truth
  • cbrt
    • was 0.51 ULPs (from MAX_ERROR_ULP)
    • now 0.67 ULPs (new local variable)
    • Dfp presumed truth
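As a rough sketch of what an error-in-ULPs check looks like (here Math.cbrt stands in for the Dfp reference that the actual tests use):

import org.hipparchus.util.FastMath;

// Rough sketch of an error-in-ULPs comparison; the real tests use Dfp as the reference.
public final class UlpComparisonSketch {

    public static void main(final String[] args) {
        double maxUlps = 0.0;
        for (int i = 0; i <= 1_000_000; i++) {
            final double x = 0.1 + 9.9 * i / 1_000_000.0; // sample inputs away from zero

            final double reference = Math.cbrt(x);     // stand-in for the Dfp "truth"
            final double candidate = FastMath.cbrt(x); // implementation under test

            final double errorUlps = Math.abs(candidate - reference) / Math.ulp(reference);
            maxUlps = Math.max(maxUlps, errorUlps);
        }
        System.out.println("max error: " + maxUlps + " ULPs");
    }
}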

These increases seem OK to me.

Hi Ryan,

Thanks for looking into this.
Have you had time to look at Orekit tests (sorry if I missed this info)?

Cheers,
Romain

Hey! Yes, I have.

I don’t have a branch out yet (turns out I didn’t fork Orekit, I just cloned directly, so I have to fork, re-clone, and push my branch to the fork etc. etc.)

Anyway, the impacts to test results are in-line with what we see in Hipparchus: very small perturbations.

There’s one exception in an IodGooding test, but the code comments in that test make me think it’s historically been found to be very sensitive.

Even then, the change isn’t huge, just larger than I’d have guessed.

I’ll reply to this thread once I get the MR up in GitLab!

Sweet.
How many tests did you have to update - just to get a feeling?

TL;DR: about 20 test files changed

Ended up working today, so I took care of this and got a draft MR up:

FWIW, for almost all tests I didn’t actually need to bump the least significant figure of the tolerance, but I didn’t want to add more significant figures, so I bumped it anyway. There are a few tests where I hadn’t yet adopted that policy that I’ll change before this merges.

Also of note: I was having some compilation issues - one due to generics (refactored to make it go away) and the other due to Eclipse not finding a plug-in (just commented the test out) - but these should be inconsequential.

So, ignore FieldBooleanDetector and DefaultDataContextPluginTest, at least for now.