Urgent patch release upcoming

luc · June 15, 2021, 3:07pm

A critical bug has recently been discovered in the ellipsoid tessellation feature (see issue 792).

This bug may cause an infinite loop in very rare cases, due to numerical noise when computing the difference of two very close lines in geographical zones of interest. The bug was discovered by an operational project using a very old Orekit version (7.x) and Apache Commons Math. It was not possible to reproduce this specific case with the current version, but looking at the code, it could nevertheless happen again.

Since the bug results in an infinite loop, it is considered critical, so I will fix it by adding a simple hard-coded iteration limit. The loop should generally converge in one iteration for path-connected zones and in n iterations for zones composed of n path-connected zones (for example an archipelago where each island is modeled), so setting a limit of about 1000 iterations, purely as a safety feature, seems sufficient.
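A minimal Java sketch of such a guarded loop follows. It is hypothetical and is not the actual Orekit fix: the generic zone and mesh types and the cover, subtract and isEmpty functions are placeholders standing in for the tessellation step, the region difference and the emptiness check, and the toy main method models a zone as a set of island identifiers. Throwing an exception when the limit is reached is also an assumption; the real fix may handle that case differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Predicate;

final class BoundedLoopSketch {

    /** Hypothetical safety limit, using the value discussed in this thread. */
    static final int MAX_ITERATIONS = 1000;

    private BoundedLoopSketch() {
    }

    /**
     * Covers a zone step by step until nothing remains uncovered.
     *
     * @param zone     zone to cover (placeholder type Z)
     * @param cover    builds one mesh over part of the remaining zone
     * @param subtract removes the area covered by a mesh from the remaining zone
     * @param isEmpty  tells whether the remaining zone is empty
     * @return the meshes built, in order
     * @throws IllegalStateException if the zone is still not covered after MAX_ITERATIONS meshes
     */
    static <Z, M> List<M> coverAll(Z zone,
                                   Function<Z, M> cover,
                                   BiFunction<Z, M, Z> subtract,
                                   Predicate<Z> isEmpty) {
        List<M> meshes = new ArrayList<>();
        Z remaining = zone;
        int iterations = 0;
        while (!isEmpty.test(remaining)) {
            // Without this guard, a subtract step that never shrinks the remaining
            // zone (e.g. because of numerical noise) would loop forever.
            if (iterations >= MAX_ITERATIONS) {
                throw new IllegalStateException(
                        "zone still not covered after " + MAX_ITERATIONS + " iterations");
            }
            iterations++;
            M mesh = cover.apply(remaining);
            meshes.add(mesh);
            remaining = subtract.apply(remaining, mesh);
        }
        return meshes;
    }

    public static void main(String[] args) {
        // Toy model (assumption): a zone is a set of island identifiers,
        // and each mesh covers exactly one island.
        Set<Integer> archipelago = new TreeSet<>(Arrays.asList(1, 2, 3));

        List<Integer> meshes = coverAll(
                archipelago,
                zone -> zone.iterator().next(),   // mesh the first remaining island
                (zone, island) -> {               // remove the meshed island
                    Set<Integer> rest = new TreeSet<>(zone);
                    rest.remove(island);
                    return rest;
                },
                Set::isEmpty);
        System.out.println("meshes built: " + meshes.size());

        // Faulty difference that never shrinks the remaining zone, mimicking the
        // spurious leftover described in this thread: the guard stops the loop.
        try {
            coverAll(archipelago, zone -> zone.iterator().next(), (zone, island) -> zone, Set::isEmpty);
        } catch (IllegalStateException e) {
            System.out.println("guard tripped: " + e.getMessage());
        }
    }
}
```

In the toy run, three islands need three iterations, matching the "n iterations for n path-connected zones" estimate, while a difference that never removes anything stops after 1000 iterations instead of spinning forever.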

Once the fix is available (most probably tomorrow), I will publish the patch release, following section 2.2.4 of the Orekit governance, which specifies the process for urgent patch releases. This urgent process takes place mainly on the developers list, which takes responsibility for the release. I have already notified the PMC, who may ask for a more formal vote within the next 24 hours.

So this message is intended mainly to notify developers. The only change between Orekit 10.3 and Orekit 10.3.1 will be this fix.

What do developers think?

bcazabonne · June 15, 2021, 4:32pm

+1
An infinite loop is a good argument for a patch version.

I also agree with the limit of 1000 iterations. It is a good compromise between fast execution and highlighting an unsolvable problem.

Bryan

evan.ward · June 16, 2021, 11:53am

Sounds like a reasonable fix to me.

Is it related to [1], which is also an issue with nearly parallel intersecting lines?

[1] Infinite Boundary in PolygonSet · Issue #49 · Hipparchus-Math/hipparchus · GitHub

luc · June 16, 2021, 1:16pm

I was not able to check the behavior precisely and no longer have access to the real data, for confidentiality reasons, but it seems similar: when computing the difference factory.difference(remaining, mesh.getCoverage()), the resulting zone was not empty and had one spurious peak extending a few kilometers away from the initial zone, even though the mesh really did cover the full zone. So the loop considered that the mesh did not fully cover the zone and iterated once more, but actually found the same mesh again, hence the infinite loop.

luc · June 16, 2021, 2:50pm

I have pushed the fix to the release-10.3 branch, following our gitflow workflow.
The branch is tagged 10.3.1-RC1.

Could one or two developers check it and quickly give me a green light so I can publish it this evening?

Note that one test in CR3BPSystemTest is expected to fail. This is due to issue 744, which has been fixed in the develop branch.
I don’t think it would be a good idea to add that fix here too, as issue 744 is not critical. This means that unless you set your computer date to some date in 2020, you will see one test failure.

luc · June 16, 2021, 3:03pm

Well, with a failing test it is a nightmare to get the artifacts generated properly, have continuous integration push them automatically, and have the site show proper test results. So I will temporarily put an @Ignore on this test for 10.3.1. I’ll push a new commit and an RC2 tag in the next few minutes.

luc · June 16, 2021, 3:10pm

Done, RC2 pushed, continuous integration running.

bcazabonne · June 16, 2021, 6:18pm

I don’t have my computer with me because I’m at home, so I verified the changes in the commits using GitLab on my mobile phone. Everything looks good, including CI. The various files (pom, build, changes, etc.) were correctly updated.

Adding the @Ignore on the test was a good idea. Maven doesn’t like failing tests when deploying artifacts.

You have my green light