Improve investigation while encoutering a MathIllegalStateException

Dear all,
We have been enjoying propagation tools in Orekit for quite a while, thank you for that :smiley: !
Lately, we have been encountering some exception such as :
org.hipparchus.exception.MathIllegalStateException: maximal count (100) exceeded at org.hipparchus.util.Incrementor.lambda$static$0(Incrementor.java:41) ~[hipparchus-core-1.7.jar:?] at org.hipparchus.util.Incrementor.increment(Incrementor.java:238) ~[hipparchus-core-1.7.jar:?] at org.hipparchus.analysis.solvers.BaseAbstractUnivariateSolver.incrementEvaluationCount(BaseAbstractUnivariateSolver.java:318) ~[hipparchus-core-1.7.jar:?] at org.hipparchus.analysis.solvers.BaseAbstractUnivariateSolver.computeObjectiveValue(BaseAbstractUnivariateSolver.java:165) ~[hipparchus-core-1.7.jar:?] at org.hipparchus.analysis.solvers.BracketingNthOrderBrentSolver.doSolveInterval(BracketingNthOrderBrentSolver.java:297) ~[hipparchus-core-1.7.jar:?] at org.hipparchus.analysis.solvers.BracketingNthOrderBrentSolver.solveInterval(BracketingNthOrderBrentSolver.java:426) ~[hipparchus-core-1.7.jar:?] at org.hipparchus.analysis.solvers.BracketedUnivariateSolver.solveInterval(BracketedUnivariateSolver.java:126) ~[hipparchus-core-1.7.jar:?] at org.orekit.propagation.events.EventState.findRoot(EventState.java:320) ~[orekit-10.2.jar:?] at org.orekit.propagation.events.EventState.evaluateStep(EventState.java:218) ~[orekit-10.2.jar:?] at org.orekit.propagation.analytical.AbstractAnalyticalPropagator.acceptStep(AbstractAnalyticalPropagator.java:308) ~[orekit-10.2.jar:?] at org.orekit.propagation.analytical.AbstractAnalyticalPropagator.propagate(AbstractAnalyticalPropagator.java:168) ~[orekit-10.2.jar:?]

The cause is tough to be understood : we suspect a bug due to discontinuities in our models. Still, that is not easy to investigate : it occurs during a long propagation, which takes a lot of time to be executed.
To analyze the problem, we would need to propagate near the date of the exception and then investigate the current state we’re in.
Do you think Orekit would gain from being more user-friendly : for example, we could catch the underlying MathRuntimeException in the propagator loop and then rethrow it but with additional information such as the date at which the exception occurs ? Such “debug” information on the problematic step of the propagation might be helpful for a user looking to investigate a bug, as well as add some quality of life information when investigating a bug in a long propagation (to avoid waiting for a long time again for a long propagation).

Thank you for everything!

Anne-Laure

Hi @Anne-Laure

If you think you can extract the mathematical data from the exception and convert it to space flight dynamics data on its way up, then sure it would be worth adding.

1 Like

Thank for the fast reply! I will have a look on this now I am sure that you agree with the user need!

Hello,

We finally found the problem in one of our detector. It would have been a big help if orekit has catched the exception in the org.orekit.propagation.events.EventState.

It is efficient to get some information from inside the loop of findRoot() method :

  private boolean findRoot(final OrekitStepInterpolator interpolator,
                                 final AbsoluteDate ta, final double ga,
                                 final AbsoluteDate tb, final double gb) {
            // check there appears to be a root in [ta, tb]
            check(ga == 0.0 || gb == 0.0 || (ga > 0.0 && gb < 0.0) || (ga < 0.0 && gb > 0.0));

        final double convergence = detector.getThreshold();
        final int maxIterationCount = detector.getMaxIterationCount();

        ... 

        // loop to skip through "fake" roots, i.e. where g(t) = g'(t) = 0.0
        // executed once if we didn't hit a special case above
        AbsoluteDate loopT = ta;
        double loopG = ga;
        while ((afterRootG == 0.0 || afterRootG > 0.0 == g0Positive) && 
               strictlyAfter(afterRootT, tb)) {
            try {
                if (loopG == 0.0) {

                ...

                check((forward && afterRootT.compareTo(beforeRootT) > 0) || (!forward && 
                    afterRootT.compareTo(beforeRootT) < 0));
                // setup next iteration
                loopT = afterRootT;
                loopG = afterRootG;
            } catch(Exception e) {
                throw new RuntimeException(String.format("%s failed to evaluate, problem occurring at %s : %s", detector, loopT, e.getMessage()), e);
            }
        }
        ...
}

This will ends with an exception trace targeting in this function followed by the trace I described above.
Nothing is lost and we have some information on what detector failed and when.

What do you think of this?
I am ready to contribute if you want :slight_smile:.
I could add a junit test to check the message content and the cause exception.

Best regards,

Anne-Laure

I am always a littler reluctant to catch the very generic Exception class, and in fact it is frowned
upon in best practices and flagged by quality checkers (don’t remember if it is checkstyle, spotbugs or both).

In this case, however, I can see the rationale as the library calls external user code that it doesn’t know about and cannot enforce a more specific exception type.

So I am on the fence, with both pros and cons. I would like to see other developers opinion about this. If we add a catch-all statement here, we sould also probably add one around user-provided step handlers.

Hello,

Thank you for this feedback. I also asked myself if I should catch MathIllegalStateException (or MathRuntimeException). Then while I was written the code, I was thinking that I would like to have these information in any case something went wrong.

I totally agree that point of view of the Orekit community could help us.

Upon that, I am ready to change my contribution to catch MathIllegalStateException ? MathRuntimeException ?

Best regards,

Anne-Laure

Hello @Anne-Laure,

Thank you for your soon to come contribution!

I’m a bit reluctant to use the generic Exception too; but maybe I’ve been brainwashed by the quality tools :wink:

I would catch the generic MathRunTimeException and re-throw it as an OrekitException.

Best regards,
Maxime

I’m in favor to adding more information to exceptions as they propagate up the stack. Often each layer can have important context to add that greatly aids in debugging, as in this case.

If you’re trying to catch an exception within the root finder it would make sense to wrap the call to solve(...) and include the time and g values of both bracketing points as I believe both are available in the code.

When making this change you’ll have to do it four times. Twice in Orekit for the analytical propagators (field & non-field) and twice in Hipparchus for the numerical propagators (field & non-field).

More information could be added by catching the exception within the root finder itself so it can include the last t and g value it evaluated.

Thanks in advance for any contributions!

Hello everyone,

That’s a long time I posted my first suggestion but I am back after few personal/work events that prevented me to work on this subject!
Thank you for your feedbacks!

If I sum up everything, we should :

  • catch MathRunTimeException only
  • indicate the time and g values of both bracketing points
  • possibly, indicate the last t and g value it evaluated

I changed my code catching the exception to fulfill these suggestions (I define a silly dectector so it fails fast) :

catch (MathRuntimeException e) {
throw new RuntimeException(
String.format("%s failed to find root between %s (g=%f) and %s (g=%f) : %s\nLast iteration at %s (g=%f)", detector, ta, ga, tb, gb, e.getMessage(), loopT,
loopG), e);
}

I have the following error in console :

java.lang.RuntimeException: java.lang.RuntimeException: some.package.SomeDetector@642a7222 failed to find root between 2030-06-18T00:10:00.000 (g=0.001000) and 2030-06-18T00:20:00.000 (g=-1.000000) : maximal count (50) exceeded
Last iteration at 2030-06-18T00:10:00.000 (g=0.001000)

at org.orekit.propagation.events.EventState.findRoot(EventState.java:380)
at org.orekit.propagation.events.EventState.evaluateStep(EventState.java:252)
at org.orekit.propagation.analytical.AbstractAnalyticalPropagator.acceptStep(AbstractAnalyticalPropagator.java:233)
at org.orekit.propagation.analytical.AbstractAnalyticalPropagator.propagate(AbstractAnalyticalPropagator.java:168)

Caused by: org.hipparchus.exception.MathIllegalStateException: maximal count (50) exceeded
at org.hipparchus.util.Incrementor.lambda$static$0(Incrementor.java:41)
at org.hipparchus.util.Incrementor.increment(Incrementor.java:238)
at org.hipparchus.analysis.solvers.BaseAbstractUnivariateSolver.incrementEvaluationCount(BaseAbstractUnivariateSolver.java:318)
at org.hipparchus.analysis.solvers.BaseAbstractUnivariateSolver.computeObjectiveValue(BaseAbstractUnivariateSolver.java:165)
at org.hipparchus.analysis.solvers.BracketingNthOrderBrentSolver.doSolveInterval(BracketingNthOrderBrentSolver.java:297)
at org.hipparchus.analysis.solvers.BracketingNthOrderBrentSolver.solveInterval(BracketingNthOrderBrentSolver.java:426)
at org.hipparchus.analysis.solvers.BracketedUnivariateSolver.solveInterval(BracketedUnivariateSolver.java:126)
at org.orekit.propagation.events.EventState.findRoot(EventState.java:352)
… 44 more

I have defined a unit test in EventDetectorTest :

@Test
public void testMathRuntimeException() {
    try {
        // initial conditions
        Frame eme2000 = FramesFactory.getEME2000();
        TimeScale utc = TimeScalesFactory.getUTC();
        final AbsoluteDate initialDate = new AbsoluteDate(2011, 5, 11, utc);
        EquinoctialOrbit initialOrbit = new EquinoctialOrbit(new PVCoordinates(new Vector3D(4008462.4706055815, -3155502.5373837613, -5044275.9880020910),
                                                                               new Vector3D(-5012.9298276860990, 1920.3567095973078, -5172.7403501801580)), eme2000,
                                                             initialDate, Constants.WGS84_EARTH_MU);
        KeplerianPropagator k = new KeplerianPropagator(initialOrbit);
        k.addEventDetector(new AbstractDetector(100, 1e-6, 5, new ContinueOnEvent()) {
            double lastValue = -1;
            @Override
            public double g(final SpacecraftState s) {
                lastValue = lastValue * -1;
                return lastValue;
            }

            @Override
            protected AbstractDetector create(double newMaxCheck, double newThreshold, int newMaxIter, EventHandler newHandler) {
                throw new UnsupportedOperationException("this class to be used locally");
            }
        });
        k.propagate(initialDate.shiftedBy(initialOrbit.getKeplerianPeriod()));
        Assert.fail("an exception should have been thrown");
    }
    catch (RuntimeException oe) {
        Assert.assertSame(MathIllegalStateException.class, oe.getCause().getClass());
    }
}

… do you think it is useful ?

@evan.ward I did not understand why I should put this code in 4 different places and I have never contribute to hipparchus.
Could you give some more information ? Or maybe, I could push my branch with my modifications so you could copy/paste where it should be done ?

Best regards,

Anne-Laure

I like it.

Orekit uses the OrekitMessages class for the format string so that the text can be translated.

I would also tend towards more precision in the reported values as I’ve seen some issues that differ by only a very small amount. I can do this after you push if you would like. The MessageFormat syntax is not intuitive.

And perhaps it would be worthwhile to add that information to any exception that occurs during event finding, so catch RuntimeException instead of MathRuntimeException. What do you think?

Yes, you can push it and I will copy it. It is one of the maintenance headaches with Orekit.

Hello,

I did not know about the OrekitMessages class, thanks to your proposition of doing it :slight_smile: !

About the exception, totally agreed with catching RuntimeException and this was my first thought but I changed it due to comments in this conversation (see above).

And thanks again to copy that elsewhere in source code.

I will have a look at your merge so I will know a little bit more about Orekit :wink:.

For information, @yannick will push in my name.

Best regards,
Anne-Laure

Ah yes, I had forgotten about Luc comment. Another option is to use the addSuppressed(...) method that was added in Java 7. The the original exception could continue and you could add some information to it. Perhaps not the original intention, but it would work. E.g.

} catch (RuntimeException e) {
            e.addSuppressed(new OrekitException(...));
            throw e;
}

Thanks Evan!

I did not know about the suppressed exceptions and I search more infos on the web.

I understand that it might be used in case of parallel or double catch exception as seen in https://www.baeldung.com/java-suppressed-exceptions. Doesn’t it ?

The behavior you describe (having all exceptions stack trace) is the one I developed (I have set the original exception as input parameter of the new exception).

I hope that’s fine.

Hello everyone,

I have opened a Gitlab issue for this new feature.

@Anne-Laure your work has been pushed to the issue-797 branch. I have not merged it with develop yet, as I understand @evan.ward would like to review it beforehand, and maybe copy the code to other places as discussed in this comment above.

If I can do any more to help, please let me know.

Have a nice day
Yannick

Hello,

Thanks @yannick !

I have been warned that a test failed (that’s great to be in a short loop :grinning_face_with_smiling_eyes: ).
I will push the fix tomorrow.

Best regards.

Hello,

@evan.ward While fixing the pipeline error, I found about the localized message.
I will push with the use of OrekitMessages.

Best regards,

Anne-Laure

Hi all,

I have pushed the changes to the issue-797 branch of the Orekit repository.

Regards
Yannick

Hello @evan.ward ,

The pipeline is red again but this time because of missing translated message.

I could provide french :

# {0} failed to find root between {1} (g={2}) and {3} (g={4}) : {5}\nLast iteration at {6} (g={7})
FIND_ROOT={0} n'a pas réussi à trouver une solution entre {1} (g={2}) et {3} (g={4}) : {5}\nLa dernière itération était à {6} (g={7})

and spanish :

# {0} failed to find root between {1} (g={2}) and {3} (g={4}) : {5}\nLast iteration at {6} (g={7})
FIND_ROOT={0} ha fallado buscando una solución entre {1} (g={2}) y {3} (g={4}) : {5}\nÚltima iteración en {6} (g={7})

I do not know how you handle all the langages ?

This branch is in your hands ? Or do you think I could do more ?

Best regards.

Thanks! I’ll take a look.

I just put in <MISSING TRANSLATION> and let someone who knows the language fill it in later. Before a release Luc usually asks the community to update translations.

Anne-Laure, the next time, you can create your own fork of the Orekit project on Gitlab, push your code in a dedicated branch in your fork and, then, create a merge request to the develop branch of Orekit project. We can give you an access to the Gitlab CI runner executor or, at your convenience, you could enroll your own executor.

Do not hesitate to ask me for more detailed information in private if necessary.