Improve investigation while encoutering a MathIllegalStateException

Dear all,
We have been enjoying propagation tools in Orekit for quite a while, thank you for that :smiley: !
Lately, we have been encountering some exception such as :
org.hipparchus.exception.MathIllegalStateException: maximal count (100) exceeded at org.hipparchus.util.Incrementor.lambda$static$0(Incrementor.java:41) ~[hipparchus-core-1.7.jar:?] at org.hipparchus.util.Incrementor.increment(Incrementor.java:238) ~[hipparchus-core-1.7.jar:?] at org.hipparchus.analysis.solvers.BaseAbstractUnivariateSolver.incrementEvaluationCount(BaseAbstractUnivariateSolver.java:318) ~[hipparchus-core-1.7.jar:?] at org.hipparchus.analysis.solvers.BaseAbstractUnivariateSolver.computeObjectiveValue(BaseAbstractUnivariateSolver.java:165) ~[hipparchus-core-1.7.jar:?] at org.hipparchus.analysis.solvers.BracketingNthOrderBrentSolver.doSolveInterval(BracketingNthOrderBrentSolver.java:297) ~[hipparchus-core-1.7.jar:?] at org.hipparchus.analysis.solvers.BracketingNthOrderBrentSolver.solveInterval(BracketingNthOrderBrentSolver.java:426) ~[hipparchus-core-1.7.jar:?] at org.hipparchus.analysis.solvers.BracketedUnivariateSolver.solveInterval(BracketedUnivariateSolver.java:126) ~[hipparchus-core-1.7.jar:?] at org.orekit.propagation.events.EventState.findRoot(EventState.java:320) ~[orekit-10.2.jar:?] at org.orekit.propagation.events.EventState.evaluateStep(EventState.java:218) ~[orekit-10.2.jar:?] at org.orekit.propagation.analytical.AbstractAnalyticalPropagator.acceptStep(AbstractAnalyticalPropagator.java:308) ~[orekit-10.2.jar:?] at org.orekit.propagation.analytical.AbstractAnalyticalPropagator.propagate(AbstractAnalyticalPropagator.java:168) ~[orekit-10.2.jar:?]

The cause is tough to be understood : we suspect a bug due to discontinuities in our models. Still, that is not easy to investigate : it occurs during a long propagation, which takes a lot of time to be executed.
To analyze the problem, we would need to propagate near the date of the exception and then investigate the current state we’re in.
Do you think Orekit would gain from being more user-friendly : for example, we could catch the underlying MathRuntimeException in the propagator loop and then rethrow it but with additional information such as the date at which the exception occurs ? Such “debug” information on the problematic step of the propagation might be helpful for a user looking to investigate a bug, as well as add some quality of life information when investigating a bug in a long propagation (to avoid waiting for a long time again for a long propagation).

Thank you for everything!

Anne-Laure

Hi @Anne-Laure

If you think you can extract the mathematical data from the exception and convert it to space flight dynamics data on its way up, then sure it would be worth adding.

1 Like

Thank for the fast reply! I will have a look on this now I am sure that you agree with the user need!

Hello,

We finally found the problem in one of our detector. It would have been a big help if orekit has catched the exception in the org.orekit.propagation.events.EventState.

It is efficient to get some information from inside the loop of findRoot() method :

  private boolean findRoot(final OrekitStepInterpolator interpolator,
                                 final AbsoluteDate ta, final double ga,
                                 final AbsoluteDate tb, final double gb) {
            // check there appears to be a root in [ta, tb]
            check(ga == 0.0 || gb == 0.0 || (ga > 0.0 && gb < 0.0) || (ga < 0.0 && gb > 0.0));

        final double convergence = detector.getThreshold();
        final int maxIterationCount = detector.getMaxIterationCount();

        ... 

        // loop to skip through "fake" roots, i.e. where g(t) = g'(t) = 0.0
        // executed once if we didn't hit a special case above
        AbsoluteDate loopT = ta;
        double loopG = ga;
        while ((afterRootG == 0.0 || afterRootG > 0.0 == g0Positive) && 
               strictlyAfter(afterRootT, tb)) {
            try {
                if (loopG == 0.0) {

                ...

                check((forward && afterRootT.compareTo(beforeRootT) > 0) || (!forward && 
                    afterRootT.compareTo(beforeRootT) < 0));
                // setup next iteration
                loopT = afterRootT;
                loopG = afterRootG;
            } catch(Exception e) {
                throw new RuntimeException(String.format("%s failed to evaluate, problem occurring at %s : %s", detector, loopT, e.getMessage()), e);
            }
        }
        ...
}

This will ends with an exception trace targeting in this function followed by the trace I described above.
Nothing is lost and we have some information on what detector failed and when.

What do you think of this?
I am ready to contribute if you want :slight_smile:.
I could add a junit test to check the message content and the cause exception.

Best regards,

Anne-Laure

I am always a littler reluctant to catch the very generic Exception class, and in fact it is frowned
upon in best practices and flagged by quality checkers (don’t remember if it is checkstyle, spotbugs or both).

In this case, however, I can see the rationale as the library calls external user code that it doesn’t know about and cannot enforce a more specific exception type.

So I am on the fence, with both pros and cons. I would like to see other developers opinion about this. If we add a catch-all statement here, we sould also probably add one around user-provided step handlers.

Hello,

Thank you for this feedback. I also asked myself if I should catch MathIllegalStateException (or MathRuntimeException). Then while I was written the code, I was thinking that I would like to have these information in any case something went wrong.

I totally agree that point of view of the Orekit community could help us.

Upon that, I am ready to change my contribution to catch MathIllegalStateException ? MathRuntimeException ?

Best regards,

Anne-Laure

Hello @Anne-Laure,

Thank you for your soon to come contribution!

I’m a bit reluctant to use the generic Exception too; but maybe I’ve been brainwashed by the quality tools :wink:

I would catch the generic MathRunTimeException and re-throw it as an OrekitException.

Best regards,
Maxime

I’m in favor to adding more information to exceptions as they propagate up the stack. Often each layer can have important context to add that greatly aids in debugging, as in this case.

If you’re trying to catch an exception within the root finder it would make sense to wrap the call to solve(...) and include the time and g values of both bracketing points as I believe both are available in the code.

When making this change you’ll have to do it four times. Twice in Orekit for the analytical propagators (field & non-field) and twice in Hipparchus for the numerical propagators (field & non-field).

More information could be added by catching the exception within the root finder itself so it can include the last t and g value it evaluated.

Thanks in advance for any contributions!