Proposal for step handling overhaul

luc · July 7, 2021, 10:48am

Hi all,

For years, I have been frustrated by some of our early design choices and I would like to discuss them.
There are two different aspects, but they seem really interleaved to me, so I will put both in the same discussion.

Steps and events scheduling

context

The first aspect is the scheduling between calls to OrekitStepHandler.handleStep and calls to EventDetector.eventOccurred. When an event occurs during a step, it may truncate the step, either because it changes state, derivatives or even stop propagation. So step k which before events detection would have been expected to cover time range [tₖ₋₁ ; tₖ] will really cover time range [tₖ₋₁ ; tₑ] where tₑ is the time of event occurrence: truncation changes the range of the step interpolator that handleStep method will receive as its first argument. If event occurrence stops propagation this will also change the value of the isLast boolean that handleStep method will receive as its second argument.

problem statement

The current way we handle this is that eventOccured is called first, and handleStep is called afterwards. This implies that with previous notations, user callbacks will experience call with a fixed time set to tₑ first (in the event handler) and with a time range set to [tₖ₋₁ ; tₑ] next (in the step handler). This is counter-intuitive as we seem to go backward in time handling the end of the step before the step itself. This has lead to several problems in the past, mainly when user code combines the output of both functions in some log file, or when it keeps some internal state between calls. Users currently need to add difficult logic in their code to compensate for our current design.

proposed solution

I would like to reverse these calls to get something more intuitive for users and allow more straightforward code.

A snippet of the Hipparchus AbstractIntegrator.acceptStep method, which has the same structure as Orekit AbstractAnalyticalPropagator.acceptStep (except Hipparchus manages a list of step handlers whereas Orekit handles only one) is shown below:

    eventLoop:
    while (!occurringEvents.isEmpty()) {

        // handle the chronologically first event
        final EventState currentEvent = occurringEvents.poll();

        // get state at event time
        ODEStateAndDerivative eventState = restricted.getInterpolatedState(currentEvent.getEventTime());

        // restrict the interpolator to the first part of the step, up to the event
        restricted = restricted.restrictStep(previousState, eventState);

        // try to advance all event states to current time
        for (final EventState state : eventsStates) {
            if (state != currentEvent && state.tryAdvance(eventState, interpolator)) {
                // we need to handle another event first
                // remove event we just updated to prevent heap corruption
                occurringEvents.remove(state);
                // add it back to update its position in the heap
                occurringEvents.add(state);
                // re-queue the event we were processing
                occurringEvents.add(currentEvent);
                continue eventLoop;
            }
        }
        // all event detectors agree we can advance to the current event time

        final EventOccurrence occurrence = currentEvent.doEvent(eventState);
        final Action action = occurrence.getAction();
        isLastStep = action == Action.STOP;

        if (isLastStep) {
            // ensure the event is after the root if it is returned STOP
            // this lets the user integrate to a STOP event and then restart
            // integration from the same time.
            eventState = interpolator.getInterpolatedState(occurrence.getStopTime());
             restricted = interpolator.restrictStep(previousState, eventState);
         }

        // handle the first part of the step, up to the event
        for (final ODEStepHandler handler : stepHandlers) {
            handler.handleStep(restricted, isLastStep);
        }

The eventOccurred method is called when currentEvent.doEvent is called and the handleStep method is called at the end of this snippet.

This shows that except for the isLastStep case, time range truncation is already handled before doEvent (and therefore eventOccurred) is called. Moving the call to handleStep just the line before the call to doEvent would ensure handleStep already sees the correct time range and is called before eventOccurred is called: user callbacks would be back in logical order, according to forward or backward propagation direction.

When an event stops the propagation, though, we miss the last call to restrictStep. If I understand this snippet correctly, this feature is intended to handle numerical glitches due to convergence threshold in event detection: we need to make sure we are after event time so we ensure that starting a new propagation after this one ends will not detect the stop condition again an in fact the loop of successive propagations being stuck and not being able to go past one event.

From a scheduling point of view, I would think this should be handled with three calls to user callbacks: the first call to handleStep with truncated step, then the call the eventOccurred, then a new call to handleStep with an almost zero step size to go past the event. This call could be put in the if (isLastStep) section.

Alongside with this proposed change, I wonder if the isLast second parameter to handleStep is still relevant there. I now thinkg that rather than passing this boolean in the regular step handling method, it would be wiser to have a separate method finish(finalState) method that would be the counterpart of the init method.

consequences for users

The two consequences for users would be the following.

calls to callback would now be scheduled in way consistent with propagation direction
handleStep would sometimes be called with very small or even 0 step size

The first consequence is the goal of the change. The second consequence is probably not really a problem and it already can occur as we can have several events occurring at the same time so intermediate steps are zero. We may add some comments in javadoc so users are aware that this can happen, in case they use things like division by step size to compute some rates.

Of course, if this change is accepted, it should be done in both Orekit and Hipparchus.

merging propagation modes

context

The second aspect concerns propagation modes.
Since a number of years (I tracked it back to 13 years ago in the Propagator interface, but it was probably earlier in other classes), we have three propagation modes:

master mode, where the propagator manages the time loop and user code is called via callbacks in step handlers
slave mode, where propagator just advances to target time and return to caller, who manages the time loop and usually uses the returned state
ephemeris generation mode, where propagator advances to target time and stores an ephemeris that use can retrieve and analyse a posteriori

problem statement

There are several problems with this approach.

The first one is philosophicalor social and relies on the connotation of master/slave terminology which is nowadays unsustainable. Many other technology-oriented project have made move to abandon these terms and change to more neutral terms. A number of replacements are proposed in wikipedia Master/slave (technology) page, but I don’t see they are well suited to our usage. I have seen other suggestions like leader/follower or controller/responder that are more to the point of our usage.

The second problem is that these modes are inconsistent. Master and slave mode are exclusive to each other, but ephemeris generation mode is not really a separate mode, and can be used with both previous modes, depending on how it is set up.

The third problem is that sometimes, we would like to have several different step handlers at the same time. This is allowed at low level for Hipparchus ODE integrators, but not at Orekit level. We do have a OrekitStepHandlerMultiplexer that was added ten years ago, but it does not handle directly fixed step handlers, which are useful for regular logging for example.

The fourth problem is that it is somewhat confusing for new users.

Sometimes we also alreay mix things, for example by calling a propagator several times in row (hence having an outer loop that is slave mode alike) and using master mode for each propagation leg. It’s not really a problem, it is just something I find strange.

proposed solution

When looking at the implementation, there are in fact no major distinction between all these modes. Propagators always run in a single mode, which is the current master mode, and the other modes are simply related to special step handlers. Slave mode is obtained by just setting the step handler to null so no callback is ever triggered. Ephemeris generation mode is obtained by just wrapping the propagator itself for analytical propagators and by registering an additional step handler at Hipparchus level for integration-based paropagators (i.e. DSST and numerical propagator).

So I would suggest to just drop the concept of propagation modes and just let users register or not step handlers or not, as many as they wish, with fixed or variable step sizes.

For ephemeris, we could either let the setEphemerisGenerationMode convenience method (perhaps renaming it generateEphemerisOnTheFly or something similar) and the associated getGeneratedEphemeris method or remove both and set up a dedicated EphemerisStoringStepHandler. I would personnaly prefer the later.

It also helps debugging code or adding optional output with dedicated step handlers. We can also have both a fine-grained fixed step log file with huge output set to a file and a more coarse-grained fixed step output intended for user display so they see progress for long term computation.

consequences for users

The main consequences would be

setSlaveMode renamed into clearStepHandlers
setMasterMode renamed into addStepHandler (with both fixed and variable steph signatures preserved)
setEphemerisMode renamed into generateEphemerisOnTheFly, with only one no-argument signature, or adding a EphemerisStoringStepHandler

If this change is accepted, it should be done only in Orekit as Hipparchus already supports multiple steph handlers and has no notion of mode and the equivalent of ephemeris is already managed in the public API as a dedicated step handler: DenseOutputModel

I would like to put these changes in 11.0, which should be published soon.

evan.ward · July 7, 2021, 1:12pm

Hi Luc,

Your proposal for step handling and event scheduling seems fine to me. I also like your idea of a finish(finalState) method as a counter part to the init(...) method.

If the user is logging g function of force model evaluations they will still have the going back in time effect in their logs: the force model will be evaluated at tₖ, then the g function at tₑ, then the step handler called with the interval [tₖ₋₁ ; tₑ]. I think this is unavoidable and really is working as it should.

I have received some feedback from our development team agreeing with this.

Also the setEphemerisMode(OrekitStepHandler) method that is both master mode and ephemeris mode.

Seems like a good solution to me.

Perhaps go a bit further and just provide a get/set method on Propagator for a single OrekitStepHandler as well as some static methods or a builder class that make it easy for the user to create the kind of step handler they want. That would move the complexity of managing a list of step handlers out of Propagator, which is already very complex. One possibility is to use the existing OrekitStepHandlerMultiplexer class. Maybe add a convenience method for adding fixed step handlers that takes care adding the OrekitStepNormalizer, and another for an ephemeris handler.

+1 if it still has an easy way to turn it into a Propagator. Perhaps name it StoringStepHandler since it stores the whole step handler. It’s another opportunity to move some complexity out of Propagator.

Best Regards,
Evan

bcazabonne · July 8, 2021, 6:22am

Very good idea!

+1 for this, it could be very interesting to have this method for instance in order to log some information about the final state.

We also have a OrekitFixedStepHandlerMultiplexer class. The initial motivation was to be able to generate OEM and AEM files during the same orbit propagation. The main problem is just that we cannot combine both types of step handler (i.e., fixed and non-fixed) inside a unique Multiplexer class.

That is a very good enhancement. So, do you plan to remove all XxxMultiplexer class in propagation.sampling package?

A last question, but I think I already know the answer. Do you plan to perform these changes for both Field and non-Field versions ?

Regards,
Bryan

luc · July 8, 2021, 7:52am

I’ll dot that.

Rather merge them together as suggested by Evan with some logic added.
Perhaps also a possibility to add/remove handlers on the fly, which could for example be used to log only part of the orbit, like state evolution just during a maneuver.

Don’t worry, you can tell Julie I won’t delegate this boring stuff to her, you both have far enough work to do.

pascal.parraud · July 8, 2021, 8:28am

Hi Luc,

I agree with your 2 proposals, which simplify the code and make it easier to use.

For the first one, about steps ans events scheduling, I appreciate the new finish(finalState) method, to be set with a default implementation as the init method, but I am not sure about the removal of the isLast argument in the handleStep methods that will generates backward incompatibilities and thus a lot of work in the existing codes.

For the second one, about the merging/removing of the propagation modes, I also think that extending the multiplexer to add any kind of step handlers could be the good solution.

Thanks for those very interesting proposals.

Regards,
Pascal

luc · July 8, 2021, 8:46am

I understand but prefer to remove it anyway. I find it confusing to have two ways to handle the final state, and awkward to have a parameter that is used in only one call among thousands. In many cases, users just don’t use the parameter, so it changes only the signature. In the cases isLAst is used, I have often seen it at the end of the handleStep with a if (isLast) { ... }, in this case the refactoring is mainly replacing the if by the function declaration and adding two closing curly braces to separate the functions (at least it is what I needed to do in the Hipparchus branch where I tested this). Of course, there are cases where it is slightly more complex, but I have only found one case (or two if you count the field version) when doing this on Hipparchus tests.

What to other people think about this?

evan.ward · July 8, 2021, 11:50am

It will be some work to update existing code. A brief survey of where I use step handlers reveals that most step handlers don’t use isLast. The code that is dependent on isLast is not dependent on the state, so I’m already using isLast to do what would be cleaner in a finish(...) method. So I am in favor of removing isLast as it would make the event handling code simpler, it would improve the readability of step handlers, and we are already planning to make backwards incompatible changes to step handlers.

Removing isLast may reduce capability is some cases. For example one could do something unique with the whole time span covered by the OrekitStepInterpolator on the last step. But is there actually a use case for doing this? Especially given that there are no guarantees made about the time span covered by the interpolator. With the new paradigm the user would need to store the interpolator in a field and then access it in the finish(...) method. So not a reduction in capability for an obtuse use case, just an increase in complexity. But this is certainly not a case we should design for as I don’t think it as any practical uses.

hankgrabowski · July 13, 2021, 6:42pm

I think this plus the suggestions that have been mentioned about the finalState update look good. I haven’t used the finalState/isLast aspects so don’t have an informed position in terms of whether the changes will be problematic in practice.

luc · July 15, 2021, 8:08pm

The changes have all been implemented in the develop branch, including documentation.
Could some of you review them?

The documentation is available here