Hi all,
For years, I have been frustrated by some of our early design choices and I would like to discuss them.
There are two different aspects, but they seem really interleaved to me, so I will put both in the same discussion.
Steps and events scheduling
context
The first aspect is the scheduling between calls to OrekitStepHandler.handleStep
and calls to EventDetector.eventOccurred
. When an event occurs during a step, it may truncate the step, either because it changes state, derivatives or even stop propagation. So step k which before events detection would have been expected to cover time range [tₖ₋₁ ; tₖ] will really cover time range [tₖ₋₁ ; tₑ] where tₑ is the time of event occurrence: truncation changes the range of the step interpolator that handleStep
method will receive as its first argument. If event occurrence stops propagation this will also change the value of the isLast
boolean that handleStep
method will receive as its second argument.
problem statement
The current way we handle this is that eventOccured
is called first, and handleStep
is called afterwards. This implies that with previous notations, user callbacks will experience call with a fixed time set to tₑ first (in the event handler) and with a time range set to [tₖ₋₁ ; tₑ] next (in the step handler). This is counter-intuitive as we seem to go backward in time handling the end of the step before the step itself. This has lead to several problems in the past, mainly when user code combines the output of both functions in some log file, or when it keeps some internal state between calls. Users currently need to add difficult logic in their code to compensate for our current design.
proposed solution
I would like to reverse these calls to get something more intuitive for users and allow more straightforward code.
A snippet of the Hipparchus AbstractIntegrator.acceptStep
method, which has the same structure as Orekit AbstractAnalyticalPropagator.acceptStep
(except Hipparchus manages a list of step handlers whereas Orekit handles only one) is shown below:
eventLoop:
while (!occurringEvents.isEmpty()) {
// handle the chronologically first event
final EventState currentEvent = occurringEvents.poll();
// get state at event time
ODEStateAndDerivative eventState = restricted.getInterpolatedState(currentEvent.getEventTime());
// restrict the interpolator to the first part of the step, up to the event
restricted = restricted.restrictStep(previousState, eventState);
// try to advance all event states to current time
for (final EventState state : eventsStates) {
if (state != currentEvent && state.tryAdvance(eventState, interpolator)) {
// we need to handle another event first
// remove event we just updated to prevent heap corruption
occurringEvents.remove(state);
// add it back to update its position in the heap
occurringEvents.add(state);
// re-queue the event we were processing
occurringEvents.add(currentEvent);
continue eventLoop;
}
}
// all event detectors agree we can advance to the current event time
final EventOccurrence occurrence = currentEvent.doEvent(eventState);
final Action action = occurrence.getAction();
isLastStep = action == Action.STOP;
if (isLastStep) {
// ensure the event is after the root if it is returned STOP
// this lets the user integrate to a STOP event and then restart
// integration from the same time.
eventState = interpolator.getInterpolatedState(occurrence.getStopTime());
restricted = interpolator.restrictStep(previousState, eventState);
}
// handle the first part of the step, up to the event
for (final ODEStepHandler handler : stepHandlers) {
handler.handleStep(restricted, isLastStep);
}
The eventOccurred
method is called when currentEvent.doEvent
is called and the handleStep
method is called at the end of this snippet.
This shows that except for the isLastStep
case, time range truncation is already handled before doEvent
(and therefore eventOccurred
) is called. Moving the call to handleStep
just the line before the call to doEvent
would ensure handleStep
already sees the correct time range and is called before eventOccurred
is called: user callbacks would be back in logical order, according to forward or backward propagation direction.
When an event stops the propagation, though, we miss the last call to restrictStep
. If I understand this snippet correctly, this feature is intended to handle numerical glitches due to convergence threshold in event detection: we need to make sure we are after event time so we ensure that starting a new propagation after this one ends will not detect the stop condition again an in fact the loop of successive propagations being stuck and not being able to go past one event.
From a scheduling point of view, I would think this should be handled with three calls to user callbacks: the first call to handleStep
with truncated step, then the call the eventOccurred
, then a new call to handleStep
with an almost zero step size to go past the event. This call could be put in the if (isLastStep)
section.
Alongside with this proposed change, I wonder if the isLast
second parameter to handleStep
is still relevant there. I now thinkg that rather than passing this boolean in the regular step handling method, it would be wiser to have a separate method finish(finalState)
method that would be the counterpart of the init
method.
consequences for users
The two consequences for users would be the following.
- calls to callback would now be scheduled in way consistent with propagation direction
-
handleStep
would sometimes be called with very small or even 0 step size
The first consequence is the goal of the change. The second consequence is probably not really a problem and it already can occur as we can have several events occurring at the same time so intermediate steps are zero. We may add some comments in javadoc so users are aware that this can happen, in case they use things like division by step size to compute some rates.
Of course, if this change is accepted, it should be done in both Orekit and Hipparchus.
merging propagation modes
context
The second aspect concerns propagation modes.
Since a number of years (I tracked it back to 13 years ago in the Propagator
interface, but it was probably earlier in other classes), we have three propagation modes:
- master mode, where the propagator manages the time loop and user code is called via callbacks in step handlers
- slave mode, where propagator just advances to target time and return to caller, who manages the time loop and usually uses the returned state
- ephemeris generation mode, where propagator advances to target time and stores an ephemeris that use can retrieve and analyse a posteriori
problem statement
There are several problems with this approach.
The first one is philosophicalor social and relies on the connotation of master/slave terminology which is nowadays unsustainable. Many other technology-oriented project have made move to abandon these terms and change to more neutral terms. A number of replacements are proposed in wikipedia Master/slave (technology) page, but I don’t see they are well suited to our usage. I have seen other suggestions like leader/follower or controller/responder that are more to the point of our usage.
The second problem is that these modes are inconsistent. Master and slave mode are exclusive to each other, but ephemeris generation mode is not really a separate mode, and can be used with both previous modes, depending on how it is set up.
The third problem is that sometimes, we would like to have several different step handlers at the same time. This is allowed at low level for Hipparchus ODE integrators, but not at Orekit level. We do have a OrekitStepHandlerMultiplexer
that was added ten years ago, but it does not handle directly fixed step handlers, which are useful for regular logging for example.
The fourth problem is that it is somewhat confusing for new users.
Sometimes we also alreay mix things, for example by calling a propagator several times in row (hence having an outer loop that is slave mode alike) and using master mode for each propagation leg. It’s not really a problem, it is just something I find strange.
proposed solution
When looking at the implementation, there are in fact no major distinction between all these modes. Propagators always run in a single mode, which is the current master mode, and the other modes are simply related to special step handlers. Slave mode is obtained by just setting the step handler to null so no callback is ever triggered. Ephemeris generation mode is obtained by just wrapping the propagator itself for analytical propagators and by registering an additional step handler at Hipparchus level for integration-based paropagators (i.e. DSST and numerical propagator).
So I would suggest to just drop the concept of propagation modes and just let users register or not step handlers or not, as many as they wish, with fixed or variable step sizes.
For ephemeris, we could either let the setEphemerisGenerationMode
convenience method (perhaps renaming it generateEphemerisOnTheFly
or something similar) and the associated getGeneratedEphemeris
method or remove both and set up a dedicated EphemerisStoringStepHandler
. I would personnaly prefer the later.
It also helps debugging code or adding optional output with dedicated step handlers. We can also have both a fine-grained fixed step log file with huge output set to a file and a more coarse-grained fixed step output intended for user display so they see progress for long term computation.
consequences for users
The main consequences would be
-
setSlaveMode
renamed intoclearStepHandlers
-
setMasterMode
renamed intoaddStepHandler
(with both fixed and variable steph signatures preserved) -
setEphemerisMode
renamed intogenerateEphemerisOnTheFly
, with only one no-argument signature, or adding aEphemerisStoringStepHandler
If this change is accepted, it should be done only in Orekit as Hipparchus already supports multiple steph handlers and has no notion of mode and the equivalent of ephemeris is already managed in the public API as a dedicated step handler: DenseOutputModel
I would like to put these changes in 11.0, which should be published soon.