BLSQ not giving expected results

We’re working on implementing BLSQ and are running into a serious issue.
Right now we’re only supporting PV observations for BLSQ.

Let me describe our test setup:
We’re using the NumericalPropagator to generate an ephemeris (using a fixed step handler to extract the intermediate states).

Then we’re feeding these states as PV observations into BLSQ with the exact same integrator, integrator settings, force models, etc… using the same initial conditions (including mass, area, cd and cr).
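
To make the setup concrete, here is a minimal sketch of that flow, assuming the Orekit 11 step handler API; buildPropagator(...) is a hypothetical helper standing in for our actual propagator configuration, the 60 s sampling step is arbitrary, and the sigmas are the ones listed further down:

    final NumericalPropagator propagator = buildPropagator(initialState); // hypothetical helper
    final ObservableSatellite satellite = new ObservableSatellite(0);
    final List<PV> observations = new ArrayList<>();

    // sample the reference propagation every 60 s and turn each state into a PV "observation"
    propagator.getMultiplexer().add(60.0, state -> {
        final PVCoordinates pv = state.getPVCoordinates();
        observations.add(new PV(state.getDate(), pv.getPosition(), pv.getVelocity(),
                                1000.0, 100.0, 1.0, satellite));
    });
    propagator.propagate(initialState.getDate().shiftedBy(Constants.JULIAN_DAY));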

The expectation is that, since we’re using the exact same settings and giving BLSQ all the propagated states as observations, it would converge immediately with near-zero residuals.

However this is not the case:

  • Test 1: No force models - estimated position is off by 1e-7, velocity is off by 9e-11
  • Test 2: Holmes-Featherstone gravity model - estimated position is off by 4e-3, velocity is off by 8e-8
  • Test 3: NRLMSISE-00 atmosphere + drag model - estimated position is off by 6e-5, velocity is off by 8e-8
  • Test 4: NRLMSISE-00 atmosphere + drag model + cd estimation enabled - estimated position is off by 1e-5, velocity is off by 4e-8, cd is off by 6e-7
  • Test 5: Solar radiation pressure model - no convergence after 15 iterations
  • Test 6: Solar radiation pressure model + cr estimation enabled - no convergence after 15 iterations

For the numerical integrator we’re using the DormandPrince853Integrator (and DormandPrince853IntegratorBuilder for BLSQ) with a min step of 0.001, a max step of 1000.0 and a position tolerance of 1e-9. For BLSQ we use exactly the same settings as for propagation, with a convergence threshold of 1e-3. The PV observations are created with a sigma of 1000 for position and 100 for velocity, though changing those values had no effect. We’re also using a scaling factor of 10; changing that to 1 or 1000 did not make any difference either.
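
For completeness, this is roughly how that is wired up on our side (a sketch against the Orekit 11 API; initialOrbit is a placeholder, the Levenberg-Marquardt optimizer is just an example choice, and the force models of the reference run would be added to the builder):

    import org.hipparchus.optim.nonlinear.vector.leastsquares.LevenbergMarquardtOptimizer;
    import org.orekit.estimation.leastsquares.BatchLSEstimator;
    import org.orekit.orbits.PositionAngle;
    import org.orekit.propagation.conversion.DormandPrince853IntegratorBuilder;
    import org.orekit.propagation.conversion.NumericalPropagatorBuilder;

    // same integrator settings as the reference propagation
    final DormandPrince853IntegratorBuilder integratorBuilder =
            new DormandPrince853IntegratorBuilder(0.001, 1000.0, 1.0e-9);

    // positionScale = 10.0 is the "scaling factor" mentioned above; the angle type
    // argument is required but irrelevant when the orbit type is Cartesian
    final NumericalPropagatorBuilder propagatorBuilder =
            new NumericalPropagatorBuilder(initialOrbit, integratorBuilder,
                                           PositionAngle.TRUE, 10.0);
    // propagatorBuilder.addForceModel(...) to match the reference propagation

    final BatchLSEstimator estimator =
            new BatchLSEstimator(new LevenbergMarquardtOptimizer(), propagatorBuilder);
    estimator.setParametersConvergenceThreshold(1.0e-3);
    estimator.setMaxIterations(15);    // iteration/evaluation caps are illustrative
    estimator.setMaxEvaluations(35);
    observations.forEach(estimator::addMeasurement);
    estimator.estimate();              // returns the estimated propagator(s)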

We do not know what is wrong. Sorry for not posting code snippets, but those would be quite large.

We also ran a different test on the propagator itself: we propagate a state vector forward for 1 day, then propagate the result backward 1 day to see whether we land on the same initial state vector. Doing this, the position is off by nearly a centimeter, whereas ideally it should be nearly identical.
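
The round-trip test itself is roughly this (again a sketch, with buildPropagator(...) standing in for the actual propagator configuration):

    final AbsoluteDate t0 = initialState.getDate();

    // one day forward...
    final SpacecraftState afterOneDay =
            buildPropagator(initialState).propagate(t0.shiftedBy(Constants.JULIAN_DAY));

    // ...and back to t0, restarting from the forward end point
    final SpacecraftState backAtT0 =
            buildPropagator(afterOneDay).propagate(t0);

    final double positionError = Vector3D.distance(
            initialState.getPVCoordinates().getPosition(),
            backAtT0.getPVCoordinates().getPosition());
    System.out.println("forward/backward position error: " + positionError + " m");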

One thing you could try is to reduce maxStep in propagation (both when generating the PV measurements and in BLSQ) and to use a variable step handler to extract the PV at exactly the computed step end.
What I have in mind is that the propagator step interpolation may introduce errors. The Dormand-Prince 8(5, 3) integrator, just like all other embedded Runge-Kutta integrators, performs its interpolation using the internal model given by its Butcher arrays. The coefficients of the Butcher arrays are optimized so that high order error terms cancel out at step end, but they do not cancel out in the middle of the step. So the error of the interpolated model is bumpy, with low points at step ends and higher points in the middle of the step. With a step tolerance of 1.0e-9 on position and a step size that may reach 1000 seconds, a position error of a few 1.0e-5 m in mid-step is therefore possible.
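
If it helps, this is the kind of extraction I mean (a sketch assuming the Orekit 11 step handler API, reusing the observations/satellite names from the sketch above): register a variable step handler and sample only at the natural step ends, so no mid-step interpolation is involved.

    propagator.getMultiplexer().add(new OrekitStepHandler() {
        @Override
        public void handleStep(final OrekitStepInterpolator interpolator) {
            // state at the natural end of the integration step, where the
            // embedded Runge-Kutta error terms are smallest
            final SpacecraftState stepEnd = interpolator.getCurrentState();
            final PVCoordinates pv = stepEnd.getPVCoordinates();
            observations.add(new PV(stepEnd.getDate(), pv.getPosition(), pv.getVelocity(),
                                    1000.0, 100.0, 1.0, satellite));
        }
    });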

Reducing the max step and using a variable step handler did not improve things (it actually made them worse and a lot slower). What I don’t understand is why there are any residuals at all. If I run the same propagation twice with the same settings, I get exactly the same results. The BLSQ algorithm is also just running the same propagation (though the start and end dates are shifted by 1 s, from what I can tell in the code), so I would expect no residuals on the first iteration. If I try it with only two observations (the start and end of the propagation), so that no errors are introduced by the fixed step handler, I still get similar results.

Are our expectations for BLSQ too high?

The 1s is bothering me. Can you track down where it comes from?

Sure:

org.orekit.estimation.leastsquares.AbstractBatchLSModel#value

    if (isForwardPropagation()) {
        // Propagate forward from firstDate
        parallelizer.propagate(firstDate.shiftedBy(-1.0), lastDate.shiftedBy(+1.0));
    } else {
        // Propagate backward from lastDate
        parallelizer.propagate(lastDate.shiftedBy(+1.0), firstDate.shiftedBy(-1.0));
    }

From what I can see, firstDate is the date of the chronologically first observation and lastDate is the date of the chronologically last observation.

And what happens if you remove this shift (or collapse it to, say, 1.0e-9 s if 0 does not work)?

Or, even better, replace the first date with your own propagation date.

It took me a while, as I had to deploy a version of Orekit locally with those offsets removed. I did try it with only two observations and the 1 s offset removed, which reduced the position and velocity error to 1e-12 when using no force models. With force models enabled, there is no real difference.

I am beginning to suspect that we are expecting too much precision out of this process.

Hello there,

Just double checking: you are using the same OrbitType (and if applicable PositionAngle) in your propagation and orbit determination?

Also, are you using the tolerances function in NumericalPropagator? If yes, bear in mind that the output will be slightly different when called with the initial or final orbit.

Cheers,
Romain

Both propagation and BLSQ are using a Cartesian orbit, all in the same reference frame (GCRF). The tolerances function is used with a dP of 1e-9.
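
For reference, this is roughly how it is called on our side (a sketch; the arrays returned by tolerances depend on the orbit passed in, which I understand is your point about the initial vs. final orbit):

    final double dP = 1.0e-9;
    final double[][] tol = NumericalPropagator.tolerances(dP, initialOrbit, OrbitType.CARTESIAN);
    final DormandPrince853Integrator integrator =
            new DormandPrince853Integrator(0.001, 1000.0, tol[0], tol[1]);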

I agree 1.0e-5 m is asking for very, very high accuracy, well beyond the realism of the physical models, but this is the sort of thing we look after in Orekit, as the computation accuracy should go beyond the models. One of our driving goals is that the implementation details (thresholds, algorithm, step size…) should induce errors that are several orders of magnitude below the physical models, so that in the end the limiting factor is the physical model. Here we are in this ballpark: 1.0e-5 m is most probably acceptable, as nobody expects satellite position to be accurate at the 0.01 mm level. But I would like to understand what creates this 0.01 mm difference.

For a realistic scenario I also do not expect this kind of precision; our input data won’t even be close to this precise. But this was a validation test to make sure we implemented things correctly, and I kind of expect to get the exact same result (aside from rounding errors) when I feed in the exact correct settings for the propagation. The values I’m seeing are way beyond rounding errors.

Yes, and this is something that bothers me.