Possible bug in reduced chi-square metric computation

Hello,

When performing batch least-squares estimation and inspecting the LeastSquaresOptimizer.Optimum returned by Orekit, we noticed that the reduced chi-square metric returned by the getReducedChiSquare(n) method (inherited from org.hipparchus.optim.nonlinear.vector.leastsquares.LeastSquaresProblem.Evaluation) differed from our own analysis of the residuals. After some checks, it appears that getReducedChiSquare(n) uses the total number of measurements to compute the RCS, whereas only the “used” measurements should be considered for this metric.
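To illustrate the issue numerically, here is a minimal sketch in plain Java (no Orekit or Hipparchus dependency; the method name and layout are illustrative, not the actual library code) of the reduced chi-square formula, RCS = chi² / (observationSize − numberOfFittedParameters). If observationSize counts every measurement ever fed to the estimator, the denominator is too large once some measurements are rejected:

```java
// Toy sketch of the reduced chi-square computation. Residuals are
// assumed to be already weighted (r_i / sigma_i), as in Hipparchus.
public class ReducedChiSquareSketch {

    public static double reducedChiSquare(double[] residuals, int numParams) {
        double chi2 = 0.0;
        for (double r : residuals) {
            chi2 += r * r;
        }
        // Denominator: number of scalar residuals minus fitted parameters.
        return chi2 / (residuals.length - numParams);
    }

    public static void main(String[] args) {
        double[] all  = {1.0, -1.0, 2.0, 5.0}; // includes an outlier later rejected
        double[] used = {1.0, -1.0, 2.0};      // only the accepted measurements

        // Using all 4 residuals: chi2 = 31, denominator = 4 - 2 = 2
        System.out.println(reducedChiSquare(all, 2));   // prints 15.5
        // Using only accepted residuals: chi2 = 6, denominator = 3 - 2 = 1
        System.out.println(reducedChiSquare(used, 2));  // prints 6.0
    }
}
```

The two results differ both in the numerator (the outlier's residual) and in the denominator, which is the discrepancy described above.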

Could you confirm if this is the case?

Thank you very much for your time

Upping this so that hopefully somebody takes a look.

Do you mean AbstractEvaluation at line 99 should use a reduced number of observations?

What exactly do you mean by “used measurements”?

Hi Luc,

Yes, I mean that the evaluation at line 99 that you mention should use a different number of observations. By “used measurements” I mean the following: during the batch least-squares process, observations are discarded across iterations for many possible reasons. Thus, the getReducedChiSquare method should compute its value with the currently accepted observations only, not counting the rejected ones.

According to our tests, the returned value includes all observations, both accepted and discarded, in the denominator. The reason could be that the variable observationSize is set at the beginning of the batch least-squares process (when no observation has been rejected yet) and is not updated as the iterations go on.

Thanks for taking a look at it.


Hi,

Looking a bit closer, I think the fix should be in Orekit, since that is where the notion of rejected measurements is introduced. So BatchLSEstimator could have its own implementation of LeastSquaresProblem.Evaluation that filters out the discarded data. Does that make sense?
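As a hedged sketch of that idea, the snippet below models the proposal with a toy Evaluation interface and a decorator that drops the residuals of rejected measurements. These are not the real Hipparchus or Orekit classes (the actual LeastSquaresProblem.Evaluation interface is larger); all names here are illustrative:

```java
import java.util.Arrays;

// Toy model of "an Evaluation that filters out discarded data".
public class FilteredEvaluationSketch {

    public interface Evaluation {
        double[] getResiduals();
        int getObservationSize();

        default double getReducedChiSquare(int numParams) {
            double chi2 = Arrays.stream(getResiduals()).map(r -> r * r).sum();
            return chi2 / (getObservationSize() - numParams);
        }
    }

    // Decorator keeping only the residuals of accepted measurements,
    // so both numerator and denominator ignore the rejected ones.
    public static class FilteredEvaluation implements Evaluation {
        private final double[] accepted;

        public FilteredEvaluation(Evaluation inner, boolean[] rejected) {
            double[] all = inner.getResiduals();
            int n = 0;
            for (boolean r : rejected) {
                if (!r) n++;
            }
            accepted = new double[n];
            int k = 0;
            for (int i = 0; i < all.length; i++) {
                if (!rejected[i]) accepted[k++] = all[i];
            }
        }

        public double[] getResiduals() { return accepted; }
        public int getObservationSize() { return accepted.length; }
    }

    public static void main(String[] args) {
        double[] residuals = {1.0, -2.0, 10.0};      // last one is an outlier
        boolean[] rejected = {false, false, true};

        Evaluation raw = new Evaluation() {
            public double[] getResiduals() { return residuals; }
            public int getObservationSize() { return residuals.length; }
        };
        Evaluation filtered = new FilteredEvaluation(raw, rejected);

        System.out.println(raw.getReducedChiSquare(1));      // 105 / 2 = 52.5
        System.out.println(filtered.getReducedChiSquare(1)); // 5 / 1 = 5.0
    }
}
```

The decorator approach has the advantage that the Hipparchus-level computation stays untouched; only the view of the residual vector handed to it changes.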

Cheers,
Romain.

I think you are right, but I am not sure whether the original post refers to Orekit or Hipparchus.

Hi to both,

First, thank you very much for taking the time to look at this post. Second, it’s true that I originally opened the post for Hipparchus because that is where the low-level function computing the reduced chi-square can be found. But it’s also true that it perhaps makes more sense for the fix to be made at the upper computation level, in Orekit rather than Hipparchus. We are using Orekit, which in turn calls Hipparchus.

Hi,

I believe this would solve the issue:

What do you think?

Cheers,
Romain.


Hi Romain,

Thanks a lot for the help. To be honest, I’m not a Java expert (we are working with the Orekit Python wrapper), so I cannot fully confirm that your merge request solves the issue. I don’t directly see where the new class takes observationSize into account, but it’s probably because you define that variable with super(), making it available to the methods inside optimumIn (could that be?). In any case, the tests seem to show the fix is working, so I agree with the MR.

Again, thanks a lot for your time

Regards,

Alejandro Cano

Yes, I’m using a Hipparchus abstract class that already takes the total number of measurements as input, which here accounts for the rejected ones.
What version of the Python wrapper are you using? If it’s the JPype one, you could replace the .jar as explained here. I could maybe provide you with it if you can’t compile Java on your side.

Cheers,
Romain.

Hi Romain,

The observationSize should be the number of “scalar” observations. Thus, each accepted measurement should count n times, where n is its dimension.

For example, in the test you added, the observationSize should be (10 PV measurements) × (6 scalar measurements per PV) = 60. Cf. the observationSize of the Hipparchus optimum, which is 66, not 11.
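The counting rule above can be sketched as follows (a toy standalone method, not the actual Orekit code; the example values mirror the test described in this thread, assuming 11 PV measurements of dimension 6 with one of them rejected):

```java
import java.util.Arrays;

public class ObservationSizeSketch {

    // observationSize counts scalar components: each accepted measurement
    // contributes its dimension (e.g. 6 for a PV measurement: 3 position
    // components + 3 velocity components). Rejected ones contribute nothing.
    public static int observationSize(int[] dimensions, boolean[] rejected) {
        int size = 0;
        for (int i = 0; i < dimensions.length; i++) {
            if (!rejected[i]) {
                size += dimensions[i];
            }
        }
        return size;
    }

    public static void main(String[] args) {
        int[] dims = new int[11];
        Arrays.fill(dims, 6);                 // 11 PV measurements, dimension 6 each
        boolean[] rejected = new boolean[11];
        rejected[10] = true;                  // one measurement discarded

        System.out.println(observationSize(dims, rejected));        // 10 * 6 = 60
        System.out.println(observationSize(dims, new boolean[11])); // 11 * 6 = 66
    }
}
```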

Cheers,

Alberto

Hi Alberto,

Fair enough, I’ll update the count accordingly.

Btw, I think we’ve got a case for a patch release in the very near future.

Cheers,
Romain.

Hi Romain, thanks for the clarifications. We are not using the JPype Python wrapper yet, but we are planning to move to it soon. In any case, no need to rush a hot-fix of the bug for us; we already have a workaround in place and prefer to wait for the official fix and JPype.

Thank you very much,

Best regards,

Alejandro Cano

Hi Alberto,

You are completely right: the observationSize should count every “measurement”, not only every “observation” (which can be composed of multiple measurements, such as the PV coordinates). Thanks for noticing it in Romain’s bug fix.

Best regards,

Alejandro Cano