Unscented Kalman Filter: improved innovation calculation

\newcommand{\bm}{\boldsymbol}

Hi everyone,

Note: The original post was edited to make clear the “additive” form of the Unscented Kalman Filter is considered here.

This post is about the “additive” form of the Unscented Kalman Filter, where the process noise is Gaussian and provided as a covariance matrix; see Table 7.3 in Wan and van der Merwe [1].
(For completeness, the “augmented” (non-additive) form, which is not discussed here, can be found in Table 7.2 of the same reference.)

I was looking at the Hipparchus implementation of the Unscented Kalman Filter and came across a potential limitation in how the innovation is computed.

The notation here is based on Table 7.3 in Wan and van der Merwe [1], where the “additive” form of the UKF is formulated. Below, I abbreviate the sampling of sigma points as

UT : \hspace{1cm} \bm{\mathcal{X}} = \left[\begin{matrix} \bm x & \bm x + \gamma \sqrt{\bm P} & \bm x - \gamma \sqrt{\bm P} \end{matrix}\right].

What is currently done in the method UnscentedKalmanFilter::estimationStep is the following:

\begin{align}
\bm{\mathcal{X}}_{k-1} &\overset{UT}{\longleftarrow} \bm x_{k-1}, \bm{P}_{k-1} & \text{sample sigma points}\\
\bm{\mathcal{X}}_{k|k-1}^*, \bm{Q}_{k|k-1} &\longleftarrow \bm F(\bm{\mathcal{X}}_{k-1}) & \text{propagate (Orekit)}\\
\bm x_{k|k-1}, \bm P_{k|k-1}^* &\overset{UT^{-1}}{\longleftarrow} \bm{\mathcal X}_{k|k-1}^* & \text{collapse sigma points}\\
\bm P_{k|k-1} &= \bm P_{k|k-1}^* + \bm Q_{k|k-1} & \text{add process noise}\\
\bm{\mathcal X}_{k|k-1} &\overset{UT}{\longleftarrow} \bm x_{k|k-1}, \bm P_{k|k-1} & \text{resample sigma points (!)}\\
\bm{\mathcal Y}_{k|k-1} &\longleftarrow \bm H(\bm{\mathcal X}_{k|k-1}) & \text{predict measurements (Orekit)}\\
& \text{Compute } \bm{\mathcal K}_k\\
& \text{Compute } \bm{x}_k, \bm P_{k}
\end{align}
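For reference, the collapse step \overset{UT^{-1}}{\longleftarrow} above is the standard weighted reconstruction of mean and covariance from the sigma points, with weights W_i^{(m)} and W_i^{(c)} as defined in [1]:

\bm x_{k|k-1} = \sum_{i=0}^{2L} W_i^{(m)} \bm{\mathcal X}_{i, k|k-1}^*, \qquad \bm P_{k|k-1}^* = \sum_{i=0}^{2L} W_i^{(c)} \left(\bm{\mathcal X}_{i, k|k-1}^* - \bm x_{k|k-1}\right) \left(\bm{\mathcal X}_{i, k|k-1}^* - \bm x_{k|k-1}\right)^T.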

In order to incorporate the process noise, the (generally non-Gaussian) predicted sigma points \bm{\mathcal X}_{k|k-1}^* are used to compute the predicted mean and covariance \bm x_{k|k-1}, \bm P_{k|k-1}^*, after which the sigma points are discarded. The process noise \bm Q_{k|k-1} is then added, and new sigma points \bm{\mathcal X}_{k|k-1} are sampled from the mean and the covariance that now includes the process noise.
This approach is valid; however, it “discards any odd-moments information captured by the original propagated sigma points” [1, p. 233, footnote 6]. Because the resampled set is symmetric about \bm x_{k|k-1} by construction, any asymmetry (skew) in the propagated points is lost when computing the innovation and its covariance. Hence we lose part of the benefit of using an unscented transform.

To maintain the odd-moments information, the original sigma points can be kept instead of discarded, and additional sigma points can be added to reflect the addition of process noise, as shown in [1, Eq. (7.56)]:

\bm{\mathcal X}_{k|k-1} = \left[\begin{matrix} \bm{\mathcal X}_{k|k-1}^* & \bm{\mathcal X}_{0, k|k-1}^* + \gamma \sqrt{\bm Q_{k|k-1}} & \bm{\mathcal X}_{0, k|k-1}^* - \gamma \sqrt{\bm Q_{k|k-1}} \end{matrix}\right],

where \bm{\mathcal X}_{k|k-1}^* are the original predicted sigma points, \bm{\mathcal X}_{0, k|k-1}^* is the first (central) sigma point and \bm Q_{k|k-1} is the process noise.

Note that we now have 4L + 1 sigma points instead of 2L + 1 (where L is the dimension of \bm x). These augmented sigma points are then used to compute the predicted measurement, the innovation covariance, and the cross covariance, as sketched below.
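To make this concrete, here is a minimal sketch of the augmentation step of Eq. (7.56), written against the Hipparchus linear algebra classes. The helper name augmentSigmaPoints and its signature are hypothetical (not part of the current UnscentedKalmanFilter API), and I have not tested this against the actual filter:

```java
import org.hipparchus.linear.CholeskyDecomposition;
import org.hipparchus.linear.RealMatrix;
import org.hipparchus.linear.RealVector;

/** Hypothetical helper for Eq. (7.56) of [1]: keep the 2L+1 propagated
 *  sigma points and append 2L extra points around the central point to
 *  account for the process noise. */
static RealVector[] augmentSigmaPoints(final RealVector[] predictedSigmaPoints, // X*_{k|k-1}, length 2L+1
                                       final RealMatrix processNoise,           // Q_{k|k-1}, L x L, assumed SPD
                                       final double gamma) {                    // UT scaling factor
    final int l = predictedSigmaPoints[0].getDimension();

    // lower-triangular Cholesky factor sqrt(Q), consistent with the notation above
    final RealMatrix sqrtQ = new CholeskyDecomposition(processNoise).getL();

    // central predicted sigma point X*_{0,k|k-1}
    final RealVector center = predictedSigmaPoints[0];

    // keep the original 2L+1 points, then add L "plus" and L "minus" points
    final RealVector[] augmented = new RealVector[4 * l + 1];
    System.arraycopy(predictedSigmaPoints, 0, augmented, 0, 2 * l + 1);
    for (int i = 0; i < l; ++i) {
        final RealVector column = sqrtQ.getColumnVector(i).mapMultiply(gamma);
        augmented[2 * l + 1 + i] = center.add(column);
        augmented[3 * l + 1 + i] = center.subtract(column);
    }
    return augmented;
}
```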

I have not tried out this addition of sigma points myself, but I would expect that preserving the information in the originally propagated sigma points when computing the innovation has a beneficial effect on the performance of the UKF, specifically in the case of long gaps between measurements, where the accumulated process noise is large.

I would love to hear your thoughts on this.

Best,
David


Reference:

[1] Wan, E. A., & van der Merwe, R. (2001). The Unscented Kalman Filter. In S. Haykin (Ed.), Kalman Filtering and Neural Networks (Chapter 7). Wiley. https://api.semanticscholar.org/CorpusID:14862265

Notation:

  • \bm x: State vector

  • \bm y: Measurement vector

  • \bm{\mathcal{X}}: State sigma-points

  • \bm{\mathcal{Y}}: Measurement sigma-points

  • \bm{P}: State covariance

  • \bm Q_{k|k-1}: Process noise between t_{k-1} and t_k

  • \sqrt{\bm A} : the lower-triangular Cholesky factorisation of matrix \bm A

  • \bm F : function to propagate sigma points and process noise (UnscentedProcess::getEvolution(...))

  • \bm H: function to map sigma points from state space to measurement space (UnscentedProcess::getPredictedMeasurements(...))

  • \gamma: Unscented Transform multiplication factor (AbstractUnscentedTransform::getMultiplicationFactor)

  • (\cdot)_k: Corrected at t_k

  • (\cdot)_{k|k-1}: Predicted at time t_k based on state estimate from t_{k-1}

  • (\cdot)^*: Before considering process noise


The current implementation is what is usually called the “additive” form, which is appropriate when your process model contains additive process noise, i.e.

x_{k+1} = f(x_k) + u_k,

where u_k is Gaussian process noise. The augmented (non-additive) form that you mention above is for the more general non-linear case

x_{k+1} = f(x_k, u_k).

In general, you can also include the measurement noise in your augmented state (Julier and Uhlmann, 2004), so that you have 2(2L + M) + 1 sigma points, where M is the dimension of the measurement.
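For concreteness, a sketch of that fully augmented state, where \bm u_k is the process noise (dimension L), \bm v_k is the measurement noise (dimension M) and \bm R_k is its covariance:

\bm x^a_k = \begin{bmatrix} \bm x_k \\ \bm u_k \\ \bm v_k \end{bmatrix}, \qquad \bm P^a_k = \begin{bmatrix} \bm P_k & \bm 0 & \bm 0 \\ \bm 0 & \bm Q_k & \bm 0 \\ \bm 0 & \bm 0 & \bm R_k \end{bmatrix},

so the augmented dimension is 2L + M, giving the 2(2L + M) + 1 sigma points mentioned above.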

There’s definitely no harm in having a non-additive version of the UKF in Hipparchus! I’ll need to think about it a bit more, but it won’t just be a case of having an “additive” flag when instantiating the filter … there are differences in the interface to the process and measurement models, so we would (might?) need a non-additive version of UnscentedProcess.

I’m not sure about the benefits, especially for orbit determination. As you say, the advantage would be where the covariances (of the state and/or the measurement noise) are large, in similar cases to where the UKF provides an improvement over the EKF. There is a paper by Wu et al. (2005) that shows an improvement, but their example is a bimodal econometrics model, so it’s unclear to me whether the majority of the benefit was from better estimation accuracy or just better resolving of the mode.

There are a bunch of different options for improving the performance of the estimators, but I don’t know which ones would provide the best bang for buck for orbit determination. I think there’d be some benefit in having the ability to define process noise in continuous time, e.g. Sarkka’s continuous-discrete UKF (sketched below). So we could specify process noise just for the drag coefficient (for example) and have that integrated along with the mean/covariance, which would accumulate the uncertainty in drag into the orbital parameters in a principled way. But that would be a serious implementation challenge. And I also like the idea of implementing an iterated version of the EKF/UKF, like the iterated posterior linearisation filter, which would, again, improve performance with large uncertainties.
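For context, a rough sketch of the prediction step behind Sarkka’s continuous-discrete UKF, assuming additive diffusion with \bm Q_c the continuous-time process noise spectral density (a symbol introduced here just for illustration): between measurements, the mean and covariance are propagated by moment-matching ODEs,

\dot{\bm m} = \mathrm{E}\left[\bm f(\bm x)\right], \qquad \dot{\bm P} = \mathrm{E}\left[(\bm x - \bm m)\, \bm f(\bm x)^T\right] + \mathrm{E}\left[\bm f(\bm x)\, (\bm x - \bm m)^T\right] + \bm Q_c,

where the expectations over \bm x \sim \mathcal N(\bm m, \bm P) are evaluated with the unscented transform at each integration step.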


Hi @markrutten,

Thank you for the extended reply!

I was actually referring to the “additive” form of the UKF. My apologies for the confusion. I should not have used the term “augmented”. I have updated the original post to make clear that the post is about the “additive” form of the UKF, so new readers are hopefully less confused.

The difference between the Orekit implementation and the approach I mentioned lies only in adding more sigma points to include the process noise before the update step; see Eq. (7.56) in Wan and van der Merwe [1].

The main change in Hipparchus would be in how the predicted measurement, the innovation covariance, and the cross covariance are computed, as sketched below. The prediction step would remain the same, and the process noise would still be provided as a covariance matrix.
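For concreteness, the update-step quantities would then be the standard UT sums taken over the 4L + 1 augmented points (a sketch; \bm R_k denotes the measurement noise covariance, and the weights are the ones recalculated for the larger set, as per footnote 6 of [1]):

\begin{align}
\bm y_{k|k-1} &= \sum_{i=0}^{4L} W_i^{(m)} \bm{\mathcal Y}_{i, k|k-1},\\
\bm P_{yy} &= \sum_{i=0}^{4L} W_i^{(c)} \left(\bm{\mathcal Y}_{i, k|k-1} - \bm y_{k|k-1}\right) \left(\bm{\mathcal Y}_{i, k|k-1} - \bm y_{k|k-1}\right)^T + \bm R_k,\\
\bm P_{xy} &= \sum_{i=0}^{4L} W_i^{(c)} \left(\bm{\mathcal X}_{i, k|k-1} - \bm x_{k|k-1}\right) \left(\bm{\mathcal Y}_{i, k|k-1} - \bm y_{k|k-1}\right)^T
\end{align}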

I’ll have a look at the references you shared!

Best,
David

Thanks! I must have looked at that Wan and van der Merwe chapter hundreds of times and never noticed that augmentation trick in the additive algorithm (Table 7.3)! Probably because it’s really unusual and non-standard?

Do you think it’s OK? They use a different L in Eqs. (7.54)/(7.55) vs the update equations, which would mean different weights are used in the predict vs update steps? I suppose it would be fine, as the predicted mean and covariance aren’t used in the gain or innovation calculations, but they would be slightly different to using the augmented form directly.

Maybe we could construct a problem to demonstrate the benefit? I might have a play with the econometrics example from the Wu (2005) paper, which has additive noise, and then construct something similar for an orbit determination problem?

Hehe, yes, it also took me many reads to see it. My feeling is that it is indeed an unusual (or maybe little-known?) approach; I’ve not seen it in other literature.

I think it would be good to first find another work in which this approach is used, to see how it is applied. I’m especially unsure about the weights. In the footnote they write “This requires setting L -> 2L and recalculating the various weights Wi accordingly”, but they do not say how. I could imagine the new weights being chosen based on the relative sizes of the process noise and the predicted covariance, giving more weight to whichever is larger. For example, if the process noise is zero, then the new sigma points related to the process noise could essentially be ignored. One literal reading of the footnote is sketched below.
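For reference, that literal reading would substitute L \rightarrow 2L into the standard scaled-UT weights of [1] (this is only a guess at what the authors intended, not something I have verified):

\lambda = \alpha^2 (2L + \kappa) - 2L, \qquad W_0^{(m)} = \frac{\lambda}{2L + \lambda}, \qquad W_0^{(c)} = \frac{\lambda}{2L + \lambda} + 1 - \alpha^2 + \beta, \qquad W_i^{(m)} = W_i^{(c)} = \frac{1}{2(2L + \lambda)}, \quad i = 1, \ldots, 4L.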

The econometrics example from the Wu (2005) paper would indeed be a good first test case. We can definitely also construct an orbit determination problem that benefits from odd-moments information.

Hello,
I remember doing some experiments with the continuous-time UKF from Sarkka’s paper a few years back. It is definitely interesting in that the positive definiteness of the covariance matrices is guaranteed. That would be a nice addition to Hipparchus as well.
Have a great day
Andrea