This is most likely due to the caching+interpolation feature used in some frames conversions.
Conversion between inertial and terrestrial frames involve huge computation of nutation effects, with several thousands of Poisson series terms. These effects have periods ranging from about 5 days for the shorter one to 18 years for the longer one if I remember correctly. This means these huge computations can be dramatically reduced by just computing the full terms on a sampling grid and interpolating between grid points. As an example, the transform between GCRF and CIRF computes the full Poisson series only once per hour even if you request the transform every millisecond. The number of points to establish the underlying polynomial model as well as the time step between grid points have been carefully studied and adjusted for each frame transform that uses this feature so the interpolation error is several order of magnitudes lower than the accuracy of the physical models. There are unit tests that check this near the error peak (which is due to lunar effects) ; the test tolerances are that without EOP max angular error is below 4.6e-12 radians and with EOP it is below 1.3e-13 radians, so these interpolation errors are acceptable.
The way the caching works is that the first time you request a transform at date t0, as the grid is empty it will compute several points around t0 to build the first grid (basically between t0 minus 3 hours and t0 + 3 hours) and cache it. Then, as you ask for more points, the same grid is reused as long as your points remain centered, and new points are added to the grid when your propagation requires it, globally just computing one new grid point for each hour of propagation. Older points are preserved in case you need them again (the grid for CIRF keeps up to one month worth of sampling points).
One consequence of this implementation is that if your first call is t0 you will get one grid, but if your first call is t0 + k hours + Δt, you will get a different grid, shifted by Δt and therefore very slightly different results, theoretically up to the interpolation error (which as explained previously is of the order of magnitude of 4.6e-12 radians). This is probably what you observe. Depending on the first test that is run, you get one grid or the other, and the results are off by a tiny bit.
In Orekit tests, we ensure test results are independent of test run order by clearing the cache at the start of each test. This is done using an internal hack that uses Java reflection. You can look at the Utils.clearFactories()
method in the tests source directory (in Orekit tests, we really call Utils.setDataRoot()
that itself calls Utils.clearFactories()
).
We have thought about another way to avoid this by enforcing grid points to be exactly on exact hours (for grids sampled hourly) in TAI time scale for example, instead of using the first t0 as the first grid point, but never really implemented it. This may be worth adding as this problem already hit several people. You can add a feature request on our issue tracker so we implement this.
Anyway, as you noticed the error is very small, it is related to interpolation error and several order of magnitude below precession/nutation models accuracy by design. So it is physically acceptable, but induces non-regression problems if tests do not ensure they start with an empty cache.