Revamping dates handling

luc · August 23, 2024, 3:57pm

I have been working with @LordRaptor on dates handling in Orekit for the last few weeks. This work started about issue 1454 ( AbsoluteDate.J2000_EPOCH is offset from the actual J2000 by a rounding error). Another long-standing issue related to date was issue 707 ( Unable to parse 1960-12-31T23:59:61.4). This is a big change as it implied a complete rewrite of a core class. We are therefore asking the community to review the change before we commit it to the develop-13 branch (or the develop branch if we decide to stop work on 12.X series). The change is in the branch sub-atto-seconds-dates. The branch name is a misnomer, the new dates are not sub attoseconds, there are exactly at attoseconds resolution. The reason for this misleading name is that I revived an older branch for this, and at the time I was targeting even finer resolution; I changed my mind since then.

Below is a direct copy of the text that will be present in the 12 to 13 upgrades instructions if the change is validated. You can find the current status of these upgrade instructions here.

As I am sure there will be a question about this, this change has a minor effect on performances (a few percents as far as I can tell). This will of course depend on the application. All applications do handle dates, but there are many parts that are much more costly (typically everything related to Earth gravity field or surface forces with box and wing models for example). On my personal computer running the 6980 tests of the develop-13 branch takes 413 seconds, while running the 7042 tests of the sub-atto-seconds-dates branch takes 422 seconds. Running the tests under IntelliJ IDEA profiling tools, the dates handling is far lower than other algorithms and is not even visible in the flame graph (I had to dig into the methods lists to look for AbsoluteDate and SplitTime methods). So I would say the change is not too costly, and it brings better accuracy, better robustness, the dates are more human-friendly (for those who are fed up with outputs with 15 decimals in UTC) and the code is much simpler to understand and maintain.

So what does the community think about this change?

[below is the copy of the part of the upgrades instructions related to dates handling]

overview of the change

In the 12.X series, dates were implemented using an epoch as a whole number of seconds from
a reference date and a double offset corresponding to the fractional parts of the seconds.
As the fractional part was between 0.0 and 1.0, its resolution was of the order of magnitude
of a femtosecond near 1.0, and when dates resulted from successive computations (typically
sequences of calls to shiftedBy), accuracies at picoseconds level could be regularly
achieved. There were however rounding problems when either dates were output in textual
form with decimal digits, either because writing for safe roundtrip operations induced writing
many spurious digits, or because writing with reasonable digits numbers (say down to millisecond
or microseconds) prevented safe roundtrip operations. Another type of problems occurred when
standard java Instant, Date or TimeUnit were used as they refer to milliseconds,
microseconds or nanoseconds. Yet another problem was when TT (Terrestrial Time) scale was used, as its offset with respect to TAI (International Atomic Time) is 32.184s exactly and 0.184 cannot be represented exactly as an IEEE754 primitive double (the closest normal number that can be represented exactly is \frac{3314649325744685}{2^{54}}, which is about 3.11 attoseconds smaller).

There were also several problems with the linear models between UTC and TAI that were used
before 1972, as the offset was a whole number of microseconds and the slope were a whole
number of nanoseconds per seconds. Supporting properly the linear models before 1972 may
seem moot but is in fact really required because many systems use Unix time so it is
widely used in interfaces or databases. The official Unix time as defined by POSIX does
not consider leap seconds, but many users ignore it and will just use
new ApsoluteDate(1970, 1, 1, utc) and expect this to be seamlessly interoperable with
standard java Instant, Date or TimeUnit. It is not fully possible but at least
roundtrip conversions between the two representations, with and without leap seconds,
should remain safe. Unfortunately the 1970-01-01 epoch is located in a four years time
range (from 1968 to 1972) during which the offset between UTC and TAI exhibited a 30 ns/s slope. This induced a 378691200 ns offset as of 1970-01-01 (to be added to the 4213170 µs
offset that was active on 1968-01-01) and the slope continued to be applied for two years
later. This complicates a safe roundtrip conversion.

In order to alleviate these problems, dates handling has been thoroughly revamped. The whole
number of seconds is still stored as signed primitive long like it was before, so the range of
dates that can be represented is still ±292 billion years for AbsoluteDate (but it is still
±5.88 millions years for DateComponents and DateTimeComponents as they use primitive
int for the day offset with respect to J2000.0). The fractional part within the second is what
was changed: it is now also stored as non-negative primitive long with fixed precision at a
resolution of one attosecond (10^{-18}s). The choice of attoseconds allows to represent
exactly all important offsets (between TT and TAI, or between UTC and TAI during the linear
eras), as well as all times converted from standard java Instant, Date or TimeUnit classes.
It also allows simple computation as adding or subtracting two values in attoseconds that are
less than one second does not overflow (a primitive long could hold any values between ±9.22s
in attoseconds so simple additions and subtractions followed by handling a carry to bring the
value back between 0 and 10^{18} is straightforward). The workhorse of the new implementation is the SplitTime class that contains a time split into seconds and attoseconds. This new class is therefore much more accurate than the previous one (attoseconds rather than femtoseconds) and more importantly more robust, much simpler as it does not have to deal with IEEE-754 and decimal-friendly. Provisions have been made to still handle NaN, and ±∞ (both in computation, parsing and writing).

Many methods that used primitive double to represent durations or offsets have been rewritten
to take SplitTime instances as arguments or to generate them as return values. This affects
the API of classes AbsoluteDate, TimeComponents, DateTimeComponents, GNSSDate, OffsetModel, UTCTAIOffset and their field counterparts for the classes that have one, as well as the TimeScale and TimeShiftable interfaces and all their implementations. In most cases, the methods taking a primitive double as an argument have been kept, and they delegate to a new method that creates a SplitTime instance on the fly from the double. Methods that returned a primitive double have sometimes been kept (for example durationFrom in AbsoluteDate is still there) but a sister method has been created to take advantage of the new implementation with increased accuracy (so there is now a splitDurationFrom method in AbsoluteDate). Some methods that returned a primitive double have been changed and now return SplitTime instances, this is in particular the case of all the methods in the TimeScale interface.

As these changes were made, it appeared the AggregatedPVCoordinatesProvider class threw
OrekitIllegalArgumentException and IllegalStateException instead of OrekitException when an out-of-range date was used, which make them more difficult to catch. This has been changed too.

how to adapt existing source code

Despite the change is a revamp of the most widely used class in Orekit (AbsoluteDate), many efforts have been put to preserve the public API as much as possible. Many methods using primitive double or providing primitive double are still there. One big exception to this is the TimeScale interface, which now only uses SplitTime instances. As most users just use the time scales as opaque objects when reading/writing dates, they should not be affected too much. In any case, they can still continue using primitive double by wrapping them in or out of SplitTime instances, replacing calls like timeScale.offsetFromTAI(date) by timeScale.offsetFromTAI(date).toDouble(). On the other hand, if they need to call a method that needs a SplitTime and they only have a primitive double, wrapping is done by replacing calls like object.someMethod(offset) by object.someMethod(new SplitTime(offset)).

Custom implementations of TimeShiftable could be updated to take advantage of the new shiftedBy(SplitTime) method, but it is not required as there is a default implementation that delegates to the original shiftedBy(double) method.

If some user code breaks due to API changes, though, it is recommended to avoid using the
wrapping between primitive double and SplitTime. What is recommended is to take the
opportunity to remove entirely the primitive double and generalize use of SplitTime
everywhere. This will increase both accuracy of computation and robustness.

Avoiding primitive double also applies to parsing and to literal constants in models or non-regression test input data. When parsing a text like “3.2184e+01” from a String field variable, instead of using new SplitTime(Double.parseDouble(field)) one should rather use SplitTime.parse(field). The rationale is that SplitTime.parse preserves decimals because it will parse the “3” and “2184” parts separately, apply the exponent and split that into an exact 32 seconds part and an exact 184 milliseconds part, whereas the double parsing will be slightly off as IEEE754 cannot represent this number exactly. The small parsing error will show up when printing the value back to text. When using literal constants in source code (say for example 32.184 as before, which is the offset between TT and TAI) then rather than using new SplitTime(32.184), users should use the linear combinations constructors as in new SplitTime(32, SplitTime.SECOND, 184, SplitTime.MILLISECOND). There are such linear combinations constructors from 1 to 5 terms and the multiplicative factors can be long integers.

The static method TimeComponents.fromSecond intended to finely tune construction of dates within a leap second occurrence has been replaced by a public constructor taking a single SplitTime argument instead of its first two arguments.

In the OffsetModel class, the units for slopes in UTC-TAI linear models used prior to 1972 were changed from seconds per day to nanoseconds per UTC second (despite neither is a SI unit), as looking at the values shows these slopes were in fact simple numbers (only different three slopes were used between 1961 and 1972: 15ns/s, 13ns/s and 30ns/s). The offset has also been changed from double to SplitTime to allow representing exactly the microseconds offsets used in linear models before 1972. As the UTCTAIOffset and OffsetModel classes are mainly intended to implement UTC-TAI loaders and Orekit already provides loaders for the major formats, this should not affect many users. For those users who did implement custom loaders that take old slopes and double offsets into account, they should scale the slopes by changing their parsing code from double slope = Double.parseDouble(field) to int slope = (int) (SplitTime.parse(field).getAttoSeconds() / SLOPE_FACTOR) were SLOPE_FACTOR is defined as long SLOPE_FACTOR = 86400L * 1000000000L; and then build the offset by using this integer slope in nanoseconds per UTC seconds. Using SplitTime.parse instead of Double.parseDouble avoids numerical noise as parsing is done in decimal.

If users caught OrekitIllegalArgumentException and IllegalStateException when using AggregatedPVCoordinatesProvider, they must now catch OrekitException to recover from out-of-range dates.

As dates resolution is now always exactly one attosecond, when using shitftedBy to set up a date just before of after another date (for example to set up a transition in a TimeSpanMap), the recommended shift value is either SplitTime.ATTOSECOND or SplitTime.ATTOSECOND.negate(). Using Double.MIN_VALUE won’t work (in fact it only worked in previous versions when the date was exactly at a TAI second, i.e. when the offset was exactly 0.0).

luc · August 28, 2024, 7:22am

No answers? Really?

This change was a huge amount of work for weeks, including one third of my holidays dedicated to it. 76 commits, 5227 lines added, 2656 lines removed, attempts to preserve upward compatibility as much as possible, hundreds of tests reviewed, debugged and adapted, benchmarks run, analyzed, and optimized, countless hours spent on grunt work, big effort on documenting the changes for easier upgrade. At the end, I think the most used class in Orekit is now simpler, more accurate and much more robust than ever, and its field counterpart is also much simpler. At least this is what I aimed for.

During the development, I had almost no feedback, so I reached out to the community on the forum asking for help, because I felt that a change in such a core class should be reviewed by my peers.

Zilch.

This is depressing

sdinot · August 28, 2024, 12:28pm

You know that I don’t have the scientific expertise to assess the relevance of this monumental work. But I can think of at least two reasons for the lack of response from the other contributors:

It’s summer and a lot of people are on holiday.
Your modification affects the heart of Orekit and, if I’ve understood correctly, it has a major impact on the project, but also on the documentation, tutorials, examples and, above all, the projects that use Orekit. Evaluating such a change, giving a reasoned opinion on it, is certainly not something you can do in a few minutes. It requires thought and therefore a bit of time.

Serrof · August 28, 2024, 3:58pm

Hi Luc,

I think everyone here recognises your tremendous work on the library and probably trust you blindly. Besides end of August is arguably not the best time to request feedback, especially from the French members. Personally I do not feel knowledgeable enough in precise time computations and floating point arithmetic to make any technical comments. If you say you simplified the code, that sounds even better.

Cheers,
Romain.

dstrobel · August 29, 2024, 1:18am

Hi Luc,
Just wanted to say as a community member I’m very excited for Orekit to get dual integer time representation. It seems like you’ve done a great job encapsulating the change in the public API.

It seems like SplitTime will have a large presence in the public API going forward since now it is recommended for using shiftedBy etc. I see at one point in your commit history you renamed it from SplitOffset to SplitTime - I had also thought it might be named TimeOffset or something similar, since it is effectively Orekit’s new time offset class. Would you be able to expand a bit on how you settled on this name? Thanks!

luc · August 29, 2024, 8:00am

Hi @dstrobel, thanks

I like your idea about renaming SplitTime into TimeOffset, I’ll do it. It is both concise (I have a known bad habit to choose too long names) and the name you propose is to the point, I really like it.

I cannot use a similar name change for the splitDurationFrom method in {Field}AbsoluteDate, though, because there is already an offsetFrom method there with different semantics. It is a bad name too, but for compatibility I think we should keep it. I think I will go to something like accurateDurationFrom instead (and also accurateOffsetFrom since I seem to have missed it).

You seem to have reviewed the changes thoroughly, thanks a lot for that.

dgondelach · August 29, 2024, 11:02am

Hi @luc ,

I just read your post. This sounds like a great improvement!
The double part of the AbsoluteDate has also given me headaches in the past

I will have a look at the merge request to see the new TimeOffset (I also like this name) and AbsoluteDate implementations. This is the MR, right?

Best,
David

luc · August 29, 2024, 11:08am

Yes, this is the merge request I am referring to.

Vincent · August 30, 2024, 7:16am

Hello @luc,

I’m clearly not qualified enough to evaluate the accuracy of the changes but i can tell that the code is much clearer and concise which makes it easy to read !

Thank you for this work as not anyone would have been able to do this .

Cheers,
Vincent

MaximeJ · August 30, 2024, 12:39pm

Hello @luc,

It is indeed a vast amount of work! Thank you for that!

I agree with all previous comments and I also prefer the TimeOffset name.
I’ve done a quick review of your code, mostly cosmetic.

I think we’re all happy to have better dates’ precision, no loss of performance, and a simplified code!

Cheers,
Maxime

PeaDubYa · September 3, 2024, 7:59am

It is indeed much appreciated that you tackle tricky problems at the heart of the library, and that you give such a detailed consideration and explanation of how to adapt to the change for those benefitting from the library.
Library runtime performance is important, but you have clearly assessed and addressed this as far as practicable.
Unfortunately I am not in a position to comment on or validate the change.
Thanks for the great work!