Questions about output formats

luc · March 26, 2021, 2:40pm

Hi all,

The work for implementing CCSDS ODM V3 is now in the phase of adding the writers for all CCSDS messages we want to support in Orekit.

While doing this work, I stumbled upon the problem of choosing an output format for floating-point numbers. In previous versions, either a default format was used (sometimes with extended precision for dates) or a format specified by users. From my experience, letting users choose a format often ends up with a number of digits that is far too small. For reasons that may be related to file size or to consistency with model accuracy, we often see orbits written at about meter precision, and quaternions with four or six digits. This generates problems later on in three cases:

when the ephemeris is used to compute derivatives (finite differences or polynomial fitting)
when the ephemeris is used to reproduce some specific context
when the ephemeris has to be reused long after it was generated, for more accurate needs that were not foreseen at generation
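To make the first case concrete, here is a minimal sketch of how rounding at write time feeds into a finite-difference derivative. Everything in it is an assumption made for illustration (the 1D sinusoidal motion, the 0.1 s half step, the class and method names); it is not Orekit code. Positions are written with a fixed number of decimals, parsed back, and differenced.

```java
import java.util.Locale;

class RoundedEphemerisDemo {

    /** Simulate writing a value with a fixed number of decimals and parsing it back. */
    static double writeAndReadBack(double value, int decimals) {
        String text = String.format(Locale.US, "%." + decimals + "f", value);
        return Double.parseDouble(text);
    }

    /**
     * Largest error of a central finite-difference velocity over 1000 epochs,
     * when the positions went through a file with the given number of decimals.
     */
    static double maxVelocityError(int decimals) {
        final double a = 7.0e6;   // amplitude in meters (hypothetical 1D motion)
        final double w = 1.0e-3;  // angular rate in rad/s
        final double h = 0.1;     // finite-difference half step in seconds
        double max = 0.0;
        for (int i = 0; i < 1000; i++) {
            double t = 10.0 * i;
            double xMinus = writeAndReadBack(a * Math.sin(w * (t - h)), decimals);
            double xPlus  = writeAndReadBack(a * Math.sin(w * (t + h)), decimals);
            double estimated = (xPlus - xMinus) / (2 * h);
            double exact     = a * w * Math.cos(w * t);
            max = Math.max(max, Math.abs(estimated - exact));
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println("meter precision:      " + maxVelocityError(0) + " m/s");
        System.out.println("micrometer precision: " + maxVelocityError(6) + " m/s");
    }
}
```

Rounding to the meter can shift each position by up to 0.5 m, so with a 0.2 s baseline the difference quotient can be off by up to 5 m/s. With six decimals, the rounding contribution drops to the level of the truncation error of the finite-difference scheme itself.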

My own point of view is that output should be at a level of accuracy that is intentionally several orders of magnitude finer than model accuracy, so that writing to files does not add noise of its own but stays negligible with respect to the computation noise itself. So I personally prefer to write positions at sub-millimeter accuracy (indeed POD is already at centimeter level) and attitude with about 9 digits.

The file size rationale is becoming more and more moot with current machines, disks and networks. We also seldom have to read ephemerides in an editor and need neatly aligned columns.

While playing with these ideas, I found a very good paper about the Ryū algorithm by Ulf Adams (see Ryū: fast float-to-string conversion). This algorithm implements a conversion from double to string that maintains three very important properties out of the four identified by Guy L. Steele and Jon L. White 30 years ago:

Information preservation: a correct parser must return the original floating point value from the output
Minimum-length output: the output string must be as short as possible
Correct rounding: the output string must be as close to the input as possible.

Ulf Adams dropped the fourth property which was left-to-right generation.
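The first two properties can be illustrated with the standard library alone. The sketch below is not Ryū: it uses Double.toString() as a stand-in for a shortest-output converter and BigDecimal for a fixed 17-significant-digit output (17 digits are always enough to recover a double). The class and method names are made up for the example.

```java
import java.math.BigDecimal;
import java.math.MathContext;

class RoundTripProperties {

    /** Exact decimal expansion of d, rounded to 17 significant digits. */
    static String seventeenDigits(double d) {
        return new BigDecimal(d).round(new MathContext(17)).toString();
    }

    /** Information preservation: parsing the text gives back exactly the same double. */
    static boolean preserves(String text, double d) {
        return Double.compare(Double.parseDouble(text), d) == 0;
    }

    public static void main(String[] args) {
        double d = 0.1;
        String longForm  = seventeenDigits(d);   // "0.10000000000000001"
        String shortForm = Double.toString(d);   // "0.1"
        // both strings preserve the information, only one has minimum length
        System.out.println(longForm  + " -> " + preserves(longForm, d));
        System.out.println(shortForm + " -> " + preserves(shortForm, d));
        // a six-decimal text is short but loses information
        System.out.println("0.333333 -> " + preserves("0.333333", 1.0 / 3.0));
    }
}
```

Both "0.10000000000000001" and "0.1" parse back to the same double, so both preserve information, but only the second is minimal. A fixed six-decimal output of 1.0 / 3.0 is short and does not round-trip.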

The Ryū algorithm is extremely fast (about 10 times faster than Double.toString()) and works very well. In fact it sometimes generates output marginally shorter than Double.toString() and according to the author this mainly proves that Double.toString() has bugs… As the reference implementation of Ryū was available under the Apache licence, I included it yesterday in Hipparchus, so one can just use RyuDouble.doubleToString(). I also added a minor improvement to avoid using scientific notation in some cases (which as a consequence enlarges the output for small or large values) and made a pull request to the original implementation (see Allow customizable switch boundaries for scientific notation).
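To give an idea of what switch boundaries for scientific notation mean, here is a rough sketch built only on the JDK. It is not the Hipparchus or Ryū code: the helper name, its parameters and the use of Double.toString() plus BigDecimal.toPlainString() are all assumptions made for illustration. The shortest digits are kept, and the exponent is expanded only when the magnitude lies between two user-chosen powers of ten.

```java
import java.math.BigDecimal;

class PlainNotationSketch {

    /**
     * Hypothetical helper: keep the digits produced by Double.toString(),
     * but render them without exponent when 10^low <= |d| < 10^high.
     */
    static String format(double d, int low, int high) {
        String shortest = Double.toString(d);
        double abs = Math.abs(d);
        if (Double.isNaN(d) || Double.isInfinite(d)
            || abs < Math.pow(10, low) || abs >= Math.pow(10, high)) {
            // outside of the boundaries (or special value): keep the default rendering
            return shortest;
        }
        // same digits, exponent expanded into leading or trailing zeros
        return new BigDecimal(shortest).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(format(1.0e7,  -5, 10));  // plain notation, longer than 1.0E7
        System.out.println(format(1.0e-4, -5, 10));  // plain notation, longer than 1.0E-4
        System.out.println(format(1.0e20, -5, 10));  // stays in scientific notation
    }
}
```

The plain rendering is longer than the scientific one for such values, which is the size trade-off mentioned above, but it still parses back to the same double.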

I will use Ryū in the (currently ugly) AccurateFormatter I added a few days ago.

I would like to generalize its use and in fact remove the user-customizable formats that are available in some file-generating classes. As I wrote above, users have a tendency to use too few digits, and as Ryū maintains roundtrip safety, we could be sure that parsing a file generated by Orekit would always recover the same accuracy that was used to compute it, regardless of configuration. Supporting both roundtrip safety and customized formats would be complex and would involve providing two write statements everywhere, one with a format and one without (roundtrip safety is not a consequence of a specific format being accurate, it results from internal decisions in the conversion algorithm).

What do you think about dropping these user formats and relying on Ryū for file generation?

luc · March 26, 2021, 2:54pm

Another point is that Ryū is really much faster than raw Double.toString() so I think (but did not really check) that it should also be much faster than using a format.

evan.ward · March 26, 2021, 7:46pm

Hi Luc,

+1 for using full precision in the output.

Interesting find with the Ryu algorithm. Seems like it trades using more memory for faster performance. That is the same trade made by FastMath so I agree it is a good addition for Hipparchus.

Have you tested the Ryu algorithm? That is, for all doubles d, including the weird ones:

d == Double.parseDouble(Ryu.toString(d));
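A slightly more careful version of that check could look like the sketch below. Plain == would report NaN as a failure and would not distinguish 0.0 from -0.0, so the comparison goes through Double.compare(). The converter is passed as a parameter; the class and method names are made up, and Double::toString is only a stand-in so the example runs without Hipparchus on the classpath.

```java
import java.util.Random;
import java.util.function.DoubleFunction;

class RoundTripCheck {

    /** True when parsing the converter output gives back d (NaN matches NaN, -0.0 differs from 0.0). */
    static boolean roundTrips(DoubleFunction<String> converter, double d) {
        return Double.compare(d, Double.parseDouble(converter.apply(d))) == 0;
    }

    /** Count round-trip failures over special values plus n random bit patterns. */
    static int countFailures(DoubleFunction<String> converter, int n, long seed) {
        double[] special = {
            0.0, -0.0, Double.MIN_VALUE, Double.MIN_NORMAL, Double.MAX_VALUE,
            Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, Double.NaN
        };
        int failures = 0;
        for (double d : special) {
            if (!roundTrips(converter, d)) {
                failures++;
            }
        }
        // random 64-bit patterns cover subnormal numbers, huge exponents and NaN payloads
        Random random = new Random(seed);
        for (int i = 0; i < n; i++) {
            double d = Double.longBitsToDouble(random.nextLong());
            if (!roundTrips(converter, d)) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(countFailures(Double::toString, 100000, 42L) + " failures");
    }
}
```

With Hipparchus available, the method reference mentioned earlier in this thread (RyuDouble::doubleToString) could presumably be passed instead of Double::toString.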

Regarding date formatting I think the need for a separate class points to DateTimeComponents.toString() and its overloads being a bit lacking. I tried to address some of those issues with the toStringRfc3339() method which outputs up to 14 digits to avoid rounding using a DecimalFormat. If Ryu is a faster way to do that then we should use it there as well. And perhaps update the DateTimeComponents.toString() method with the same behavior for 11.0.

luc · March 27, 2021, 7:36pm

The original author did, and I did as well. You can look at the tests in Hipparchus. In the randomized test, when we compare to Double.toString, we check that Ryū finds either the same length or a shorter length and that both implementations give the same number back when parsing. I did not check what the difference was but in the video presentation, I think Ulf Adams said there were bugs in Double.toString, probably both for the length and for the rounding.

In branch issue-474, the new org.orekit.utils.AccurateFormatter uses Ryū for both double and dates. I agree we should move this to one of the date classes.