Hi all,
Work on implementing CCSDS ODM V3 has now reached the phase of adding writers for all the CCSDS messages we want to support in Orekit.
While doing this work, I stumbled upon the problem of choosing an output format for floating-point numbers. In previous versions, either a default format was used (sometimes with extended precision for dates) or a format specified by users. In my experience, letting users choose a format often ends up with far too few digits. For reasons that may be related to file size or to consistency with model accuracy, we often see orbits written at about meter precision, and quaternions with four or six digits. This generates problems later on in three cases:
- when the ephemeris is used to compute derivatives (finite differences or polynomial fitting)
- when the ephemeris is used to reproduce some specific context
- when the ephemeris has to be reused long after it was generated, for higher-accuracy needs that were not foreseen at generation time
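To make the first case concrete, here is a small Java sketch (not Orekit code). It assumes a circular orbit of radius 7000 km and looks at a single position component x(t) = R cos(ωt). The velocity is recovered by central finite differences twice: once from positions kept at full precision, and once from positions rounded to 1 m, as a file written at meter precision would give. The orbit model, the 1 s step and the class name are illustrative assumptions.

```java
/**
 * Sketch: effect of writing positions at 1 m resolution on velocities recovered by
 * central finite differences. The circular-orbit model and all numbers are
 * illustrative assumptions, not Orekit code.
 */
public class RoundingNoiseDemo {

    static final double MU = 3.986004418e14;              // Earth GM, m^3/s^2
    static final double R  = 7.0e6;                       // assumed orbit radius, m
    static final double W  = Math.sqrt(MU / (R * R * R)); // mean motion, rad/s

    /** One position component of the circular orbit, m. */
    static double x(double t)  { return R * Math.cos(W * t); }

    /** Its exact time derivative, m/s. */
    static double vx(double t) { return -R * W * Math.sin(W * t); }

    /** Maximum |central difference - true velocity| over 1000 epochs spaced by 10 s. */
    static double maxVelocityError(boolean roundToMeter, double h) {
        double maxErr = 0.0;
        for (int k = 0; k < 1000; k++) {
            double t  = 10.0 * k;
            double xp = x(t + h);
            double xm = x(t - h);
            if (roundToMeter) {
                // simulate reading back an ephemeris file written with 1 m resolution
                xp = Math.rint(xp);
                xm = Math.rint(xm);
            }
            double vFd = (xp - xm) / (2.0 * h);
            maxErr = Math.max(maxErr, Math.abs(vFd - vx(t)));
        }
        return maxErr;
    }

    public static void main(String[] args) {
        double h = 1.0; // finite difference step, s
        System.out.printf("full precision : %.3e m/s%n", maxVelocityError(false, h));
        System.out.printf("1 m resolution : %.3e m/s%n", maxVelocityError(true, h));
    }
}
```

With a 1 s step, rounding each position to 1 m can add up to 0.5 m/s to the recovered velocity. The truncation error of the central difference itself is of the order of 1e-3 m/s for this orbit. Shrinking the step to reduce truncation error makes the rounding term grow as 1/h.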
My own point of view is that the output resolution should intentionally be several orders of magnitude finer than model accuracy, so that writing to files does not add noise but remains negligible with respect to the computation noise itself. So I personally prefer to write positions at sub-millimeter accuracy (indeed POD is already at centimeter level) and attitude with about 9 digits.
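A quick arithmetic check shows that the sub-millimeter target is cheap compared with what a double can hold. The Java sketch below takes geostationary radius (about 42,164 km) as an assumed worst case for Earth orbits. At that radius, adjacent doubles are about 7.5e-9 m apart, and writing the coordinate with 0.1 mm resolution needs only 12 significant digits, well within the 15 to 17 digits a double carries. The helper name `digitsNeeded` is hypothetical.

```java
/** Sketch: how many significant digits a given absolute output resolution requires. */
public class ResolutionBudget {

    /** Significant decimal digits needed to write |value| with the given absolute resolution. */
    static int digitsNeeded(double value, double resolution) {
        return (int) Math.ceil(Math.log10(Math.abs(value) / resolution));
    }

    public static void main(String[] args) {
        double geoRadius = 4.2164e7; // m, roughly the geostationary radius (assumed worst case)
        // spacing between geoRadius and the next representable double
        System.out.println("ulp at GEO radius : " + Math.ulp(geoRadius) + " m");
        // digits needed for a 0.1 mm (sub-millimeter) output resolution
        System.out.println("digits for 0.1 mm : " + digitsNeeded(geoRadius, 1.0e-4));
    }
}
```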
The file-size rationale is becoming more and more moot with current machines, disks and networks. We also seldom have to read ephemerides in an editor and need neatly aligned columns.
While playing with these ideas, I found a very good paper by Ulf Adams about the Ryū algorithm (see Ryū: fast float-to-string conversion). This algorithm implements a double-to-string conversion that preserves three very important properties out of the four identified by Guy L. Steele and Jon L. White 30 years ago:
- Information preservation: a correct parser must return the original floating point value from the output
- Minimum-length output: the output string must be as short as possible
- Correct rounding: the output string must be as close to the input as possible.
Ulf Adams dropped the fourth property, left-to-right generation.
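The first two properties can be illustrated with a deliberately slow brute-force search. The Java sketch below is not how Ryū works internally, and the class and method names are hypothetical. For a finite double, it tries 1, 2, … 17 significant digits, rounding the exact binary value to the nearest decimal each time, and keeps the first decimal string that parses back to the same double. Ryū computes such a shortest, information-preserving digit string directly, without trial parsing. This naive search can overshoot by one digit in rare cases at powers of two, where the interval of decimals that parse back to x is asymmetric.

```java
import java.math.BigDecimal;
import java.math.MathContext;

/** Brute-force illustration of "information preservation" plus "minimum length". */
public class ShortestRoundTrip {

    /**
     * Smallest number of significant digits p such that the decimal nearest to x
     * with p digits parses back to exactly x (finite x only).
     */
    static int shortestDigits(double x) {
        BigDecimal exact = new BigDecimal(x); // exact binary value of x, no rounding
        for (int p = 1; p <= 17; p++) {
            // MathContext(p) rounds to p significant digits with HALF_UP
            String candidate = exact.round(new MathContext(p)).toString();
            if (Double.parseDouble(candidate) == x) {
                return p;
            }
        }
        throw new IllegalStateException("17 significant digits always suffice for a finite double");
    }

    public static void main(String[] args) {
        double[] samples = { 0.1, 1.0 / 3.0, 0.1 + 0.2, 7.0e6 / 3.0 };
        for (double x : samples) {
            System.out.println(Double.toString(x) + " -> " + shortestDigits(x) + " significant digits");
        }
    }
}
```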
The Ryū algorithm is extremely fast (about 10 times faster than Double.toString()) and works very well. In fact it sometimes generates output marginally shorter than Double.toString(), and according to the author this mainly proves that Double.toString() has bugs… As the reference implementation of Ryū is available under the Apache licence, I included it yesterday in Hipparchus, so one can just use RyuDouble.doubleToString(). I also added a minor improvement to avoid using scientific notation in some cases (at the cost of a larger output for very small or very large values) and made a pull request to the original implementation (see Allow customizable switch boundaries for scientific notation).
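The idea of switch boundaries for scientific notation can be sketched in Java without Hipparchus. In this sketch, `Double.toString()` stands in for `RyuDouble.doubleToString()` as the source of round-trip digits. It is specified to round-trip, and its digits have been the shortest ones since the fix delivered in JDK 19. The `lowExp`/`highExp` parameters and the class name are hypothetical; they are not the signature of the Hipparchus method or of the pull request. Inside the boundaries, the same decimal value is rewritten without an exponent via `BigDecimal`, so parsing still returns the original double.

```java
import java.math.BigDecimal;

/** Sketch of round-trip-safe output with configurable scientific-notation boundaries. */
public class PlainDoubleFormat {

    /**
     * Hypothetical formatter: shortest round-trip digits (from Double.toString, standing in
     * for RyuDouble.doubleToString), written without exponent when
     * 10^lowExp <= |x| < 10^highExp, otherwise left in scientific notation.
     */
    static String format(double x, int lowExp, int highExp) {
        String shortest = Double.toString(x);
        if (Double.isNaN(x) || Double.isInfinite(x) || x == 0.0) {
            return shortest; // nothing to rewrite
        }
        double a = Math.abs(x);
        if (a >= Math.pow(10, lowExp) && a < Math.pow(10, highExp)) {
            // same decimal value in plain notation, so parsing it still gives back x
            return new BigDecimal(shortest).stripTrailingZeros().toPlainString();
        }
        return shortest;
    }

    public static void main(String[] args) {
        System.out.println(format(4.2164e7, -6, 10)); // prints 42164000
        System.out.println(format(1.5e-5, -6, 10));   // prints 0.000015
        System.out.println(format(1.0e-9, -6, 10));   // prints 1.0E-9 (outside boundaries)
    }
}
```

For comparison, `Double.toString()` itself switches to scientific notation below 10^-3 and from 10^7 upward, so Earth-orbit positions in meters often come out with an exponent. Note that `stripTrailingZeros()` also turns integral values such as 100.0 into "100", which still parses back to the same double.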
I will use Ryū in the (currently ugly) AccurateFormatter I added a few days ago.
I would like to generalize its use and in fact remove the user-customizable formats that are available in some file-generating classes. As I wrote above, users tend to use too few digits, and since Ryū guarantees round-trip safety, we could be sure that parsing a file generated by Orekit always recovers the same accuracy that was used to compute it, regardless of configuration. Supporting both round-trip safety and customizable formats would be complex and would involve providing two write statements everywhere, one with a format and one without. This is because round-trip safety is not a consequence of using a specific, sufficiently accurate format; it results from internal decisions in the conversion algorithm.
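The claim that round-trip safety comes from the conversion algorithm rather than from a particular format can be checked with a small Java experiment. The class name, sampling range and seed are arbitrary assumptions. Random position-like values are formatted three ways:

- with a user-style format such as `%.3f`
- with a fixed 17-significant-digit format `%.16e`
- with the shortest representation, using `Double.toString()` as a stand-in for Ryū

The experiment then counts how many of the formatted values parse back to the identical double.

```java
import java.util.Locale;
import java.util.Random;

/** Sketch: which output formats preserve the exact double after a write/read cycle. */
public class FormatRoundTrip {

    /**
     * Fraction of random position-like values (meters, |x| < 7e6) that parse back to the
     * identical double after formatting. A null format means "shortest representation",
     * here Double.toString as a stand-in for Ryū.
     */
    static double roundTripRate(String format, int samples, long seed) {
        Random random = new Random(seed);
        int preserved = 0;
        for (int i = 0; i < samples; i++) {
            double x = 7.0e6 * (2.0 * random.nextDouble() - 1.0);
            // Locale.ROOT guarantees '.' as decimal separator whatever the platform locale
            String s = (format == null) ? Double.toString(x) : String.format(Locale.ROOT, format, x);
            if (Double.parseDouble(s) == x) {
                preserved++;
            }
        }
        return (double) preserved / samples;
    }

    public static void main(String[] args) {
        for (String f : new String[] { "%.3f", "%.16e", null }) {
            String label = (f == null) ? "shortest" : f;
            System.out.printf("%-8s round-trip rate: %.3f%n", label, roundTripRate(f, 10000, 42L));
        }
    }
}
```

The millimeter format almost never round-trips, since doubles below 2^23 m are spaced by less than a nanometer. The 17-digit format always round-trips, but it pays with longer strings. The shortest representation round-trips with the fewest digits.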
What do you think about dropping these user formats and relying on Ryū for file generation?