Automatically generate orekit-data archive

In order to bring Orekit into a DevOps world, I see the need to automatically generate the orekit-data archive.

What could be done automatically? What could the pipeline for such automation look like?
(I notice an update.sh script, but there is no guideline for updating the data.)

Related topic: version identifier.
In a related project, we identified the need to introspect the version of the data used by a computation. But currently we do not find any good information to extract from orekit-data.zip for that purpose.

  • A simple solution: add a timestamp.txt file containing the date of the build/update.
  • A more evolved solution: add a version.properties file with one property per data file (see the sketch after this list).
  • Another solution could be to repackage the data as a JAR (orekit-data.jar) and use the standard JAR metadata (MANIFEST.MF & co.).
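To make the first two options concrete, here is a rough sketch of what a final step of update.sh could look like; the file names, property keys and date format are only illustrative, nothing like this exists in the script today:

```sh
# Hypothetical final step for update.sh: record when the archive was refreshed
# and write one property per data file with its last modification date.
date -u +"%Y-%m-%dT%H:%M:%SZ" > timestamp.txt

{
  echo "# generated by update.sh, do not edit"
  for f in $(find . -type f ! -name timestamp.txt ! -name version.properties); do
    printf '%s=%s\n' "${f#./}" "$(date -u -r "$f" +%Y-%m-%d)"
  done
} > version.properties
```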

Any thoughts about this topic?

The orekit-data archive provided as a git repository that can be cloned or downloaded is considered to be a convenience only, and people should NOT rely on it to remain available or be updated. It is often used as an initial setup for quick installation, but people MUST manage the data themselves and update them using the data sources they prefer, with a process that is suitable for their use. Some people may want different JPL or IMCCE ephemerides, some people may want to use Bulletin-B or Bulletin-A for EOP data, some people may want a different Earth gravity field or even gravity fields for other planets…

Anyway, despite this intended use, here are my thoughts about the other points of your post.

Someone with write access just runs the script in their workspace and then pushes the updated files (or new files) to the repository.
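So today the process is roughly the following (assuming a local clone of the orekit-data repository with push rights, and the usual branch name):

```sh
# Rough outline of the current manual update
cd orekit-data            # local clone of the orekit-data repository
./update.sh               # refresh the bundled files from their upstream sources
git add -A                # stage updated and newly downloaded files
git commit -m "update data"
git push origin master
```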

Note however that the script currently fails for the tai-utc.dat file because the USNO servers are down and will remain down for several months (see IERS message 384). This is not a problem yet, as we know the next leap second cannot happen before December 2020 (a Bulletin D was issued recently asserting that no leap second will occur in June 2020).

I like this.

This solution has a major drawback: it would induce dramatic performance issues. The JAR format is the same as the zip format, which is not recommended for holding data, even though Orekit does support it. The reason is that zip/jar puts a central directory at the end of the archive, so if you want to read a small file (say tai-utc.dat), you have to read the full archive, which can be huge due to JPL ephemerides, EOP data and gravity fields. This is explained in the documentation of the data-context branch (not merged yet into develop); see for example https://gitlab.orekit.org/orekit/orekit/blob/data-context/src/site/markdown/data/default-configuration.md, in the default setup section.

I realize that the orekit-data.zip is a snapshot of the Git branch and not a built artifact.
So a good (and easy) piece of version information is simply the commit info. As a consumer, I can obtain it with curl https://gitlab.orekit.org/api/v4/projects/18/repository/branches/master (and read the returned JSON).
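For instance, assuming the usual layout of the GitLab branches API response and jq being available, the commit id and date can be pulled out like this:

```sh
# Ask GitLab for the master branch of orekit-data (project id 18) and keep
# only the id and date of its latest commit.
curl -s "https://gitlab.orekit.org/api/v4/projects/18/repository/branches/master" \
  | jq -r '.commit.id, .commit.committed_date'
```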

So, the only thing we can automate is the update itself. For example, a bot could regularly run update.sh and commit, in a develop or auto-update branch.
Is it safe to do such a thing if a data source has not been updated? What about possible inconsistencies?

It is safe to update anytime. There are some caveats with the MSAFE files, but the script already handles that and removes the spurious error files corresponding to months not yet published.
As the script only downloads some specific files (for example the finals.all and finals2000A.all data for EOP, but neither Bulletin A nor Bulletin B), no inconsistencies are expected.
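Given that, an auto-update bot could be little more than a scheduled job along these lines (branch name, clone location and commit message are of course hypothetical):

```sh
#!/bin/sh
# Hypothetical nightly job refreshing the data and pushing to an auto-update branch.
set -e
cd /srv/bots/orekit-data                   # clone with push rights (assumed location)
git checkout auto-update
git pull --ff-only origin auto-update
./update.sh
git add -A
if ! git diff --cached --quiet; then       # commit only if something actually changed
  git commit -m "automatic data update $(date -u +%F)"
  git push origin auto-update
fi
```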