Oreczml tests on my fork won't run

baubin · March 6, 2025, 7:14pm

I am working on some additions to OreCZML, but I am having an issue with getting the unit tests to check out in the pipeline. The unit tests require the use of the orekit-data file, which has a link to it in oreczml-core\src\test\resources, but for some reason the pipeline cannot find/use this link when running the unit tests.

To be clear, I have this issue when I update my own master branch directly from the oreczml repo master branch, so there are no changes between the code that is passing unit tests in the master repo and the code that is failing the unit tests in my own repo.

My repo: Brianna Aubin / OreCZML · GitLab

Vincent · March 7, 2025, 9:32am

Hello @baubin,

It seems related to this configuration : Configuring runners | GitLab Docs

Could you open an issue for this please ?

Cheers,
Vincent

MaximeJ · March 7, 2025, 1:37pm

Hello @baubin,

With @sdinot we’ve relaunched your pipeline with what @Vincent suggested, meaning, the variable:
GIT_SUBMODULE_STRATEGY = recursive

This time git successfully loaded orekit-data as a submodule so this seems to fix the problem. We’re still unsure why there’s a different behavior on your fork than on the “official” repo.
That being said, in the new pipeline, there are errors in the test suite that we don’t understand. Do the Junit tests pass when you run them locally in your IDE?

Cheers,
Maxime

baubin · March 7, 2025, 7:59pm

@Vincent @MaximeJ Thanks very much for replying so quickly. It took me a bit to figure out what was going on, but I think I traced the issue and the short answer is “numeric error”. For example, I ran the CollisionTest on my local machine and it failed because of an assertEquals failure on line 149 when it compares the cesium output file created in the test to the template in oreczml-core/target/surefire-reports/CollisionTest.txt. The final line of the first value set in the file, a set of unit quaternions, is:

Template file:
18000,-0.10744865121119217,0.13642583359778332,-0.2959017257422585,0.9393002437893855

My test output
18000,-0.10744865108967541,0.13642583382228363,-0.29590172652316,0.9393002435246768

I spot-checked the numeric values coming out of my local machine vs. the ones in the gitlab runner, and the ones I checked matched each other. The only thing I can think of that would cause my local machine values to match the gitlab runner values to numeric precision, but not the results generated by @Zudo for the unit tests, is that some minor change in the orekit-data folder data has occurred since Zudo generated the test files which has caused a tiny shift in the test outputs. We could test that by having @Zudo ran his unit tests again after running the update.sh executable. Otherwise I’m not sure what else could cause it.

baubin · March 7, 2025, 9:59pm

Never mind the orekit-data folder theory - I tried downloading what ought to have been the exact version of those files that would have been used by Zudo’s test two months ago, and that didn’t work. My results matched the ones posted above. The weird thing is, I could have sworn that when I was running these unit tests a while back (3-4 weeks) they did work, except now I can’t prove it and I have no idea what the difference is.

Zudo · March 10, 2025, 8:55am

Hello @baubin , sorry for the late answer, so you did use the Orekit Data that I used as a reference for OreCzml , and you still have divergences in the last digits of the computations in some test ?

I know that the collision class need a hard rework, but still, is this still the case ?

Have a good day !

baubin · March 10, 2025, 10:57am

“so you did use the Orekit Data that I used as a reference for OreCzml , and you still have divergences in the last digits of the computations in some test”

To the best of my knowledge, yes. Also, the tests I run have the same output to the last digit on my local machine for both versions of the orekit-data folder and also compared to the values that come out of the gitlab test runner (which is very obviously not using the version of orekit-data that is on my machine). Also, I’m not sure it’s an indication of anything actually wrong with the CollisionTest file, because the values are identical to the 8th decimal place. Also, some of the GlobalTests unit tests have the same issue.

That being said, even if the issue turns out to be something not related to the orekit-data folder, changes in those data files could change the results of the unit tests in the future, so it would probably be smart to find some way to verify the unit tests that allows for numeric precision error in the calculations. Also, I’d be curious to see if your own unit tests still passed now that the orekit-data repo has had updates since the last time you pushed to master. In theory what I have on my branch is exactly equal to what you have on yours, so it would be a valuable data point.

Zudo · March 11, 2025, 10:02am

I knew the UT would be wrong because of the orekit-data changing. That is why the git repository of OreCzml is supposed to be based on a fork of the orekit-data that is not updated. (Because we don’t need such precision in a 3D view).

If still the problem persist, maybe we can add regex to only take the first digits as a reference while the last digits can be anything.

I will look into this when I can

Cheers

baubin · March 11, 2025, 4:58pm

Agreed that regardless of how we resolve this particular issue, we should have some sort of regex test on the UT data instead of a comparison of the full value. Have you tried rerunning the job on your own master copy to see if all the unit tests still pass there?

Zudo · March 12, 2025, 3:20pm

Not yet, I was moved to another project inside my company, I will soon look to see if I can have days to work again on OreCzml because for now I will need to work on it on the weekends

MaximeJ · March 12, 2025, 4:41pm

Hi @baubin, @Zudo,

I’ve tried running the pipeline on master and it does not pass…
I’ve fixed the path of orekit-data submodule in .gitattributes (on master) so that it is now linking to your “frozen” fork @zudo.
I’ve also fixed the CI so that it uses a recursive strategy for fetching git submodules (I don’t know what changed but it wasn’t working anymore…).

Still, it does not work and we’re in the process of understanding why. Sorry @baubin for the trouble…

MaximeJ · March 12, 2025, 4:50pm

Ok, I think I understood the issue, although I don’t know how to fix it yet.

When the CI retrieves the orekit-data on your fork @Zudo, it should take the latest commit, which is:
“22bc73f8: Update to fit the repository of Zudo”
That’s what happens on the CI on OreCzml develop branch and what works on master (at least locally in my IDE).

Instead, it is taking the commit before (see the logs):
“cff67384: Updated to December 2024.”

I think that’s why the CI fails, on “master” and on your fork @baubin.
Now we need to understand why Git is doing that…

baubin · March 12, 2025, 4:51pm

@MaximeJ Is master having the same trouble I was with numeric accuracy? If so, then the regex reader I am finishing up right now should fix it. I have been testing it on the CollisionTest in the develop branch and that test is now passing for me where it was previously failing.

Updated to add: Yep, just checked the current test logs for master. The test output for CollisionTest is now equal to my test output to machine precision (compare these values to the ones posted above).

18000,-0.10744865108967541,0.13642583382228363,-0.29590172652316,0.9393002435246768

baubin · March 13, 2025, 5:01am

Update: I wrote a branch that separates numbers out of the Template file outputs and compares them to a user-given number of significant figures instead of exact machine precision, and this branch is passing the unit tests. But now I’m having issues with the gitlab-runner refusing to upload build artifacts because it says the “entity is too large”

I’m afraid when it comes to building CI/CD pipelines I know nothing - it’s an aspect of the job I’ve never dealt with. I’ll be digging into what’s going on of course, but I figured if anyone here knew the answer off the top of their head it was worth asking.

MaximeJ · March 13, 2025, 3:01pm

Hello @baubin,

Sorry for the inconvenience again

And thank you for your contribution to the test suite!
My colleagues who are expert in CI/CD are not available this end of week but I’ll make sure to ask them next week and we’ll investigate all these issues.

Cheers,
Maxime

baubin · March 13, 2025, 11:16pm

@MaximeJ So I got some time today to look into it and it turns out that the reason I couldn’t find most of the gitlab runner settings for the project was because they were on another, remote runner file being referenced by the runner file inside the project proper. I solved it temporarily (mostly as a test) by making a local copy of this runner and then setting the artifacts to only build on failure:

In the local .gitlab-ci.yml:

include:
  - local: "gitlab-ci-maven-test.yml" # <- Temp fix to solve artifact download issue
  # remote: "https://gitlab.com/to-be-continuous/maven/-/raw/3.9.0/templates/gitlab-ci-maven.yml"

In my new, local gitlab-ci-maven.yml:

  artifacts:
    name: "$CI_JOB_NAME artifacts from $CI_PROJECT_NAME on $CI_COMMIT_REF_SLUG"
    expire_in: 1 day
    when: on_failure #(used to be "always")
    reports:
      junit:
        - "${MAVEN_PROJECT_DIR}/**/target/*-reports/TEST-*.xml"
    paths:
      # version may have been altered
      - pom.xml
      - "${MAVEN_PROJECT_DIR}/**/target"

I tried creating a variable in .gitlab-ci.yml that would control the always \ on_failure \ on_success value, but didn’t manage it. I’m pretty sure that’s just because I’m ignorant of the right way to do it though.

MaximeJ · March 26, 2025, 2:53pm

Hi @baubin,

Sorry for the delay.
I think I fixed your “entity too large” issue. It was because OreCzml artifacts are 101Mo large and the default limit on the forks is set to 100Mo…
I increased your cache allocation to 200Mo so now it should work fine.

I’ve run a pipeline on develop directly on your fork and it passes the mvn-build job!
But then you have some quality issues ;), (see this job).

Also I managed to have master branch build without failure (see this run on my fork for example).
It was due to the fact that the orekit-data submodule was not up to date with the latest commit on Julien’s fork.
I’ll push the fix asap

Cheers,
And sorry again for the difficulties!
Maxime