CssiSpaceWeatherData throwing weird exception during OD with parallel propagation

bruno · July 4, 2023, 9:44am

Hi everyone,

I am doing orbit determination (OD) with a UKF, using a numerical propagator that includes the NRLMSISE00 model with CssiSpaceWeatherData (Orekit v11.3.2).
Unfortunately, while running the estimation on some measurements, the UKF throws an OrekitException raised at CssiSpaceWeatherData:171 with OrekitMessages.OUT_OF_RANGE_EPHEMERIDES_DATE.
Example: out of range date for ephemerides: 2023-04-10T04:55:02.02346291209888Z, [1957-10-01T00:00:00.000Z, 2044-06-01T00:00:00.000Z]

The weird part is that:

While debugging, I see that the parameter values that lead to the exception should not trigger it! By the way, lines 166 to 173 of CssiSpaceWeatherData are marked as dead code.
Most importantly: if I modify UnscentedKalmanModel (lines 474-475) to run the propagations sequentially instead of in parallel, the problem vanishes.

So my questions are:

Has anyone already encountered this problem?
Might it be a thread-safety problem in the CssiSpaceWeatherData model, or in the atmosphere model?

Thanks for your time

bcazabonne · July 4, 2023, 9:50am

Hi @bruno

It is a known issue: Potential thread interference using CssiSpaceWeatherData (#1072) · Issues · Orekit / Orekit · GitLab
Unfortunately, we haven’t fixed it yet.

If you have an account on the GitLab repository, could you click on the button so that we know that you also encountered this issue?

Thank you,
Bryan

bruno · July 4, 2023, 11:29am

Thanks @bcazabonne !
And sorry I didn’t notice the issue on GitLab; I had only searched for it in the forum. Unfortunately I don’t have a GitLab account yet. I can also confirm that the problem seems to arise with the MSAFE monthly solar activity data as well, though it does not throw the same kind of exception.

Paul1 · July 18, 2023, 8:48am

Hello folks,

Just to add that I am also encountering this problem when using BLS along with CSSI & NRLMSISE00 / DTM2000. I am running multiple Python threads, each with an instance of Orekit running inside. I am not a Java expert, so I am not sure whether I have made an error in how I configure my parallelisation. I thought that the instances would be fully independent, but perhaps that isn’t how Java works?

Anyway, I just wanted to report that the issue is more widespread while I try to have a look at fixing it.

Thanks,
Paul

MaximeJ · July 18, 2023, 11:28am

Hi @Paul1,

This does indeed look like the same issue.
In your case, I think you can fix the error by using a different DataContext for each thread (here’s a tutorial on how to use multiple DataContext instances).
I haven’t tried it but I’m interested in knowing if it works.

It cannot be avoided the same way with the UKF because you don’t have control over the force models that are used internally by the PropagatorParallelizer.
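
For the multi-threaded Python case, one possible way to give each thread its own context is a thread-local holder. This is only a minimal sketch: `get_thread_context` and `make_context` are hypothetical names, and `make_context` stands for whatever function builds and populates a DataContext (it is not part of Orekit).

```python
import threading

# Per-thread storage: each thread sees its own attributes on this object.
_local = threading.local()


def get_thread_context(make_context):
    """Return the calling thread's private context, creating it on first use.

    make_context is a hypothetical zero-argument factory, e.g. a function
    that builds a LazyLoadedDataContext and registers a data provider.
    """
    ctx = getattr(_local, "context", None)
    if ctx is None:
        ctx = make_context()
        _local.context = ctx
    return ctx
```

Each thread would then call `get_thread_context(...)` instead of relying on the default data context: the first call in a thread builds the context, and later calls in that same thread reuse it.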

Paul1 · July 18, 2023, 3:24pm

Hi @MaximeJ ,

Thanks for getting back to me on this one. I have had a go at using the additional DataContext as follows:

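    # Assumes the JVM is already started (orekit.initVM())
    import os
    from java.io import File
    from org.orekit.data import DirectoryCrawler, LazyLoadedDataContext, ZipJarCrawler

    # Orekit crawlers expect a java.io.File, not a Python string
    datafile = File(filename)
    if os.path.isdir(filename):
        crawler = DirectoryCrawler(datafile)
    elif os.path.isfile(filename):
        crawler = ZipJarCrawler(datafile)
    else:
        raise ValueError(f'{filename} is neither a file nor a folder')

    # Dedicated context, separate from the default one
    data_context = LazyLoadedDataContext()
    data_context.getDataProvidersManager().addProvider(crawler)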

This code works OK and Orekit seems to initialise properly (I don’t get any uninitialised warnings). Is the intention then for me to use the new data_context object throughout my code to access things such as frames, weather data, etc.? At present I use factories such as FramesFactory to supply these objects, and when using the new initialisation method I get errors such as:

ecef = FramesFactory.getITRF(IERSConventions.IERS_2010, False)
[2023-07-18, 15:08:55 UTC] {process_utils.py:189} INFO - orekit.JavaError: <super: <class 'JavaError'>, <JavaError object>>
[2023-07-18, 15:08:55 UTC] {process_utils.py:189} INFO -     Java stacktrace:
[2023-07-18, 15:08:55 UTC] {process_utils.py:189} INFO - org.orekit.errors.OrekitException: no IERS UTC-TAI history data loaded
[2023-07-18, 15:08:55 UTC] {process_utils.py:189} INFO - 	at org.orekit.time.LazyLoadedTimeScales.getUTC(LazyLoadedTimeScales.java:188)
[2023-07-18, 15:08:55 UTC] {process_utils.py:189} INFO - 	at org.orekit.frames.AbstractEopLoader.getUtc(AbstractEopLoader.java:56)
[2023-07-18, 15:08:55 UTC] {process_utils.py:189} INFO - 	at org.orekit.frames.BulletinBFilesLoader.fillHistory(BulletinBFilesLoader.java:238)
[2023-07-18, 15:08:55 UTC] {process_utils.py:189} INFO - 	at org.orekit.frames.LazyLoadedEop.getEOPHistory(LazyLoadedEop.java:311)
[2023-07-18, 15:08:55 UTC] {process_utils.py:189} INFO - 	at org.orekit.frames.LazyLoadedFrames.getEOPHistory(LazyLoadedFrames.java:182)
[2023-07-18, 15:08:55 UTC] {process_utils.py:189} INFO - 	at org.orekit.frames.AbstractFrames.getCIRF(AbstractFrames.java:400)
[2023-07-18, 15:08:55 UTC] {process_utils.py:189} INFO - 	at org.orekit.frames.AbstractFrames.getTIRF(AbstractFrames.java:353)
[2023-07-18, 15:08:55 UTC] {process_utils.py:189} INFO - 	at org.orekit.frames.AbstractFrames.getITRF(AbstractFrames.java:278)
[2023-07-18, 15:08:55 UTC] {process_utils.py:189} INFO - 	at org.orekit.frames.FramesFactory.getITRF(FramesFactory.java:415)

which I would expect if the data had not been loaded. I suspect this is because these factories use the default data context to obtain weather/EOP coefficients? Is there a way of setting this new context as the one to be used internally without going back to the problems I was seeing before?

Thanks for your help,
Paul

MaximeJ · July 18, 2023, 3:41pm

Yes, the idea is to use a different data context for each of your threads. That way I think you won’t have concurrent access to the CSSI Space Weather Data stored in memory.

And yes, instead of the default factories you need to use (assuming context is one of your DataContext instances):

context.getTimeScales() for TimeScalesFactory
context.getFrames() for FramesFactory
etc.
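
As a rough illustration of that mapping, here is a minimal sketch in Python. The helper name `build_utc_and_itrf` is hypothetical; it only assumes that `context` exposes `getTimeScales()` and `getFrames()` as listed above, and that `conventions` is a value such as `IERSConventions.IERS_2010`.

```python
def build_utc_and_itrf(context, conventions, simple_eop=False):
    """Fetch UTC and ITRF from an explicit data context.

    Replaces TimeScalesFactory.getUTC() and
    FramesFactory.getITRF(conventions, simple_eop), which both read
    from the default data context.
    """
    utc = context.getTimeScales().getUTC()
    itrf = context.getFrames().getITRF(conventions, simple_eop)
    return utc, itrf
```

With the context built earlier in the thread, this would be called as `utc, ecef = build_utc_and_itrf(data_context, IERSConventions.IERS_2010)`. The space weather loader would presumably need the same treatment, for example by passing `context.getDataProvidersManager()` and the UTC scale to a CssiSpaceWeatherData constructor that accepts them, assuming your Orekit version provides one.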

Maxime

Paul1 · July 19, 2023, 2:38pm

Thanks, I am considering giving this a go and I will let you know the outcome if I do go down this route. I am a little reluctant to do it though, as it means quite wide-sweeping changes to the codebase to pass the new data context object around when creating frames etc. It would be nice if there were a way to make the default factories use the new data context without that becoming the default for all other threads as well.

While I ponder how best to modify my codebase, I have been trying to understand the cause of the problem a little more and have hit a wall. I have successfully run Orekit in parallel in the past with mpi4py, so I wanted to understand the differences between that and my current method (Celery), and as far as I can tell there aren’t any. Each Orekit run is created in a separate Python process, and I have checked that a new JVM is created for each process, so as far as I can tell there shouldn’t be any shared memory through which the runs could interact and cause concurrency issues (via static class members, etc.). Perhaps there is an interaction mechanism that I am missing though (I am not a Java expert, especially not with JNI)?

Thanks!

Vincent · August 16, 2023, 2:09pm

Hi everyone,

I’m on it; I’ll get back to you as soon as this is done.

Cheers,
Vincent