How can I find sneaky Java memory leaks using the Python wrapper?

I have a large Orekit service performing orbit determination. The service starts once and then enters a while True loop, where it idles and checks for new observations every X minutes. It uses these observations to estimate new orbits for the observed objects.

The service runs okay for about 24h until I start bumping into performance issues. Then I start getting Java errors:

  • [2024-06-02T15:07:53] java.lang.OutOfMemoryError: Java heap space: failed reallocation of scalar replaced objects

  • [2024-06-02T17:38:25] java.lang.OutOfMemoryError: Java heap space

This final error occurred the following day, presumably at the moment I killed the service to build a new version:

  • [2024-06-03T13:12:02] OpenJDK 64-Bit Server VM warning: Exception java.lang.OutOfMemoryError occurred dispatching signal SIGTERM to handler- the VM may need to be forcibly terminated

The JRE is initialised with:

  • orekit.initVM(initialheap='2048m')

I upped the heap size by 2x but it did not fix anything. It just took longer for the memory issues to start occurring.

The memory leak persists even when the service sleeps for 30 minutes at a time and only performs estimation every 30 minutes.

This must be some issue with creation of Java objects and poor garbage collection.

I’m new to both Orekit and Java, and I don’t know where or how to start looking for memory leaks. Since my service never shuts down, this is a particularly significant memory management challenge.

I would really appreciate any guidance or leads on where I can look next, or if you guys know of any common memory issues with the Python wrapper :slight_smile:

Hi @jamesc,

Isn’t there a memory profiling tool that you could use in Python to find out where the problem may come from?
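
As one possible starting point, Python's standard-library tracemalloc module can compare snapshots taken before and after a batch of cycles. The sketch below is only an illustration: `run_one_cycle` is a hypothetical stand-in for one estimation cycle and leaks on purpose. Note that tracemalloc only sees allocations made by Python, not the Java heap, so for wrapped objects the useful signal is a growing number of Python-side wrappers.

```python
import tracemalloc

def top_growth(before, after, limit=5):
    """Return the source lines whose Python allocations grew the most."""
    stats = after.compare_to(before, "lineno")
    return [s for s in stats if s.size_diff > 0][:limit]

# Hypothetical stand-in for one estimation cycle; it leaks on purpose
# by appending to a module-level list that is never cleared.
_leaked = []

def run_one_cycle():
    _leaked.append([0.0] * 10_000)

tracemalloc.start()
before = tracemalloc.take_snapshot()
for _ in range(20):
    run_one_cycle()
after = tracemalloc.take_snapshot()

growth = top_growth(before, after)
for stat in growth:
    print(stat)  # file, line number, size difference, count difference
tracemalloc.stop()
```

Running the real cycle in a tight loop like this, without the idle period, would also give the single-run reproduction asked for below.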

Are you using a Kalman filter? How many measurements and objects are you treating before the service crashes?
If you could reproduce the bug in one run (without using the idle mechanism) and share some sample code (with generated measurements) it would help with debugging.

Cheers,
Maxime

Hi @jamesc ,

You can try a great tool named VisualVM (https://visualvm.github.io/) to analyze the Java memory.

Here is how I analyzed a "GC overhead limit exceeded" error:
https://forum.orekit.org/t/java-lang-outofmemoryerror-gc-overhead-limit-exceeded/1461/4?u=lirw1984
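
To have something to open in VisualVM after a crash, one option is to ask the JVM to write a heap dump when it runs out of memory. The sketch below only builds the keyword arguments; `build_vm_options`, the heap sizes, and the dump directory are illustrative assumptions. It relies on JCC's `initVM` accepting `initialheap` (the JVM's `-Xms`), `maxheap` (`-Xmx`), and `vmargs` (a comma-separated string of extra JVM options). Since `initialheap` only sets the starting size, it may be worth setting `maxheap` explicitly as well.

```python
def build_vm_options(max_heap="4096m", dump_dir="/tmp"):
    """Build keyword arguments for JCC's initVM (values are illustrative)."""
    return {
        "initialheap": "2048m",   # starting heap size (-Xms)
        "maxheap": max_heap,      # upper heap limit (-Xmx)
        # Standard HotSpot flags: write a .hprof file when the heap is exhausted.
        "vmargs": ",".join([
            "-XX:+HeapDumpOnOutOfMemoryError",
            f"-XX:HeapDumpPath={dump_dir}",
        ]),
    }

options = build_vm_options()
print(options)

# In the service, assuming the orekit Python wrapper is installed:
# import orekit
# orekit.initVM(**options)
```

The resulting .hprof file can then be loaded in VisualVM to see which classes dominate the heap.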

Best,
lirw

Hi,

Garbage handling between Java and Python is indeed a bit complex. This part is managed by JCC, the wrapping tool.

The first thing I would recommend checking is that no references to the objects are left on the Python side, such as arrays, pointers, or anything else that tells Java it is not OK to purge them. I have run into this several times: some array or similar still held a reference to the objects, so they were never purged.

There are likely Java tools around that can debug the memory heap, but I haven't tried that recently. It could be worth explicitly “del”-ing objects in Python to signal that they are OK to go.
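
A minimal pure-Python sketch of the lingering-reference problem described above. `FakeWrapper` is a hypothetical stand-in for a JCC-wrapped Java object and `history` for any long-lived container; the assumption is that a Java object can only become collectable once its Python wrapper has been released.

```python
import gc
import weakref

class FakeWrapper:
    """Hypothetical stand-in for a JCC-wrapped Java object."""
    def __init__(self, name):
        self.name = name

history = []  # long-lived container, e.g. a module-level cache

def process(name):
    obj = FakeWrapper(name)
    history.append(obj)        # the "sneaky" reference that outlives the call
    return weakref.ref(obj)    # weak reference: lets us observe the lifetime

ref = process("orbit-1")
gc.collect()
alive_before = ref() is not None   # still reachable through history

history.clear()                    # drop the lingering reference
gc.collect()
alive_after = ref() is not None    # the wrapper has now been released

print(alive_before, alive_after)
```

In a long-running loop, the same check can be applied to a few real wrapped objects per cycle to confirm that they disappear once the cycle ends.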