CdmParser KVN Enhancements

Hi,

I’ve seen some CDMs that Orekit’s CdmParser is not able to parse.

In one case there is no blank between the value and the units, e.g.

CTDOT_TDOT     =0.0001[m**2/s**2]

I know the CCSDS standard in Section 6.3.3 says “there must be at least one blank character between the value and the units”, but in this case it shouldn’t be too hard to separate them. Note that Orekit’s own KvnGenerator will generate files like this, with no space between the value and the units, if the units and alignment columns are too close.
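For the record, splitting the glued value and units looks easy enough at the lexer level. Here is a minimal sketch of the idea (the class and method names are made up for illustration, this is not Orekit code):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not Orekit code: splits a right-hand side where the
// blank required by CCSDS 6.3.3 between value and units is missing.
public class GluedValueSplitter {

    // a value immediately followed by a bracketed units string
    private static final Pattern GLUED = Pattern.compile("(.+?)\\s*(\\[[^\\]]*\\])$");

    /** Returns {value, units}; units is null when no bracketed suffix is present. */
    public static String[] split(final String rhs) {
        final Matcher matcher = GLUED.matcher(rhs.trim());
        if (matcher.matches()) {
            return new String[] { matcher.group(1).trim(), matcher.group(2) };
        }
        return new String[] { rhs.trim(), null };
    }

    public static void main(final String[] args) {
        final String[] parts = split("0.0001[m**2/s**2]");
        System.out.println(parts[0] + " / " + parts[1]); // 0.0001 / [m**2/s**2]
    }
}
```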

I’ve also seen lines like the following with just a bunch of blanks after the =.

TIME_LASTOB_START     =          

I don’t see anywhere in the CCSDS standard that says the key shall not appear when an optional value is not present. So I think this one could be considered standard-conforming.
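Handling this case would amount to something like the following check (just a sketch with invented names), so the parser can treat the line as “key present, value absent” and skip it instead of failing:

```java
// Sketch only: detect a KVN line whose right-hand side is blank, so the
// parser can treat the optional value as absent instead of failing.
public class BlankValueCheck {

    public static boolean hasValue(final String line) {
        final int eq = line.indexOf('=');
        return eq >= 0 && !line.substring(eq + 1).trim().isEmpty();
    }

    public static void main(final String[] args) {
        System.out.println(hasValue("TIME_LASTOB_START     =          ")); // false
        System.out.println(hasValue("CTDOT_TDOT     =0.0001[m**2/s**2]")); // true
    }
}
```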

Any issues if I update the CDM parser to be able to accept these kinds of lines in a file?

I think we all agree that we should be lenient in accepting slightly malformed files (as long as the intent is obvious), so go ahead with a fix.
And we should also fix our own generation to insert the missing space in the files we write ourselves!


Ran into an issue where OCM has a field:

SOLVE_STATES = POS[3], VEL[3]

where the new relaxed parsing treats the trailing “[3]” as the units specification.

There is also the ambiguous case where it’s not clear whether a string in brackets “[d]” is the units or the value. E.g.

ACTUAL_OD_SPAN = [d]

These cases could be resolved if the lexical analyzer knew which keys allow units: SOLVE_STATES doesn’t have units, so the [3] is part of the value, while ACTUAL_OD_SPAN does have units, so the [d] is the units. But that would seem to break the nice separation of concerns in the current implementation between the lexical analyzer, which deals with text, and the Parser classes, which deal with the semantics. Not to mention it would be a larger effort.
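For concreteness, the key-aware approach would boil down to a lookup along these lines (purely hypothetical: the set of unit-less keys is made up, and this is not how Orekit organizes its key definitions):

```java
import java.util.Set;

// Purely illustrative: decide from the key name whether a trailing "[...]"
// should be read as units. Not Orekit code, and the key set is incomplete.
public class KeyAwareUnitsResolver {

    // keys whose bracketed suffixes belong to the value itself
    private static final Set<String> UNITLESS_KEYS = Set.of("SOLVE_STATES");

    public static boolean bracketIsUnits(final String key) {
        return !UNITLESS_KEYS.contains(key);
    }

    public static void main(final String[] args) {
        System.out.println(bracketIsUnits("SOLVE_STATES"));   // false -> [3] is value
        System.out.println(bracketIsUnits("ACTUAL_OD_SPAN")); // true  -> [d] is units
    }
}
```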

What are your thoughts @luc on a path forward?

Perhaps restrict the units regex to require at least one letter, percent, degree, arc minute, or arc second symbol? That would fix the issue in the unit test, but could be confused by other free text fields.
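Roughly, the restriction I have in mind would look like this (a sketch of the regex only, not the actual lexer pattern):

```java
import java.util.regex.Pattern;

// Sketch of the proposed restriction: a bracketed token only counts as units
// if it contains at least one letter or a %, °, ′ or ″ symbol.
public class UnitsHeuristic {

    private static final Pattern UNITS =
            Pattern.compile("\\[[^\\]]*[A-Za-z%°′″][^\\]]*\\]");

    public static boolean looksLikeUnits(final String bracketed) {
        return UNITS.matcher(bracketed).matches();
    }

    public static void main(final String[] args) {
        System.out.println(looksLikeUnits("[m**2/s**2]")); // true
        System.out.println(looksLikeUnits("[d]"));         // true
        System.out.println(looksLikeUnits("[3]"));         // false -> stays in the value
    }
}
```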

Or perhaps keep the existing behavior in the lexical analyzer and create filters that replace a ParseToken(value="1[d]") with ParseToken(value="1", units="[d]"), and similarly for the other issue. It would be opt-in so it wouldn’t break the existing code, but would still suffer the same issues when it comes to resolving ambiguous cases.
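Something in this spirit (the token class below is a deliberately simplified stand-in, Orekit’s real ParseToken has a different constructor, and the filter hook itself is hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified stand-in for a lexer token, only to illustrate the filter idea;
// Orekit's real ParseToken is richer and the filter hook is hypothetical.
final class SimpleToken {
    final String name;
    final String value;
    final String units;
    SimpleToken(final String name, final String value, final String units) {
        this.name  = name;
        this.value = value;
        this.units = units;
    }
}

// Opt-in filter: split a glued value like "1[d]" into value "1" and units "[d]".
final class GluedUnitsFilter {

    private static final Pattern GLUED = Pattern.compile("(\\S+?)(\\[[^\\]]*\\])");

    SimpleToken filter(final SimpleToken token) {
        if (token.units == null && token.value != null) {
            final Matcher matcher = GLUED.matcher(token.value);
            if (matcher.matches()) {
                return new SimpleToken(token.name, matcher.group(1), matcher.group(2));
            }
        }
        return token;
    }
}
```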

In fact, a single integer is an acceptable unit, because some formats have units like 1/s. ParserTest.testOne() just tests that “1” can be parsed as a unit.

I have to think more about this.