Adding Japanese translations?

Hi,

I recently added missing Spanish translations and noticed that there are currently no Japanese messages available. I could easily add these as well, but first wanted to check whether this is something the team would be OK with.

Since these would be in CJK characters, I'm not sure whether adding them could break something.

On the other hand, I saw on this thread that supporting more languages might become annoying for developers.

But if this is not an issue, I’m definitely happy to contribute these :slight_smile:

That would be great! (but obviously this is only my opinion).
As our message translation files use UTF-8 encoding, I think CJK characters are supported.
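A minimal sketch of why the encoding matters, using plain `java.util.Properties` rather than the actual Orekit/Hipparchus loading code: `Properties.load(InputStream)` assumes ISO-8859-1, whereas `load(Reader)` with a UTF-8 decoder reads CJK text correctly. The class name and the sample key/value are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

/** Hypothetical check that UTF-8 encoded message files can carry CJK text. */
public final class Utf8MessagesCheck {

    private Utf8MessagesCheck() {
    }

    /** Loads key/value pairs, decoding the stream explicitly as UTF-8. */
    public static Properties loadUtf8(final InputStream in) {
        final Properties props = new Properties();
        // Properties.load(InputStream) would decode as ISO-8859-1 and garble CJK text,
        // so wrap the stream in a Reader that decodes UTF-8 instead.
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(final String[] args) {
        // Illustrative Japanese value ("internal error"), written with Unicode escapes
        // so the example does not depend on the source file encoding.
        final String file = "# internal error\nINTERNAL_ERROR = \u5185\u90e8\u30a8\u30e9\u30fc\n";
        final Properties props = loadUtf8(
            new ByteArrayInputStream(file.getBytes(StandardCharsets.UTF_8)));
        System.out.println(props.getProperty("INTERNAL_ERROR"));
    }
}
```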

Hi Rafa,

as you seem to be aware, the development burden of maintaining translations is currently high, so I am strongly against adding any newly supported language (at least in the current state of things).

Beyond that, the question I have already raised is: with all the translation tools here and there, do we really need Orekit to natively support anything beyond English?

Cheers,
Romain.

Dear Luc and Romain,

Thanks a lot for your replies.

I understand the burden put on developers to maintain translations.

At the same time, I personally think it can be a helpful addition for some users (more of a convenience than a real need, I guess).

I saw on the other thread that a possibility would be to set up something to generate the translation resource files automatically from OrekitMessages. Maybe doing this would be the best way to go about it? I’m happy to give this a shot!

But just to clarify, what would be the ideal solution? A script for internal usage by developers that takes the master OrekitMessages file and the current translations resource files, and updates/fixes the resource files (e.g., order, adding new messages with default values)? This sounds quite easy to do.

Or a setup where the resource files would be automatically generated from OrekitMessages at build time? This seems slightly more complex, but from what I have read, it should not be too hard to do with a Maven plugin (maybe this one, together with a template class for the resource files and a list of supported languages/locales?).

Best wishes,

Rafa

I don’t know exactly how such a platform works and how it can be integrated into Orekit development, but many open source projects already use a translation management platform such as Weblate, poeditor, transifex or Tolgee. It may be that one of these platforms can help us maintain the Orekit translations.

Here are some examples:

Note: Weblate and Tolgee are open core platforms; transifex and poeditor are proprietary platforms.

Hi,

Short of setting up one of those platforms, which definitely look like much better solutions, I have been playing a bit with a way to process the translation files automatically.

I've got a working prototype (a very dirty implementation at this point), but I thought I'd share it here in case it is useful. Basically, I aimed to do the following.

First, we parse OrekitMessages.java (which, as far as I understand, would be the master file where developers would be placing new messages) to get the list of currently defined messages, with their keys and message string for each.
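The parsing step could look roughly like the sketch below. It is not the code from the draft merge request; it assumes every enum constant fits on one line in the form `KEY("message"),` and keeps Java escape sequences inside the string as-is. Multi-line or concatenated strings would need a real Java parser. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser extracting enum constants and their English format strings. */
public final class MessagesSourceParser {

    // Matches lines like:    INTERNAL_ERROR("internal error ..."),
    // Group 1 is the key, group 2 the raw string literal content (escapes not decoded).
    private static final Pattern ENTRY = Pattern.compile(
        "^\\s*([A-Z][A-Z0-9_]*)\\(\"((?:[^\"\\\\]|\\\\.)*)\"\\)\\s*[,;]", Pattern.MULTILINE);

    private MessagesSourceParser() {
    }

    /** Returns key -> message, preserving the declaration order of the enum. */
    public static Map<String, String> parse(final String source) {
        final Map<String, String> messages = new LinkedHashMap<>();
        final Matcher m = ENTRY.matcher(source);
        while (m.find()) {
            messages.put(m.group(1), m.group(2));
        }
        return messages;
    }
}
```

Using a `LinkedHashMap` keeps the entries in source order, which is what the later steps rely on to write every resource file in the same order as OrekitMessages.java.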

Then, we perform the following actions to generate/modify resource files at build time.

First, we create the corresponding resource file with the English messages. Since this file does not involve any translation (English messages are already defined in OrekitMessages.java), it seems it would never require any manual intervention. From what I see in the translation resource files, they follow the pattern of having 3 lines for each message:

  1. Line 1 is a hash character, followed by a single space, followed by the unformatted error string.
  2. Line 2 is the error key / unformatted error string pair, separated by the = character.
  3. Line 3 is empty, i.e., just a newline character.
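The three-line pattern can be rendered with a small helper like the one below. This is a sketch, not the draft implementation; it assumes one space on each side of `=` (the exact whitespace in the existing files is an assumption, and `java.util.Properties` accepts either form). The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

/** Hypothetical writer for the English resource file. */
public final class EnglishResourceWriter {

    private EnglishResourceWriter() {
    }

    /** Renders each entry as: "# message", "KEY = message", blank line. */
    public static String render(final Map<String, String> messages) {
        final StringBuilder sb = new StringBuilder();
        for (final Map.Entry<String, String> e : messages.entrySet()) {
            sb.append("# ").append(e.getValue()).append('\n');                      // line 1
            sb.append(e.getKey()).append(" = ").append(e.getValue()).append('\n');  // line 2
            sb.append('\n');                                                         // line 3
        }
        return sb.toString();
    }

    /** Writes the rendered content as UTF-8 (Files.writeString requires Java 11+). */
    public static void write(final Path file, final Map<String, String> messages) throws IOException {
        Files.writeString(file, render(messages), StandardCharsets.UTF_8);
    }
}
```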

So we can easily write this file, with the messages in the same order as in OrekitMessages.java.

Then, for each of the supported languages, we do the following:

  • If no corresponding resource file exists before these build-time modifications, the resource file is written exactly like the English one, except that on line 2 of each entry, after the = symbol, we write <MISSING TRANSLATION> instead of the unformatted error string.
  • If such a file already exists, we parse it to obtain the translated string for each error key it contains. Some keys will have a translation and others will not; the latter are those whose value after the = symbol is <MISSING TRANSLATION>. We then write a new resource file containing exactly the same set of errors as the English resource file: keys with an available translation keep it as their value after the = symbol, while keys that were either absent from the existing file or present with the value <MISSING TRANSLATION> get <MISSING TRANSLATION>. A side benefit is that all translation files contain all the errors in the same order, matching the order in OrekitMessages.java.
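The merge described in the two bullets above can be sketched as follows; a missing file is simply an empty map of existing translations, so both cases share one code path. This is an illustration, not the draft code: the line parser only handles simple `KEY = value` lines and ignores properties escape sequences and line continuations. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical merge of an existing translation file with the English keys. */
public final class TranslationMerger {

    /** Placeholder value for untranslated entries, as used in the thread. */
    public static final String MISSING = "<MISSING TRANSLATION>";

    private TranslationMerger() {
    }

    /** Parses "KEY = value" lines; comment and blank lines are skipped. */
    public static Map<String, String> parseExisting(final String content) {
        final Map<String, String> result = new LinkedHashMap<>();
        for (final String line : content.split("\n")) {
            final String trimmed = line.trim();  // also drops a trailing '\r'
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            final int eq = trimmed.indexOf('=');
            if (eq > 0) {
                result.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
            }
        }
        return result;
    }

    /** Keeps every English key, in English order; keys no longer in English are dropped. */
    public static Map<String, String> merge(final Map<String, String> english,
                                            final Map<String, String> existing) {
        final Map<String, String> merged = new LinkedHashMap<>();
        for (final String key : english.keySet()) {
            final String translated = existing.get(key);
            merged.put(key, translated == null || translated.equals(MISSING) ? MISSING : translated);
        }
        return merged;
    }

    /** Renders the translated file: English comment line, translated value, blank line. */
    public static String render(final Map<String, String> english, final Map<String, String> merged) {
        final StringBuilder sb = new StringBuilder();
        for (final Map.Entry<String, String> e : english.entrySet()) {
            sb.append("# ").append(e.getValue()).append('\n')
              .append(e.getKey()).append(" = ")
              .append(merged.getOrDefault(e.getKey(), MISSING)).append('\n')
              .append('\n');
        }
        return sb.toString();
    }
}
```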

Then we also need to handle the fact that OrekitMessagesTest.java checks the number of messages against a hardcoded expected value (currently 315). I've tried to handle this by also writing, at build time, a properties file containing the number of messages obtained from parsing OrekitMessages.java; OrekitMessagesTest.java then reads the expected value from this properties file.
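Reading the generated count back in the test could look like this sketch. The property name `messages.count` and the class name are hypothetical; the draft may use a different file layout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical reader for the message count written at build time. */
public final class ExpectedMessageCount {

    private ExpectedMessageCount() {
    }

    /** Reads the hypothetical "messages.count" property from the given stream. */
    public static int read(final InputStream in) {
        if (in == null) {
            throw new IllegalStateException("generated count file not found on the classpath");
        }
        final Properties props = new Properties();
        try (InputStream stream = in) {
            props.load(stream);  // the file only contains ASCII, so ISO-8859-1 is fine
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        final String value = props.getProperty("messages.count");
        if (value == null) {
            throw new IllegalStateException("messages.count not found");
        }
        return Integer.parseInt(value.trim());
    }
}
```

In the test, the stream would come from something like `getClass().getResourceAsStream(...)` on the generated file, and the result would replace the hardcoded 315.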

All of these manipulations are handled by a new class, TranslationTemplateGenerator.java, which is called at build time by the exec-maven-plugin, so I had to modify pom.xml accordingly.

With this setup, there is no modification required to any of the tests (except the one for the number of messages).

I still have quite a bit of Checkstyle work to do and code to clean up, but since I have a first draft passing the tests, I thought I'd share it here. I've put it in this draft merge request.

I hope it can be useful! At the very least, it was a very fun learning experience.

Thank you very much, @Rafa, for your thoughts on translation management and your contribution.

As it stands, your code breaks the CI/CD pipeline and needs to be fixed.

But to be honest, if the Orekit project gets tools to facilitate and better manage translations, I’d rather use mature, proven tools than an ad hoc solution.

I have never deployed or used a translation management platform, but a friend of mine contributes to the translation of several open source projects, which leads him to use several of these platforms. That does not tell him how difficult their deployment is or what impact it has, but he will have an opinion on their ergonomics, strengths and weaknesses. I will consult him.

In any case, if we do work on this subject, for the moment it can only be a proof of concept. The other contributors will have to be consulted before any decision is made.

I know, I am old school and have a tendency to redo things already done elsewhere…
I like @Rafa's solution and the fact that it is standalone.

I agree with @Serrof's remark that maintaining translation files is a bit annoying…

As a first step, I appreciate @Rafa's solution because it will remove the annoying part. I just have one question: how will translators see the missing translations with this new system?

But for the long term, I like the idea of an external tool already used by other open source software. However, we should address the following questions to limit the impact on our users:

  • Is it a dependency that we would have to add to the pom?
  • Is it something users must install when using Orekit?
    The smaller the impact on our users, the better for maintainability.

Can we include Rafa's contribution in 13.0 and open an issue for a later version to think about another tool?

Bryan

Hi all,

Since this is a developer feature (I'm talking about the automation of translations) and not a user-oriented one, I don't think we need to include it in the 13.0 release. It can be done later, taking our time.

Cheers
Romain.

I’m afraid you’re missing the point: a translator is not a developer. Mastery of Git and its workflow, and interaction with a forge, are certainly the main obstacles to contributing. Just like a developer, a translator must be provided with tools adapted to his needs, which allow him to focus on the added value he can bring to the project, without having to bother with concepts and processes that are beyond him.

I had a long discussion with my friend this afternoon, and he provided me with many interesting arguments in favor of such a platform:

  • He has a clear preference for Weblate, which he finds more user-friendly than the others. I was delighted to hear this, since Weblate is an open source platform. :slight_smile:
  • I didn’t know this, but he has done much more than use these platforms: he also deployed a self-hosted instance of Weblate for an open source project. He told me it was easy.
  • He explained to me that Weblate can, under certain conditions, retrieve the messages from the source code itself and update the list of messages to be translated (because they are new or have been modified).
  • When used with GitHub, Weblate can create a pull request on the repository when new translations are available. That’s great, but I don’t know whether it works with GitLab too.

I will try to find some time to set up a self-hosted instance and see how to use it for Orekit.

You know that I don’t know Java, so this exchange gave me the opportunity to read up on best practices for managing message localization in a Java application. The most common practice seems to be to use “properties” files:

Things are not done that way in Orekit. This reflection may be an opportunity to change approach and adopt a de facto standard. :slight_smile:

But I don’t want to scare you. So let’s just say that for the moment they are two different subjects. :wink:

Just a small comment on this: the way I currently envision it, translators would see missing translations the same way they do now, since the automatically generated files would contain the missing translation lines as well.
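Since missing entries stay visible as `<MISSING TRANSLATION>` values, a translator (or a CI job) could list them with a few lines like the sketch below. This helper is hypothetical and not part of the proposed merge request; it only handles simple `KEY = value` lines.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper listing untranslated keys in a generated resource file. */
public final class MissingTranslations {

    private MissingTranslations() {
    }

    /** Returns the keys whose value is the placeholder, in file order. */
    public static List<String> find(final String fileContent) {
        final List<String> keys = new ArrayList<>();
        for (final String line : fileContent.split("\n")) {
            final String trimmed = line.trim();
            if (trimmed.startsWith("#")) {
                continue;  // comment lines hold the English source string
            }
            final int eq = trimmed.indexOf('=');
            if (eq > 0 && trimmed.substring(eq + 1).trim().equals("<MISSING TRANSLATION>")) {
                keys.add(trimmed.substring(0, eq).trim());
            }
        }
        return keys;
    }
}
```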

We use properties for translations in both Hipparchus and Orekit. The translation files are indeed properties files and they are loaded using the standard resource bundle feature (this is provided by Hipparchus here).

This use of properties is wrapped in enumerates (Java enums) because this makes it easy to retrieve the reason for an exception and, in some cases, to recover from a transient error during the run (see here for an example where we handle the specific case of a singular matrix).
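The enum-plus-resource-bundle pattern can be illustrated with the self-contained sketch below. It is loosely modeled on the idea described above and does not reproduce the Hipparchus or Orekit classes: `DemoMessages`, `DemoException` and the bundle base name `demo.messages` are all hypothetical. The enum constant is the stable key, its string is the English fallback, and the exception keeps the constant so callers can test the cause without parsing text.

```java
import java.text.MessageFormat;
import java.util.Locale;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

public final class LocalizedMessagesDemo {

    /** Hypothetical message enum: constant name = key, constructor argument = English text. */
    public enum DemoMessages {
        SINGULAR_MATRIX("matrix is singular"),
        UNKNOWN_FRAME("unknown frame {0}");

        private final String source;

        DemoMessages(final String source) {
            this.source = source;
        }

        public String getSourceString() {
            return source;
        }

        /** Looks the key up in a locale-specific bundle, falling back to English. */
        public String getLocalizedString(final Locale locale) {
            try {
                final ResourceBundle bundle = ResourceBundle.getBundle("demo.messages", locale);
                if (bundle.containsKey(name())) {
                    final String translated = bundle.getString(name());
                    if (!translated.isEmpty() && !translated.equals("<MISSING TRANSLATION>")) {
                        return translated;
                    }
                }
            } catch (MissingResourceException e) {
                // no bundle available for this locale: use the English source string
            }
            return source;
        }
    }

    /** Exception keeping the enum constant so the cause can be checked programmatically. */
    public static final class DemoException extends RuntimeException {
        private static final long serialVersionUID = 1L;
        private final DemoMessages specifier;
        private final Object[] parts;

        public DemoException(final DemoMessages specifier, final Object... parts) {
            super(MessageFormat.format(specifier.getSourceString(), parts));
            this.specifier = specifier;
            this.parts = parts.clone();
        }

        public DemoMessages getSpecifier() {
            return specifier;
        }

        public String getMessage(final Locale locale) {
            return MessageFormat.format(specifier.getLocalizedString(locale), parts);
        }
    }

    private LocalizedMessagesDemo() {
    }
}
```

A caller can then write `catch (DemoException e) { if (e.getSpecifier() == DemoMessages.SINGULAR_MATRIX) { ... } }` to react to one specific failure and recover, which is the kind of handling the singular-matrix example refers to.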

Proof of concept in preparation… It smells very good. :slight_smile:

I’ve played around quite a bit with Weblate, and I’m starting to understand a little better how the tool should be configured, even if two or three aspects are still a little unclear to me (for myself: RTFM).

An MR was generated on the Orekit project, but at this stage it is far too premature. That was a mistake on my part (the kind of thing that happens when you tinker and test).

I’ll close this MR, and we’ll forget about it. I’ll redo the fixes later.

Before I start from scratch in Weblate and open it up to everyone, I’m going to submit a change to the formatting of the properties files. Weblate would otherwise reformat them on its own, but that would drown out the real fixes made by the translators, and you wouldn’t be able to spot and validate them.

Thank you @sdinot for the effort. It looks promising!