Comments/questions on DerivativeStructure

Serrof · June 3, 2022, 9:48am

Dear Hipparchus community,

I am fond of high-order, multivariate automatic differentiation and first of all, I would like to thank the developers for providing an open-source implementation of it in Java via the DerivativeStructure class, even though the lack of operator overload in this language makes its use more tedious. Actually, the real achievement was to systematically code methods in Orekit both with doubles and DerivativeStructures, so kudos on that as well.

Before seeing it in the Javadoc, I wasn’t aware of Dan Kalman’s paper. So I’ve read it and… my conclusion is that Hipparchus’ DerivativeStructure class is far from being a textbook implementation of it. To be honest, it’s just borrowing a few ideas. Indeed, the author basically proposes to perform ALL operations recursively, meaning that DerivativeStructure objects should hold as attributes other instances with less variables or derivatives, until things unfold down to real numbers. Even for the « add » operator. But in Hipparchus, it is implemented as a straight-forward pair-wise additions of corresponding terms, there is no recursion on substructures as the partial derivatives are handled directly. Don’t get me wrong, I’m not saying this is a bad thing: as Dan Kalman admits himself, his concept is elegant as very little code is needed, but it is certainly not the most efficient as far as computational cost is concerned. I’m just saying that maybe Hipparchus’ documentation shouldn’t claim that it implements his ideas, but only that it is merely inspired from it. In my opinion, the present DerivativeStructure class is far closer to Berz’s seminal work in the 80s-90s (please share your thoughts if you disagree), see for example:

BERZ, A. Differential algebraic description of beam dynamics to very high orders. Part. Accel., 1988, vol. 24, no SSC-152, p. 109-124.

Now, to my second point. It is my understanding that the « compose » function in DSCompiler only allows to combine derivatives up to order N from a univariate function with the derivatives of an arbitrary multivariate function up to order N. Is that correct ? If so, I must say that it is not very clear from the Javadoc and I would suggest reformulating. By the way, the use of that method in Hipparchus for the intrinsic functions (e.g. exp) is again far from what Dan Kalman proposes on the subject in his paper (and which might be his best contribution)!
Anyhow, the fact is that sets of partial derivatives can be combined as long as they have the same maximum order and that the number of variables of the left hand side equals the number of functions on the right hand side. What I’m saying is that for example f(g,h) makes sense as long as f is bivariate and g and h map the same number of variables to R. But I don’t see in Hipparchus how to do that with DerivativeStructures. As a matter of fact, this is a necessary feature to perform Taylor map inversion (as requested in Issue #190). It implies manipulating different DSCompiler at the same time. It is not trivial yet not too tricky I believe and it would be very appreciable to have that in Hipparchus!

Cheers,
Romain.

luc · June 3, 2022, 2:30pm

I strongly agree to this. This is a major drawback of the Java language in my humble opinion.
Years ago, I tried to alleviate it when I create the Nabla project at the Apache Software Foundation. It never reached production state.

DerivativeStructure operations are based on iterations, but the tables driving these iterations are set up using the recursive methods from Dan Kalman’s paper. The recursions are computed once for each parameters/order pair in DSCompiler.

Yes, this is correct. The order of derivations must be consistent throughout the application of the derivation chain rule. And here again, the implementation using loops is just an optimization of the recursive rules from the paper.

It will be verified at run time, when the f function (which may either by user-defined or provided by Hipparchus) performs operations on g and h, for example if it calls g.add(h). The add method, as well as other combination methods calls the checkCompatibility method from DSFactory that bails if number of parameters or order do not match. There are no enforcement at compile time, so a user wishing to implement a two arguments functions (like myAdd(g, h)) would simply write

T myAdd(T g, T h)) {
    return g.add(h);
}

Here T could be any CalculusFieldElement, not only DerivativeStructure. When called with a field that is a DerivativeStructure, the partial derivatives will be computed, but this code doesn’t need to know it. The same code works for derivation to order 1, 3 or 25 if you want, and it would also work with something that has nothing to do with derivatives.

I am not sure I understand your request. Do you mean that we should have a way to promote/demote order, for example to convert e DerivativeStructure with 5 parameters and order 3 to 5 parameters and order 2? Removing derivatives would be easy. Adding derivatives (says going from order 2 to order 3) could be done with some assumptions, like putting the extra derivatives to 0 (but mathematically it is dubious). Changing the number of parameters is more tricky. We explicitly decided at design phase that we will avoid putting maps in the derivatives that name the parameters for efficiency reasons. When DerivativeStructure objects are used in a complex algorithm, there are really thousands of primitive operations that must all be consistent, so putting maps and checks on each derivative individually was see as too costly.

Serrof · June 7, 2022, 7:05pm

Dear Luc,

thank you for your reply!

About my last point, I am not talking about orders of derivatives. By the way, differentiation/integration w.r.t. particular variables is definitely something that could be benefited from (see the aforementioned reference from Berz, with application for example with the Picard integrator). I was originally planning on opening another discussion about that, but we can talk about it here too! For me, this feature, together with automatic differentiation and the so-called Taylor map inversion are the three pillars of what is known as the Taylor Differential Algebra. It should be pretty simple to implement it with the current DerivativeStructure class: it’s just a reshuffle of the derivatives!

What I meant about composition between DerivativeStructure is the following. If F is a real-valued function of N variables (differentiable at an arbitrary order) and G_1, …, G_N are N real-valued functions of M variables, it is obviously possible to form H=Fo(G_1, …, G_N). The partial derivatives DH^P of H at a point X at order P can be obtained with the chain rule from DG_1^P, … DG_M^P at X and DF^P at (G_1,…,G_N)(X). This is a prerequisite to perform map inversion and it would be cool to have it in Hipparchus. To be honest that’s something I could look into but first you guys are more accustomed to the code and second I’m more used to manipulate Taylor expansions (truncated polynomials in the canonical basis) than the derivatives themselves (they only differ by some factorial terms but that makes a difference).

Cheers,
Romain.

luc · June 11, 2022, 7:27am

Hi Romain,

I think we are not understanding each other. For what I currently understand in your needs, we already support this and have been supporting this since the beginning of DerivativeStructure, so I am confused. I will try to better explain the current design, and hope you can also explain again in other words what you need so I can understand.

Let f=f(x_1, x_2,\ldots x_n) be a differentiable function of n variables. A DerivativeStructure implementation of this function would have a public API like:

  public DerivativeStructure f(DerivativeStructure x1, DerivativeStructure x2, ..., DerivativeStructure xn) {
     // some heavy computation here
    return result;
  }

The heavy computation inside, is a combination of calls to primitive DerivativeStructure operations like addition or subtraction, calls to classical calculus functions like sine or exponential and calls to other user-defined functions which also have similar public APIs. For illustration, I will just consider three arguments and pick up a few random functions:

public DerivativeStructure f(DerivativeStructure x1, DerivativeStructure x2, DerivativeStructure x3) {
  return x1.add(FastMath.exp(x2.divide(h(x3))));
}
public DerivativeStructure h(DerivativeStructure y) {
  return y.multiply(5);
}

In the previous code, neither the number of parameters nor the derivation order are specified, they will depend on the values provided by caller when x1, x2 and x3 are passed from caller to callee.
In fact, there is nothing that is really dependent on DerivativeStructure either, so for the sake of generality, when we develop in both Orekit and Hipparchus, we rather use the generic CalculusFieldElement interface to specify all the types rather than DerivativeStructure when coding such implementations, as we only need to be able to do primitive operations, call calculus functions and call other user-defined functions we also write ourselves. So the Orekit and Hipparchus teams would rather write:

public <T extends CalculusFieldElement<T>> T f(T x1, T x2, T x3) {
  return x1.add(FastMath.exp(x2.divide(h(x3))));
}
public <T extends CalculusFieldElement<T>> T h(T y) {
  return y.multiply(5);
}

This what we call “fieldifying” a computation within the team. Notice that only the types declaration code changes, the computation code remains exactly the same.

With this implementation, the user that would like to have the third order partial derivative \frac{\partial^3 f}{\partial x_1 x_2^2} would call the methods as follows:

  int parameters = 3;
  int order      = 3;
  DSFactory factory = new DSFactory(parameters, order);
  DerivativeStructure x1 = factory.variable(0, x1Value);  // user arbitrarily decided 0 is index for x1
  DerivativeStructure x2 = factory.variable(1, x2Value);  // user arbitrarily decided 1 is index for x2
  DerivativeStructure x3 = factory.variable(2, x3Value);  // user arbitrarily decided 2 is index for x3
  DerivativeStructure  res = f(x1, x2, x3);
  System.out.println(f.getPartialDerivative(1, 2, 0)); // derivatives: order 1 wrt x1, order 2 wrt x2, order 0 wrt x3

And now is what I understand from your use case, and where I may be confused.
We now consider that x_1, x_2, and x_3 are not canonical independent variables anymore, but are rather themselves functions of other variables:

\left\{ \begin{gather} x_1 &= g_1(u, v)\\ x_2 &= g_2(u, v)\\ x_3 &= g_3(u, v) \end{gather} \right.

Here, I considered that there were only two independent variables u and v, but in fact there could be any number. I just used 2 to emphasize the fact the number of variables in g_1, g_2 and g_3 is not related to the number of variables in f.
So say that at the end, the user does not want \frac{\partial^3 f}{\partial x_1 x_2^2} as before, but rather \frac{\partial^3 f}{\partial u v^2}. One way (clearly not the one I would recommend), would be to compute \frac{\partial^3 f}{\partial x_1 x_2^2} on one side as before, to compute the various partial derivatives of the g_1, g_2 and g_3 functions on the other side and to combine everything. However, there is no need to do this. The partial derivatives combinations are already available and no code change is required in either the f or h functions. The only change required is in the user calling code, when he or she sets up the arguments:

  int parameters = 2;
  int order      = 3;
  DSFactory factory = new DSFactory(parameters, order);
  DerivativeStructure u  = factory.variable(0, uValue);  // user arbitrarily decided 0 is index for u
  DerivativeStructure v  = factory.variable(1, vValue);  // user arbitrarily decided 1 is index for v
  DerivativeStructure x1 = g1(u, v);
  DerivativeStructure x2 = g2(u, v);
  DerivativeStructure x3 = g3(u, v);
  DerivativeStructure  res = f(x1, x2, x3);
  System.out.println(f.getPartialDerivative(1, 2)); // derivatives: order 1 wrt u, order 2 wrt v

So the f function by itself does combines the partial derivatives of several underlying functions g1, g2 and g3 naturally, without any code change in f. It does not know when it handles its parameters x1, x2 and x3 whereas these parameters are canonical variables or already results from complex preliminary computations with lots of partials already present.

In fact, there is not even a real notion of independent variables in DerivativeStructure. The convenience builder methods constant and variable from DSFactory just create structures that are akin to a constant or a canonical variable just because for a constant there is one value and all partial derivatives are zero (hence it is the evaluation of a constant function) and for a variable there is one value and all partial derivatives are zero except the first derivative at the index selected by user for this variable, this derivative being set to 1 (hence it is the evaluation of an identity function for the index selected by user). If the f and h functions are written in terms of CalculusFieldElement, the f and h functions not only ignore if the parameters are canonical variables or complex results, they are even not aware that what they compute are derivatives!

So if I understand your use case which to me seems to be combining the partial derivatives in f and h with the partial derivatives in g1, g2 and g3, Hipparchus already does this.

Serrof · June 14, 2022, 12:24pm

Hi Luc,

thanks for you message. Indeed I think we are not understanding each other^^
But no worries, it’s hard to speak of such technical stuff via keyboards, although I just realized one can use latex-like typing here (that’s awesome).
I am familiar with the implementation you are talking about. I do appreciate that you guys made it very generic with the CalculusFieldElement field (I am less accustomed to the technicalities of this element from Java 8, but I believe it is very similar to C++ templates so I get it).

To go back to the subject at hand, I am talking about composing functions that are only known via their partial derivatives. No explicit or implicit definitions are available (so certainly, no algebraic expressions): we just have instances of DerivativeStructure. A particular example of that, which is already available in Hipparchus indeed via “compose”, is when the left hand side of the composition is a scalar univariate function. So what I’m referring to is a generalization, when this left-hand side is multivariate and so the right-hand side is an array of DerivativeStructures.

Let’s try to be a bit more concrete. Let f be any twice-differentiable (for example), bivariate scalar function only known via its partial derivatives at \mathbf{x}=(x_1, x_2):
\partial f / \partial x_1(\mathbf{x}) , \partial f / \partial x_2(\mathbf{x}) , \partial^2 f / \partial x_1^2 (\mathbf{x}), \partial^2 f / \partial x_1\partial x_2(\mathbf{x}) , \partial^2 f / \partial x_2^2 (\mathbf{x}).
Let g and h be any twice-differentiable, scalar functions with the same number of variables N. Let us assume that their partial derivatives are known around \mathbf{y}=(y_1, ..., y_N) such that (g,h)(\mathbf{y})=\mathbf{x} . From now one I’m going to omit the evaluation points \mathbf{x} and \mathbf{y} to have shorter formulas.
One has (hopefully there is no mistake):
\partial fo(g,h) / \partial y_1 = \partial g / \partial y_1 * \partial f / \partial x_1 + \partial h / \partial y_1 * \partial f / \partial x_2
\partial fo(g,h) / \partial y_2 = \partial g / \partial y_2 * \partial f / \partial x_1 + \partial h / \partial y_2 * \partial f / \partial x_2
\partial^2 fo(g,h) / \partial y_1^2 = \partial^2 g / \partial y_1^2 * \partial f / \partial x_1 + \partial g / \partial y_1 *(\partial g / \partial y_1 * \partial^2 f / \partial x_1^2 + \partial h / \partial y_1 * \partial^2 f / \partial x_1 \partial x_2 ) + \\ \partial h / \partial y_1 * (\partial g / \partial y_1 * \partial^2 f / \partial x_1 \partial x_2 + \partial h / \partial y_1 * \partial^2 f / \partial x_2^2 ) + \partial^2 h / \partial y_1^2 * \partial f / \partial x_2
And so on for all the derivatives concerned, so that the DerivativeStructure of fo(g,h) is fully known. This can be implemented, based only on the “data” attribute of the DerivativeStructure (and some precomputed stuff from DSCompiler). At no moment do we need to know what f, g or h are exactly, we’re only manipulating sets of partial derivatives.

I hope it’s clearer now.

Cheers,
Romain.

luc · June 14, 2022, 12:33pm

I think I finally understood.
I will try do come up with something
Could you open a feature request in the issue tracker? For Hipparchus, it is in the github site.

Serrof · June 15, 2022, 8:59am

Done! Thanks Luc.
Should I also create a feature request for the differentiation/integration of DerivativeStructures?
I could tackle that one myself.

Cheers,
Romain.

luc · June 15, 2022, 9:39am

Yes, if you want. Here again, I’m not sure to understand what you want, so if you can do it by yourself, it would be simpler to me (and also I already have a lot on my plate).