Lagrangian Neural Networks, Hacker News

Abstract. Accurate models of the world are built on notions of its underlying symmetries. In physics, these symmetries correspond to conservation laws, such as for energy and momentum. But neural network models struggle to learn these symmetries. To address this shortcoming, last year I introduced a class of models called Hamiltonian Neural Networks (HNNs) that can learn these invariant quantities directly from (pixel) data. In this project, some friends and I are going to introduce a complimentary class of models called Lagrangian Neural Networks (LNNs). These models are able to learn Lagrangian functions straight from data. They’re cool because like HNNs they can learn exact conservation laws, but unlike HNNs they don’t require canonical coordinates.

Figure 1 : A Lagrangian Neural Network learns the Lagrangian of a double pendulum. In this post, we introduce Lagrangian Neural Networks (LNNs). Like Hamiltonian Neural Networks, they can learn arbitrary conservation laws. In some cases they are better since they do not require canonical coordinates.

“A scientific poem”

Joseph-Louis Lagrange must have known that life is short. He was born to a family of children and only two of them survived to adulthood. Then he spent his adult years in Paris, living through the Reign of Terror and losing some of his closest friends to the guillotine . Sometimes I wonder if these hardships made him more sensitive to the world’s ephemeral beauty, and more determined to make the most of his short time here.

Indeed, his path into research was notable for its passion and suddenness. Until the age of , Lagrange was a normal youth who planned to become a lawyer and showed no particular interest in mathematics. But all of that changed when he read an inspiring memoir by Edmund Halley and decided to embark on an obsessive course of self-study in mathematics. A mere two years later he published the principle of least action.

“I will deduce the complete mechanics of solid and fluid bodies using the principle of least action.” – Joseph-Louis Lagrange, age

A French stamp commemorating Lagrange.

Lagrange’s work was notable for its purity and beauty, especially in contrast to the chaotic and broken times that he lived through. Expressing admiration for the principle of least action, William Hamilton once called it “a scientific poem”

. In the following sections, I’ll introduce you to this “scientific poem” and then use it to derive Lagrangian Neural Networks.

The Principle of Least Action

The Action. Start with any physical system that has coordinates (x_t=(q, dot q) ). For example, we might describe a double pendulum using the angles of its arms and their respective angular velocities. Now, one simple observation is that these coordinates must start in one state (x_0 ) and end up in another, (x_1 ). There are many paths that these coordinates might take as they pass from (x_0 ) to (x_1 ), and we can associate each of these paths with a scalar value (S ) called “the action.” Lagrangian mechanics tells us that the action is related to kinetic and potential energy, (T ) and (V ), by a functional

At first glance, (S ) seems like an arbitrary combination of energies. But it has one remarkable property. It turns out that for all possible paths between (x_0 ) and (x_1 ), there is only one path that gives a stationary value of (S ). Moreover, that path is the one that nature always takes.

Figure 3: Possible paths from q0 to q1, plotted in configuration space The action is stationary (δS=0) for small perturbations (δq) to the path that the system actually takes (red).

The Euler-Lagrange equation. In order to “deduce the complete mechanics of solid and fluid bodies,” all Lagrange had to do was constrain every path to be a stationary point in (S ). The modern principle of least action looks very similar: we let ( mathcal {L} equiv T – V ) (this is called the Lagrangian), and then write the constraint as ( frac {d} {dt} frac { partial mathcal {L}} { partial dot q_j}= frac { partial mathcal {L}} { partial q_j} ). Physicists call this constraint equation the Euler-Lagrange equation .

When you first encounter it, the principle of least action can seem abstract and impractical. But it can be quite easy to apply in practice. Consider, for example, a single particle with mass (m ), position (q ), and potential energy (V (q) ):

Nature’s cost function. As a physicist who now does machine learning, I can’t help but think of (S ) as Nature’s cost function. After all, it is a scalar quantity for which Nature finds a stationary point, usually a minimum, in order to generate the dynamics of the entire universe. The analogy gets even more interesting at small spatial scales, where quantum wavefunctions can be interpreted as Nature’s way of exploring multiple paths that are all very close to the path of stationary action. (1)

How we usually solve Lagrangians

Ever since Lagrange introduced the notion of stationary action, physicists have followed a simple formula:

(Find analytic expressions for kinetic and potential energy Write down the Lagrangian

But these analytic solutions are rather crude approximations. of the real world. An alternative approach is to assume that the Lagrangian is an arbitrarily complicated function – a black box that does not permit analytical solutions. When this is the case, we must give up all hope of writing the Lagrangian out by hand. However, there is still a chance that we can parameterize it with a neural network and learn it straight from data. That is the main contribution of our recent paper.

How to Learn Lagrangians

The process of learning a Lagrangian differs from the traditional approach, but it also involves four basic steps:

(Obtain data from a physical system) Parameterize the Lagrangian with a neural network ( ( mathcal {L} equiv mathcal {L} _ { theta} )).

Backpropagate through the constraint to train a parametric model that approximates the true Lagrangian

The first two steps are fairly straightforward, and we’ll see that automatic differentiation makes the fourth pretty painless. So let’s focus on step 3: applying the Euler-Lagrange constraint. Our angle of attack will be to write down the constraint equation, treat ( mathcal {L} ) as a differentiable blackbox function, and see whether we can still obtain dynamics:

For a given set of coordinates (x_t=(q_t, dot q_t) ), we now have a method for calculating ( dot x_t=( dot q_t, ddot q_t) ) from a blackbox Lagrangian. We can integrate this quantity to obtain the dynamics of the system. And in the same manner as Hamiltonian Neural Networks, we can learn ( mathcal {L _ { theta}} ) by differentiating the MSE loss between ( dot x_t ^ { mathcal {L _ { theta}}} ) and ( dot x_t ^ { textrm {true}} ).

Implementation. If you look closely at Equation 8, you may notice that it involves both the Hessian and the gradient of a neural network during the forward pass of the LNN. This is not a trivial operation, but modern automatic differentiation makes things surprisingly smooth. Written in JAX , Equation 8 is just a few lines of code:

(q_tt) (=)

 (  jax   (numpy)   .  (linalg) . (pinv)   ()  (jax)   hessian   (  (lagrangian) , (1) ) ( (q) ,  (q_t)   )  @  (  jax   (grad)   (  (lagrangian) ,  (0)  ) (  (q) ,   q_t  )   -  (jax)  jacfwd   (jax) .   (grad) ( (lagrangian)  ,  (1)  (),  (0)  ( (q)  , (q_t) )   (@   (q_t)

Learning Real Lagrangians

In our paper, we conduct several experiments to validate this approach. In the first, we show that Lagrangian Neural Networks can learn the dynamics of a double pendulum.

Double pendulum. The double pendulum is a dynamics problem that regular neural networks struggle to fit because they have no prior for conserving the total energy of the system. It is also a problem where HNNs struggle, since the canonical coordinates of the system are not trivial to compute (see equations 1 and 2 of