2.4.2 Legendre transformation

Consider a function $f:\mathbb{R}\to\mathbb{R}$, whose argument we denote by $\xi$ (the curve in Figure 2.10 is a possible graph of $f$). For simplicity we are considering the scalar case, but the extension to $f:\mathbb{R}^n\to\mathbb{R}$ is straightforward. The *Legendre transform* of $f$ will be a new function, $f^*$, of a new variable, $p\in\mathbb{R}$.

Let $p$ be given. Draw a line through the origin with slope $p$. Take a point $\xi = \xi(p)$ at which the (directed) vertical distance from the graph of $f$ to this line is maximized:

$$\xi(p) := \arg\max_{\xi}\,\{p\xi - f(\xi)\}. \tag{2.34}$$

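(Note that $\xi(p)$ may not exist, so the domain of $f^*$ is not known a priori. Also, $\xi(p)$ is not necessarily unique unless $f$ is a strictly convex function.) Now, define $f^*(p)$ to be this maximal value of the gap between $p\xi$ and $f(\xi)$:

$$f^*(p) := p\,\xi(p) - f(\xi(p)) = \max_{\xi}\,\{p\xi - f(\xi)\}. \tag{2.35}$$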

We can also write this definition more symmetrically as

$$f^*(p) + f(\xi) = p\xi \tag{2.36}$$

where $p$ and $\xi = \xi(p)$ are related via (2.34). When $f$ is differentiable, the maximization condition (2.34) implies that the derivative of $p\xi - f(\xi)$ with respect to $\xi$ must equal 0 at $\xi(p)$:

$$p - f'(\xi(p)) = 0. \tag{2.37}$$

Geometrically, the tangent line to the graph of $f$ at $\xi(p)$ must have slope $p$, i.e., it must be parallel to the original line through the origin (see Figure 2.10). If $f$ is convex then (2.34) and (2.37) are equivalent.
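The construction can be tried numerically. The following is a minimal sketch, assuming the notation $f$ for the function, $\xi$ for its argument, and $p$ for the slope; the helper name `legendre_transform`, the search interval, and the grid size are illustrative choices, not part of the text. It maximizes the gap $p\xi - f(\xi)$ over a uniform grid, a brute-force stand-in for (2.34) and (2.35), and applies it to $f(\xi) = \xi^2$, for which the stationarity condition (2.37) gives $\xi(p) = p/2$ and $f^*(p) = p^2/4$.

```python
def legendre_transform(f, p, xi_min=-10.0, xi_max=10.0, n=20001):
    """Approximate xi(p) and f*(p) by maximizing p*xi - f(xi) over a uniform grid.

    The interval [xi_min, xi_max] and the grid size n are illustrative
    assumptions; the maximizer must lie inside the interval for the
    result to be meaningful.
    """
    best_xi, best_gap = None, float("-inf")
    for i in range(n):
        xi = xi_min + (xi_max - xi_min) * i / (n - 1)
        gap = p * xi - f(xi)  # directed vertical distance from the graph to the line
        if gap > best_gap:
            best_xi, best_gap = xi, gap
    return best_xi, best_gap


# Example: f(xi) = xi^2 with slope p = 3.
xi_p, f_star_p = legendre_transform(lambda xi: xi * xi, 3.0)
print(xi_p, f_star_p)                 # close to 1.5 and 2.25, up to grid resolution
print(3.0 - 2.0 * xi_p)               # stationarity residual p - f'(xi(p)), close to 0
```

A grid search is used here only because it mirrors the definition directly; for a differentiable convex $f$, solving $p = f'(\xi)$ would normally be cheaper and more accurate.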

The Legendre transformation has some nice properties. For example, $f^*$ is a convex function even if $f$ is not convex. The reason is that $f^*$ is a pointwise maximum of functions that are affine in $p$, as is clear from (2.35). Also, for convex functions the Legendre transformation is involutive: if $f$ is convex, then $f^{**} = f$.
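Both properties can be checked on a grid. The sketch below is an illustration under stated assumptions rather than a proof: the helper names `grid` and `conjugate`, the double-well test function $(\xi^2-1)^2$, and the grid ranges are all hypothetical choices. It verifies that the discrete transform of a non-convex function has nonnegative second differences in $p$, and that applying the transform twice to the convex function $\xi^2$ returns it up to grid error. For the non-convex double well, the double transform does not return the original function.

```python
def grid(lo, hi, n):
    """Return n uniformly spaced points from lo to hi inclusive."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]


def conjugate(f_vals, xs, ps):
    """Discrete Legendre transform: for each p, the max over xs of p*x - f(x)."""
    return [max(p * x - fx for x, fx in zip(xs, f_vals)) for p in ps]


def double_well(xi):
    """A non-convex test function (illustrative choice)."""
    return (xi * xi - 1.0) ** 2


xis = grid(-3.0, 3.0, 601)   # grid for the original variable
ps = grid(-8.0, 8.0, 401)    # grid for the slope variable

# Convexity of the transform of a non-convex function:
# second differences on a uniform grid should be nonnegative.
dw_star = conjugate([double_well(x) for x in xis], xis, ps)
min_second_diff = min(
    dw_star[i - 1] - 2.0 * dw_star[i] + dw_star[i + 1]
    for i in range(1, len(ps) - 1)
)

# Involution for the convex function xi^2: transform twice and compare.
sq_star = conjugate([x * x for x in xis], xis, ps)
sq_star_star = conjugate(sq_star, ps, xis)
max_involution_error = max(abs(v - x * x) for v, x in zip(sq_star_star, xis))

# For the non-convex double well the double transform differs from f at 0.
dw_star_star_at_0 = conjugate(dw_star, ps, [0.0])[0]

print(min_second_diff, max_involution_error, dw_star_star_at_0, double_well(0.0))
```

The discrete transform is itself a maximum of finitely many functions affine in $p$, so its convexity holds up to rounding regardless of the grid resolution.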

Now let us return to the Hamiltonian $H$ defined in (2.29). We claim that it can be obtained by applying the Legendre transformation to the Lagrangian $L$. More precisely, for arbitrary fixed $x$ and $y$ let us consider $L(x,y,y')$ as a function of $\xi = y'$. The relation (2.37) between $p$ and $\xi(p)$ becomes

$$p - L_{y'}(x, y, y'(p)) = 0 \tag{2.38}$$

which corresponds to our earlier definition (2.28) of the momentum $p$. Next, (2.35) gives

$$L^*(x, y, p) = p\,y'(p) - L(x, y, y'(p)) \tag{2.39}$$

which is essentially our earlier definition (2.29) of the Hamiltonian $H$. But there is a difference: in (2.29) we had $y'$ as an independent argument of $H$, while in (2.39) $y'$ is a dependent variable expressed in terms of $x, y, p$ by the implicit relation (2.38). In other words, the Legendre transform of $L(x,y,y')$ as a function of $y'$ (with $x, y$ fixed) is $H(x,y,p)$, which is a function of $p$ (with $x, y$ fixed) and no longer has $y'$ as an argument. Note that the above derivation is formal, i.e., we are ignoring the question of whether or not (2.38) can indeed be solved for $y'$. This issue did not arise earlier when we were working with the Hamiltonian $H(x,y,y',p)$.
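To make the solvability question concrete, here is a small sketch for one Lagrangian chosen purely for illustration (it is not taken from the text): the arclength integrand $L(x,y,y') = \sqrt{1+(y')^2}$. The momentum relation reads $p = y'/\sqrt{1+(y')^2}$, which can be solved for $y'$ only when $|p| < 1$, and the resulting Legendre transform is $-\sqrt{1-p^2}$. The function names `solve_yp` and `hamiltonian`, the bisection bracket, and the iteration count are assumptions of this sketch.

```python
import math


def L(yp):
    """Arclength Lagrangian sqrt(1 + y'^2); x and y do not appear (illustrative choice)."""
    return math.sqrt(1.0 + yp * yp)


def L_yp(yp):
    """Partial derivative of L with respect to y'."""
    return yp / math.sqrt(1.0 + yp * yp)


def solve_yp(p, lo=-1e6, hi=1e6, iters=200):
    """Solve p = L_yp(y') for y' by bisection, using that L_yp is strictly increasing.

    Raises ValueError when p is outside the range attained on the bracket,
    which for this Lagrangian happens when |p| >= 1 (up to the bracket size).
    """
    if not (L_yp(lo) < p < L_yp(hi)):
        raise ValueError("momentum relation has no solution in the bracket")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if L_yp(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def hamiltonian(p):
    """Legendre transform of L in y': p*y'(p) - L(y'(p)), with y'(p) found numerically."""
    yp = solve_yp(p)
    return p * yp - L(yp)


print(hamiltonian(0.6), -math.sqrt(1.0 - 0.6 ** 2))  # both close to -0.8
```

Calling `hamiltonian(1.5)` raises `ValueError`, which reflects the caveat above: the transformed function is defined only for those $p$ at which the momentum relation can be solved for $y'$.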

The above approach has another, more important drawback. Recall the observation based on (2.31) that $H$ has a stationary point as a function of $y'$ along an optimal curve. This property will be crucial later; combined with the canonical equations (2.30), it will lead us to the maximum principle. But it only makes sense when we treat $y'$ as an independent variable in the definition of $H$. On the other hand, Hamilton and other 19th-century mathematicians did not write the Hamiltonian in this way; they followed the convention of viewing $y'$ as a dependent variable defined implicitly by (2.38). This is probably why it was not until the late 1950s that the maximum principle was discovered.