2.6.2 Sufficient condition for a weak minimum

We are now interested in obtaining a second-order sufficient condition for proving optimality of a given test curve $y$. Looking at the expansion (2.56) and recalling our earlier discussions, we know that we want to have $\delta^2 J|_y(\eta) > 0$ for all admissible perturbations $\eta$, which means having a strict inequality in (2.61). In addition, we need some uniformity to be able to dominate the $o(\alpha^2)$ term. Since we saw that the $(\eta')^2$-dependent term inside the integral in (2.61) is the dominant term in the second variation, it is natural to conjecture--as Legendre did--that having $P(x) > 0$ for all $x \in [a,b]$ should be sufficient for the second variation to be positive definite. Legendre tried to prove this implication using the following clever approach. For every differentiable function $w : [a,b] \to \mathbb{R}$ we have

$$0 = w\eta^2\Big|_a^b = \int_a^b \frac{d}{dx}\big(w\eta^2\big)\,dx = \int_a^b \big(w'\eta^2 + 2w\eta\eta'\big)\,dx,$$

where the first equality follows from the constraint $\eta(a) = \eta(b) = 0$. This lets us rewrite the second variation as

$$\delta^2 J|_y(\eta) = \int_a^b \Big(P(\eta')^2 + 2w\eta\eta' + (Q + w')\eta^2\Big)\,dx.$$

Now, the idea is to find a function $w$ that makes the integrand on the right-hand side into a perfect square. Clearly, such a $w$ needs to satisfy

$$P(Q + w') = w^2. \tag{2.64}$$
This is a quadratic differential equation, of the so-called Riccati type, for the unknown function $w$.

Let us suppose that we found a function $w$ satisfying (2.64). Then our second variation can be written as

$$\delta^2 J|_y(\eta) = \int_a^b P\left(\eta' + \frac{w}{P}\,\eta\right)^2 dx \tag{2.65}$$
(the division by $P$ is permissible since we are operating under the assumption that $P(x) > 0$). It is obvious that the right-hand side of (2.65) is nonnegative, but we claim that it is actually positive for every admissible perturbation $\eta$ that is not identically 0. Indeed, if the integral is 0, then $\eta' + \frac{w}{P}\eta \equiv 0$. We also know that $\eta(a) = 0$. But there is only one solution of the first-order differential equation $\eta' = -\frac{w}{P}\eta$ with the zero initial condition, and this solution is $\eta \equiv 0$. So, it seems that we have $\delta^2 J|_y(\eta) > 0$ for all admissible $\eta \not\equiv 0$. At this point we challenge the reader to see a gap in the above argument.

The problem with the foregoing reasoning is that the Riccati differential equation (2.64) may have a finite escape time, i.e., the solution $w$ may not exist on the whole interval $[a,b]$. For example, if $P \equiv 1$ and $Q \equiv -1$ then (2.64) becomes $w' = w^2 + 1$. Its solution $w(x) = \tan(x + c)$, where the constant $c$ depends on the choice of the initial condition, blows up when $x + c$ is an odd integer multiple of $\pi/2$. This means that $w$ will not exist on all of $[a,b]$ for any choice of $c$ if $b - a \geq \pi$.
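This escape time is easy to observe numerically. The following minimal sketch (forward Euler, with step size and blow-up threshold chosen arbitrarily for illustration) integrates $w' = w^2 + 1$ from $w(0) = 0$; since the exact solution is $w(x) = \tan x$, the recorded escape time should land close to $\pi/2$.

```python
import math

def riccati_escape_time(w0=0.0, x_end=3.0, dx=1e-4, blowup=1e6):
    """Forward-Euler integration of the Riccati equation w' = w**2 + 1
    (the case P = 1, Q = -1 of equation (2.64)).  Returns the x at which
    |w| first exceeds `blowup`, or None if w stays bounded up to x_end."""
    x, w = 0.0, w0
    while x < x_end:
        w += (w * w + 1.0) * dx
        x += dx
        if abs(w) > blowup:
            return x
    return None

# With w(0) = 0 the exact solution is w(x) = tan(x), which escapes to
# infinity as x approaches pi/2.
x_escape = riccati_escape_time()
print(x_escape, math.pi / 2)
```

Shifting the initial condition only shifts the escape time, so on any interval of length at least $\pi$ no choice of $w(0)$ avoids the blow-up.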

We see that a sufficient condition for optimality should involve, in addition to an inequality like $P(x) > 0$ holding pointwise along the curve, some ``global" considerations applied to the entire curve. In fact, this becomes intuitively clear if we observe that a concatenation of optimal curves is not necessarily optimal. For example, consider the two great-circle arcs on a sphere shown in Figure 2.13. Each arc minimizes the distance between its endpoints, but this statement is no longer true for their concatenation--even when compared with nearby curves. At the same time, the concatenated arc would still satisfy any pointwise condition fulfilled by the two pieces.

So, we need to ensure the existence of a solution $w$ of the differential equation (2.64) on the whole interval $[a,b]$. This issue, which escaped Legendre's attention, was pointed out by Lagrange in 1797. However, it was only in 1837, after 50 years had passed since Legendre's investigation, that Jacobi closed the gap by providing a missing ingredient which we now describe. The first step is to reduce the quadratic first-order differential equation (2.64) to another differential equation, linear but of second order, by making the substitution

$$w = -\frac{Pv'}{v} \tag{2.66}$$
where $v$ is a new (unknown) function, twice differentiable and not equal to 0 anywhere. Rewriting (2.64) in terms of $v$, we obtain

$$\frac{P^2(v')^2}{v^2} = P\left(Q - \frac{(Pv')'}{v} + \frac{P(v')^2}{v^2}\right).$$

Multiplying both sides of this equation by $v^2$ (which is nonzero), dividing by $P$ (which is positive), and canceling terms, we can bring it to the form

$$(Pv')' = Qv. \tag{2.67}$$
This is the so-called *Jacobi equation* (also known as the *accessory equation*).
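For completeness, the algebra behind this reduction can be written out step by step (a sketch reconstructed from the substitution $w = -Pv'/v$):

```latex
\begin{align*}
w = -\frac{Pv'}{v}
  \quad&\Longrightarrow\quad
  w' = -\frac{(Pv')'}{v} + \frac{P(v')^2}{v^2}, \\
w^2 = P(Q + w')
  \quad&\Longrightarrow\quad
  \frac{P^2(v')^2}{v^2}
    = P\Big(Q - \frac{(Pv')'}{v} + \frac{P(v')^2}{v^2}\Big), \\
\text{multiply by } \tfrac{v^2}{P}:
  \quad&\phantom{\Longrightarrow}\quad
  P(v')^2 = Qv^2 - (Pv')'\,v + P(v')^2, \\
\text{cancel, divide by } v:
  \quad&\phantom{\Longrightarrow}\quad
  (Pv')' = Qv.
\end{align*}
```

Each step uses only that $v$ is nonzero and $P$ is positive, exactly the hypotheses stated above.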

Since (2.67) is a second-order differential equation, the initial data at $x = a$ needed to uniquely specify a solution consists of $v(a)$ and $v'(a)$. In addition, note that if $v$ is a solution of (2.67) then $\lambda v$ is also a solution for every constant $\lambda$. By adjusting $\lambda$ appropriately, we can thus assume with no loss of generality that $v'(a) = 1$ (since we are not interested in $v$ being identically 0). Among such solutions, let us consider the one that starts at 0, i.e., set $v(a) = 0$. A point $c > a$ is said to be *conjugate* to $a$ if this solution hits 0 again at $c$, i.e., $v(c) = 0$ (see Figure 2.14). It is clear that conjugate points are completely determined by $P$ and $Q$, which in turn depend, through (2.59), only on the test curve $y$ and the Lagrangian $L$ in the original variational problem.
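The definition suggests a simple numerical test: integrate the Jacobi equation with $v(a) = 0$, $v'(a) = 1$ and look for the next zero of $v$. The sketch below (an illustrative helper, not from the text, using explicit Euler with an arbitrary step size) does this for the earlier example $P \equiv 1$, $Q \equiv -1$, where the solution is $v(x) = \sin(x - a)$ and the first conjugate point is $a + \pi$.

```python
import math

def first_conjugate_point(P, Q, a, b, dx=1e-5):
    """Integrate the Jacobi equation (P v')' = Q v with v(a) = 0, v'(a) = 1,
    rewritten as the first-order system v' = u / P, u' = Q v with u = P v',
    and return the first c in (a, b] where v returns to 0 (else None)."""
    x, v, u = a, 0.0, P(a)  # v(a) = 0 and v'(a) = 1 give u(a) = P(a)
    while x < b:
        v_next = v + (u / P(x)) * dx
        u_next = u + Q(x) * v * dx
        x += dx
        if v > 0.0 and v_next <= 0.0:  # v crossed zero: conjugate point
            return x
        v, u = v_next, u_next
    return None

# Example from the text: P = 1, Q = -1 gives v(x) = sin(x - a),
# so the first point conjugate to a is approximately a + pi.
c = first_conjugate_point(lambda x: 1.0, lambda x: -1.0, a=0.0, b=4.0)
print(c, math.pi)
```

With $Q \equiv 0$ instead, $v$ grows linearly and never returns to 0, so the same routine reports that no conjugate point exists on the interval.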

Conjugate points have a number of interesting properties and interpretations, and their theory is outside the scope of this book. We do mention the following interesting fact, which involves a concept that we will see again later when proving the maximum principle. If we consider two neighboring extremals (solutions of the Euler-Lagrange equation) starting from the same point at $x = a$, and if $c$ is a point conjugate to $a$, then at $x = c$ the distance between these two extremals becomes small (an infinitesimal of higher order) relative to the distance between the two extremals as well as between their derivatives over $[a, c]$. As their distance over $[a, c]$ approaches 0, the two extremals actually intersect at a point whose $x$-coordinate approaches $c$. The reason behind this phenomenon is that the Jacobi equation is, approximately, the differential equation satisfied by the difference between two neighboring extremals; the next exercise makes this statement precise.

We see from (2.68) that the difference between the two extremals satisfies the Jacobi equation (2.67) modulo terms of higher order. A linear differential equation that describes, within terms of higher order, the propagation of the difference between two nearby solutions of a given differential equation is called the *variational equation* (corresponding to the given differential equation). In this sense, the Jacobi equation is the variational equation for the Euler-Lagrange equation. This property can be shown to imply the claims we made before the exercise. Intuitively speaking, a conjugate point is where different neighboring extremals starting from the same point meet again (approximately). If we revisit the example of shortest-distance curves on a sphere, we see that conjugate points correspond to diametrically opposite points: all extremals (which are great-circle arcs) with a given initial point intersect after completing half a circle. We will encounter the concept of a variational equation again in Section 4.2.4.

Now, suppose that the interval $[a,b]$ contains no points conjugate to $a$. Let us see how this may help us in our task of finding a solution $v$ of the Jacobi equation (2.67) that does not equal 0 anywhere on $[a,b]$. The absence of conjugate points means, by definition, that the solution with the initial data $v(a) = 0$ and $v'(a) = 1$ never returns to 0 on $(a, b]$. This is not yet a desired solution because we cannot have $v(a) = 0$. What we can do, however, is make $v(a)$ very small but positive. Using the property of continuity with respect to initial conditions for solutions of differential equations, it is possible to show that such a solution will remain positive everywhere on $[a,b]$.

In view of our earlier discussion, we conclude that the second variation $\delta^2 J|_y$ is positive definite (on the space of admissible perturbations) if $P(x) > 0$ for all $x \in [a,b]$ and there are no points conjugate to $a$ on $[a,b]$.
We remark in passing that the absence of points conjugate to $a$ on $(a,b]$ is also a necessary condition for $\delta^2 J|_y$ to be positive definite, and if $\delta^2 J|_y$ is positive semidefinite then no interior point of $[a,b]$ can be conjugate to $a$.
We are now ready to state the following **second-order sufficient condition for optimality**: *An extremal $y(\cdot)$ is a strict weak minimum if $L_{y'y'}(x, y(x), y'(x)) > 0$ for all $x \in [a,b]$ and the interval $[a,b]$ contains no points conjugate to $a$.*
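As a simple supplementary illustration (an example added here in the spirit of the text, not part of it), consider the arclength functional $J(y) = \int_a^b \sqrt{1 + (y'(x))^2}\,dx$, for which the sufficient condition is easy to check:

```latex
Extremals of $L(x,y,y') = \sqrt{1+(y')^2}$ satisfy $y'' = 0$, so they are
straight lines and $y'$ is constant along each of them.  Since
\[
  L_{y'y'} = \bigl(1+(y')^2\bigr)^{-3/2} > 0
\]
and $L$ does not depend on $y$, the coefficients in (2.59) reduce to a
positive constant $P$ and $Q \equiv 0$.  The Jacobi equation (2.67) becomes
\[
  (Pv')' = 0 \quad\Longleftrightarrow\quad v'' = 0,
\]
so the solution with $v(a) = 0$, $v'(a) = 1$ is $v(x) = x - a$, which never
vanishes for $x > a$.  Hence no interval $[a,b]$ contains a point conjugate
to $a$, both hypotheses of the sufficient condition hold, and straight
lines are strict weak minima of the arclength functional.
```

This is consistent with the sphere example: on the plane, unlike on the sphere, extremals from a common point never refocus.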

Note that we do not yet have a proof of this result. Referring to the second-order expansion (2.56), we know that under the conditions just listed $\delta J|_y(\eta) = 0$ (since $y$ is an extremal) and $\delta^2 J|_y(\eta)$ given by (2.58) is positive, but we still need to show that it dominates the higher-order term $o(\alpha^2)$, which has the properties established in Exercise 2.12. Since $P(x) > 0$ on $[a,b]$, we can pick a small enough $\delta > 0$ such that $P(x) - \delta > 0$ for all $x \in [a,b]$. Consider the integral

$$\int_a^b \Big((P - \delta)(\eta')^2 + Q\eta^2\Big)\,dx. \tag{2.69}$$
Reducing $\delta$ further towards 0 if necessary, we can ensure that no points conjugate to $a$ on $[a,b]$ are introduced as we pass from $P$ to $P - \delta$ (thanks to continuity of solutions of the accessory equation with respect to parameter variations). This guarantees that the functional (2.69) is still positive definite, hence

$$\int_a^b \Big(P(\eta')^2 + Q\eta^2\Big)\,dx > \delta \int_a^b (\eta')^2\,dx \tag{2.70}$$

for all admissible perturbations $\eta$ (not identically equal to 0).

In light of our earlier derivation of Legendre's condition, we know that the term depending on $\eta'$ is in some sense the dominant term in (2.60), and the inequality (2.70) indicates that we are in good shape. Formally, we can handle the other, $\eta$-dependent term in (2.60) as follows. Use the Cauchy-Schwarz inequality with respect to the $\mathcal{L}_2$ norm^{2.4} to write

$$\eta^2(x) = \left(\int_a^x \eta'(t)\,dt\right)^2 \le (x - a)\int_a^x \big(\eta'(t)\big)^2\,dt \le (b - a)\int_a^b \big(\eta'(t)\big)^2\,dt.$$

From this, we have

$$\int_a^b \eta^2(x)\,dx \le (b - a)^2 \int_a^b \big(\eta'(x)\big)^2\,dx. \tag{2.71}$$
Now, Exercise 2.12 tells us that the $o(\alpha^2)$ term in (2.56) takes the form (2.60) where for $\alpha$ close enough to 0 both $|\xi(x)|$ and $|\rho(x)|$ are smaller than $\frac{\delta}{2(1 + (b-a)^2)}$ for all $x \in [a,b]$ and all $\eta$ with $\|\eta\|_1 = 1$. Combined with (2.58), (2.70), and (2.71) this implies $J(y + \alpha\eta) > J(y)$ for these values of $\alpha$ (except of course $\alpha = 0$), proving that $y$ is a (strict) weak minimum.
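Spelled out, the domination argument runs as follows (a sketch; the symbols $\xi$, $\rho$ for the remainder coefficients and the specific bound on them are assumptions chosen so that the estimate closes):

```latex
\begin{align*}
J(y+\alpha\eta) - J(y)
  &= \alpha^2\,\delta^2 J|_y(\eta)
     + \alpha^2 \int_a^b \big(\xi(x)(\eta')^2 + \rho(x)\eta^2\big)\,dx
     && \text{by (2.56), since } \delta J|_y(\eta) = 0 \\
  &> \alpha^2 \delta \int_a^b (\eta')^2\,dx
     - \alpha^2\,\frac{\delta}{2\big(1+(b-a)^2\big)}
       \big(1 + (b-a)^2\big) \int_a^b (\eta')^2\,dx
     && \text{by (2.70) and (2.71)} \\
  &= \frac{\delta\,\alpha^2}{2} \int_a^b (\eta')^2\,dx \;>\; 0
     && \text{for } \eta \not\equiv 0,\ \alpha \neq 0.
\end{align*}
```

The factor $1 + (b-a)^2$ appears because (2.71) converts the $\eta^2$ integral into a multiple of the $(\eta')^2$ integral.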

The above sufficient condition is not as constructive and practical as the first-order and second-order necessary conditions, because applying it requires a study of conjugate points. The simpler necessary conditions can be exploited first, to see if they help narrow down candidates for an optimal solution. It should be observed, though, that the existence of conjugate points can be ruled out if the interval $[a,b]$ is taken to be sufficiently small.

As for the multiple-degrees-of-freedom setting, let us make the simplifying assumption that $L_{yy'}$ is a symmetric matrix (i.e., $L_{y_iy_j'} = L_{y_jy_i'}$ for all $i, j$). Then it is not difficult to show, following steps similar to those that led us to (2.58), that the second variation is given by the formula

$$\delta^2 J|_y(\eta) = \int_a^b \Big((\eta')^T P\,\eta' + \eta^T Q\,\eta\Big)\,dx$$

where $P$ and $Q$ are symmetric matrices still defined by (2.59). In place of the function $w$ introduced at the beginning of this subsection we need to consider a symmetric matrix $W$, and a suitable modification of our earlier square completion argument yields the Riccati matrix differential equation

$$W P^{-1} W = Q + W'$$

(note that $W'$ denotes the derivative of $W$, not the transpose). This quadratic differential equation is reduced to the second-order linear matrix differential equation $(PV')' = QV$ by the substitution $W = -PV'V^{-1}$, where $V$ is a matrix. Conjugate points are defined in terms of $V$ becoming singular. Generalizing the previous results by following this route is straightforward. Riccati matrix differential equations and their solutions play a central role in the linear quadratic regulator problem, which we will study in detail in Chapter 6.
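In the matrix case, a conjugate point can be detected numerically as the first point where $\det V$ vanishes. The sketch below (an illustration with hypothetical coefficient choices, not from the text) treats the simple diagonal case $P = I$, $Q = -\mathrm{diag}(1, 4)$, where the matrix Jacobi equation $(PV')' = QV$ decouples into $v_1'' = -v_1$ and $v_2'' = -4v_2$; with $V(a) = 0$, $V'(a) = I$ one gets $v_1(x) = \sin(x-a)$ and $v_2(x) = \sin(2(x-a))/2$, so $V$ first becomes singular at $a + \pi/2$.

```python
import math

def first_singular_point(q1, q2, a, b, dx=1e-4):
    """Diagonal illustration of the matrix conjugate-point test.
    With P = I and Q = diag(q1, q2), the matrix Jacobi equation
    (P V')' = Q V decouples into v1'' = q1*v1 and v2'' = q2*v2.
    Integrate with V(a) = 0, V'(a) = I and return the first x in (a, b]
    where det V(x) = v1(x)*v2(x) crosses zero (V becomes singular)."""
    x = a
    v1, u1 = 0.0, 1.0   # u_i stands for v_i'
    v2, u2 = 0.0, 1.0
    prev_det = 0.0
    while x < b:
        v1 += u1 * dx
        u1 += q1 * v1 * dx
        v2 += u2 * dx
        u2 += q2 * v2 * dx
        x += dx
        det = v1 * v2
        if prev_det > 0.0 and det <= 0.0:  # det V crossed zero
            return x
        prev_det = det
    return None

# Hypothetical coefficients Q = -diag(1, 4): v2 vanishes first, at a + pi/2.
c = first_singular_point(-1.0, -4.0, a=0.0, b=3.0)
print(c, math.pi / 2)
```

Note that the conjugate point is set by the fastest-oscillating component; a full (non-diagonal) implementation would integrate the matrix system and monitor $\det V$ the same way.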