

2.6.1 Legendre's necessary condition for a weak minimum

Let us compute $ \left.\delta^2 J\right\vert _{y}$ for a given test curve $ y$ . Since third-order partial derivatives of $ L$ will appear, we assume that $ L\in\mathcal C^3$ . We work with the single-degree-of-freedom case for now. The quantity of interest is

$\displaystyle J(y+\alpha\eta)=\int_a^b L(x,y(x)+\alpha\eta(x),y'(x)+\alpha\eta'(x))dx.$

We need to write down its second-order Taylor expansion with respect to $ \alpha$ . We do this by expanding the function inside the integral with respect to $ \alpha$ (using the chain rule) and separating the terms of different orders in $ \alpha$ :

$\displaystyle J(y+\alpha \eta) =\int_a^b L(x,y(x),y'(x))\,dx +\alpha\int_a^b \big(L_{y}(x,y(x),y'(x))\,\eta(x)+L_{y'}(x,y(x),y'(x))\,\eta'(x)\big)\,dx$
$\displaystyle \qquad +\frac{\alpha^2}2\int_a^b\big(L_{yy}(x,y(x),y'(x))(\eta(x))^2+2L_{yy'}(x,y(x),y'(x))\,\eta(x)\eta'(x) +L_{y'y'}(x,y(x),y'(x))(\eta'(x))^2\big)\,dx+o(\alpha^2).$ (2.55)

Matching this expression term by term with the expansion $ J(y+\alpha\eta)=J(y)+\left.\delta J\right\vert _{y}(\eta)\,\alpha+\left.\delta^2 J\right\vert _{y}(\eta)\,\alpha^2+o(\alpha^2)$ defining the first and second variations, we deduce that the second variation is given by

$\displaystyle \left.\delta^2 J\right\vert _{y}(\eta)=\frac12\int_a^b\big(L_{yy}\,\eta^2+2L_{yy'}\,\eta\eta'+L_{y'y'}\,(\eta')^2\big)\,dx$

where the integrand is evaluated along $ (x,y(x),y'(x))$ . This is indeed a quadratic form as defined in Section 1.3.3. Note that it explicitly depends on $ \eta'$ as well as on $ \eta $ . In contrast with the first variation (analyzed in detail in Section 2.3.1), the dependence of the second variation on $ \eta'$ is essential and cannot be eliminated. We can, however, simplify the expression for $ \left.\delta^2 J\right\vert _{y}(\eta)$ by eliminating the ``mixed'' term containing the product $ \eta\eta'$ . We do this by using--as we did in our earlier derivation of the Euler-Lagrange equation--the method of integration by parts:

$\displaystyle \int_a^b 2L_{yy'}\,\eta\eta'\,dx=\int_a^b L_{yy'}\,\frac d{dx}(\eta^2)\,dx =\left.L_{yy'}\,\eta^2\right\vert _{a}^b-\int_a^b\frac d{dx}(L_{yy'})\,\eta^2\,dx.$

The first, non-integral term on the right-hand side vanishes due to the boundary conditions (2.11). Therefore, the second variation can be written as

$\displaystyle \left.\delta^2 J\right\vert _{y}(\eta)= \int_a^b\left(P(x)(\eta'(x))^2+Q(x)(\eta(x))^2 \right)dx$ (2.56)

where

$\displaystyle P(x):=\frac12L_{y'y'}(x,y(x),y'(x)),\qquad Q(x):=\frac12\Big(L_{yy}(x,y(x),y'(x)) -\frac d{dx}L_{yy'}(x,y(x),y'(x))\Big).$ (2.57)

Note that $ P$ is continuous, and $ Q$ is also continuous at least when $ y\in\mathcal C^2$ .
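For a concrete illustration (chosen here only as an example, not tied to any particular boundary-value problem), take the harmonic-oscillator Lagrangian $ L(x,y,y')=\frac12\big((y')^2-y^2\big)$ . Then $ L_{y'y'}=1$ , $ L_{yy}=-1$ , and $ L_{yy'}=0$ , so (2.57) gives

$\displaystyle P(x)=\frac12,\qquad Q(x)=-\frac12.$

In particular, $ Q$ is negative here; as we will see shortly, this by itself does not contradict optimality.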

When we come to the issue of sufficiency in the next subsection, we will also need a more precise characterization of the higher-order term labeled as $ o(\alpha^2)$ in the expansion (2.55). The next exercise invites the reader to go back to the derivation of (2.55) and analyze this term in more detail.


\begin{Exercise}
Use Taylor's theorem with remainder ... the same statement with respect to the 0-norm is in general false.
\end{Exercise}
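While the exercise concerns the remainder term, the quadratic coefficient in (2.55) itself can be verified symbolically. The following is a minimal sympy sketch; the particular Lagrangian in it is an arbitrary choice made only for the check, not one appearing in the text. It expands $ L(x,y+\alpha\eta,y'+\alpha\eta')$ in $ \alpha$ and confirms that the coefficient of $ \alpha^2$ is the quadratic form in (2.55):

\begin{verbatim}
import sympy as sp

x, alpha = sp.symbols('x alpha')
y = sp.Function('y')(x)
eta = sp.Function('eta')(x)
u, v = sp.symbols('u v')              # stand-ins for y and y'

L = sp.exp(u)*v**2 + sp.sin(u)        # an arbitrarily chosen Lagrangian

# substitute the perturbed curve and expand in alpha
pert = L.subs({u: y + alpha*eta, v: sp.diff(y, x) + alpha*sp.diff(eta, x)})
coeff2 = pert.series(alpha, 0, 3).removeO().coeff(alpha, 2)

# the claimed coefficient: (L_yy eta^2 + 2 L_yy' eta eta' + L_y'y' eta'^2)/2
quad = (sp.diff(L, u, 2)*eta**2
        + 2*sp.diff(L, u, v)*eta*sp.diff(eta, x)
        + sp.diff(L, v, 2)*sp.diff(eta, x)**2)/2
quad = quad.subs({u: y, v: sp.diff(y, x)})

print(sp.simplify(coeff2 - quad))     # prints 0
\end{verbatim}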

We know that if $ y$ is a minimum, then for all $ \mathcal C^1$ perturbations $ \eta $ vanishing at the endpoints the quantity (2.56) must be nonnegative:

$\displaystyle \int_a^b\left(P(x)(\eta'(x))^2+Q(x)(\eta(x))^2\right)dx\ge 0.$ (2.58)

We would like to restate this condition in terms of $ P$ and $ Q$ only--which are defined directly from $ L$ and $ y$ via (2.57)--so that we would not need to check it for all $ \eta $ . (Recall that we followed a similar route earlier when passing from the condition (2.16) to the Euler-Lagrange equation via Lemma 2.1.)

What, if anything, does the inequality (2.58) imply about $ P$ and $ Q$ ? Does it force at least one of these two functions, or perhaps both, to be nonnegative on $ [a,b]$ ? The two terms inside the integral in (2.58) are of course not independent, because $ \eta'$ and $ \eta $ are related. So, we should try to see whether one of them can dominate the other. More specifically, can it happen that $ \eta'$ is large (in magnitude) while $ \eta $ is small, or the other way around?

To answer these questions, consider a family of perturbations $ \eta _\varepsilon $ parameterized by small $ \varepsilon >0$ , depicted in Figure 2.12. The function $ \eta _\varepsilon $ equals 0 everywhere outside some interval $ [c,d]\subset[a,b]$ ; inside this interval it equals 1, except near the endpoints, where it rapidly rises from 0 to 1 (just after $ c$ ) and drops back down to 0 (just before $ d$ ). These rapid transitions are accomplished by the derivative $ \eta'_\varepsilon $ having a short pulse of width approximately $ \varepsilon $ and height approximately $ 1/\varepsilon $ right after $ c$ , and a similar negative pulse right before $ d$ . Here $ \varepsilon $ is small compared to $ d-c$ . We base the subsequent argument on this graphical description, but it is not difficult to write down an explicit formula for $ \eta _\varepsilon $ and use it to verify the claims that follow; see, e.g., [GF63, p. 103] for a similar construction.

Figure 2.12: The graphs of $ \eta _\varepsilon $ and its derivative

We can see that

$\displaystyle \left\vert\int_a^b Q(x)(\eta_\varepsilon (x))^2\,dx\right\vert\le\int_{c}^{d}\vert Q(x)\vert\, dx$

and this bound is uniform over $ \varepsilon $ . On the other hand, for nonzero $ P$ the integral $ \int_a^b P(x)(\eta_\varepsilon '(x))^2dx$ does not stay bounded as $ \varepsilon \to 0$ , because it is of order $ 1/\varepsilon $ . Let us see this more clearly in the case when $ P$ is negative on $ [c,d]$ , so that for some $ \delta>0$ we have $ P(x)\le -\delta$ for all $ x\in[c,d]$ . Assume that there is an interval inside $ [c,c+\varepsilon ]$ of length at least $ \varepsilon /2$ on which $ \eta'_\varepsilon $ is no smaller than $ 1/(2\varepsilon )$ ; this property is completely consistent with our earlier description of $ \eta _\varepsilon $ and $ \eta'_\varepsilon $ . We then have

$\displaystyle \int_a^b P(x)(\eta_\varepsilon '(x))^2\,dx\le\int_c^{c+\varepsilon }P(x)(\eta_\varepsilon '(x))^2\,dx\le -\delta\,\frac{1}{4\varepsilon ^2}\,\frac\varepsilon 2=-\frac{\delta}{8\varepsilon }.$

As $ \varepsilon \to 0$ , the above expression tends to $ -\infty$ , dominating the bounded $ Q$ -dependent term. It follows that the inequality (2.58) cannot hold for all $ \eta $ if $ P$ is negative on some subinterval $ [c,d]$ . But this means that for $ y$ to be a minimum, $ P$ must be nonnegative everywhere on $ [a,b]$ . Indeed, if $ P(\bar x)<0$ for some $ \bar x\in[a,b]$ , then by continuity of $ P$ we can find a subinterval $ [c,d]$ containing $ \bar x$ on which $ P$ is negative, and the above construction can be applied.
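The whole domination argument is easy to reproduce numerically. Here is a minimal sketch (all concrete choices of $ [a,b]$ , $ [c,d]$ , $ P$ , and $ Q$ below are hypothetical, made only for the experiment) that builds a piecewise-linear $ \eta _\varepsilon $ of the kind described above and evaluates both integrals as $ \varepsilon $ shrinks:

\begin{verbatim}
import numpy as np

a, b = 0.0, 1.0          # hypothetical interval
c, d = 0.3, 0.7          # pulse support, [c,d] inside [a,b]

def eta(x, eps):
    # 0 outside [c,d], 1 on [c+eps, d-eps], linear ramps of width eps,
    # so eta' is roughly +1/eps just after c and -1/eps just before d
    up = np.clip((x - c)/eps, 0.0, 1.0)
    down = np.clip((d - x)/eps, 0.0, 1.0)
    return np.minimum(up, down)

x = np.linspace(a, b, 1_000_001)
dx = x[1] - x[0]
P = -np.ones_like(x)     # P(x) = -1, so delta = 1 on [c,d]
Q = np.cos(3.0*x)        # any continuous Q will do

for eps in (0.1, 0.01, 0.001):
    e = eta(x, eps)
    ep = np.gradient(e, x)               # numerical eta'
    P_term = np.sum(P*ep**2)*dx          # grows in magnitude like 1/eps
    Q_term = np.sum(Q*e**2)*dx           # stays bounded by int_c^d |Q|
    print(eps, P_term, Q_term)
\end{verbatim}

The $ Q$ -term barely changes with $ \varepsilon $ , while the $ P$ -term grows in magnitude like $ 1/\varepsilon $ , exactly as in the estimate above.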

Recalling the definition of $ P$ in (2.57), we arrive at our second-order necessary condition for optimality: for all $ x\in [a,b]$ we must have

$\displaystyle \fbox{$L_{y'y'}(x,y(x),y'(x))\ge 0$}$ (2.59)

This condition is known as Legendre's condition, as it was obtained by Legendre in 1786. Note that it places no restrictions on the sign of $ Q$ ; intuitively speaking, the $ Q$ -dependent term in the second variation is dominated by the $ P$ -dependent term, hence $ Q$ can in principle be negative along the optimal curve (this point will become clearer in the next subsection). For multiple degrees of freedom, the proof takes a bit more work but the statement of Legendre's condition is virtually unchanged: $ L_{y'y'}(x,y(x),y'(x))$ , which becomes a symmetric matrix, must be positive semidefinite for all $ x$ , i.e., (2.59) must hold in the matrix sense along the optimal curve.
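As a quick illustration in the single-degree-of-freedom case, consider the arc-length Lagrangian $ L(x,y,y')=\sqrt{1+(y')^2}$ . A direct computation gives

$\displaystyle L_{y'y'}(x,y,y')=\frac1{\big(1+(y')^2\big)^{3/2}}>0$

so Legendre's condition holds along every curve, consistent with the fact that straight lines minimize arc length.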

As a brief digression, let us recall our definition (2.29) of the Hamiltonian:

$\displaystyle H(x,y,y',p)=p\cdot{y'}-L(x,y,y')$ (2.60)

where $ p={L}_{y'}(x,y,y')$ . We already noted in Section 2.4.1 that when $ H$ is viewed as a function of $ y'$ with the other arguments evaluated along an optimal curve $ y$ , it should have a stationary point at $ y'=y'(x)$ , the velocity of $ y$ at $ x$ . More precisely, the function $ H^*$ defined in (2.32) has a stationary point at $ z=y'(x)$ , which is a consequence of (2.33). Legendre's condition tells us that, in addition, $ H_{y'y'}=-L_{y'y'}\le 0$ along an optimal curve, which we can rewrite in terms of $ H^*$ as

$\displaystyle \frac{d^2 H^*}{dz^2}(y'(x))=-L_{y'y'}(x,y(x),y'(x))\le 0.$

Thus, if the above stationary point is an extremum, then it is necessarily a maximum. This interpretation of necessary conditions for optimality moves us one step closer to the maximum principle. The basic idea behind our derivation of Legendre's condition will reappear in Section 3.4 in the context of optimal control, but eventually (in Chapter 4) we will obtain a stronger result using more advanced techniques.
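As a simple illustration of this maximization property, take $ L(x,y,y')=\frac12(y')^2$ , so that $ p=L_{y'}=y'(x)$ along the curve. Then

$\displaystyle H^*(z)=pz-\frac{z^2}2,\qquad \frac{dH^*}{dz}(z)=p-z,\qquad \frac{d^2H^*}{dz^2}(z)=-1\le 0,$

a concave parabola whose stationary point $ z=p=y'(x)$ is a global maximum.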

