

3.1.2 Weierstrass excess function

To continue our search for additional conditions (besides being an extremal) which are necessary for a piecewise $ \mathcal C^1$ curve $ y$ to be a strong minimum, we now introduce a new concept. For a given Lagrangian $ L=L(x,y,z)$ , the Weierstrass excess function, or $ E$ -function, is defined as

$\displaystyle E(x,y,z,w):=L(x,y,w)-L(x,y,z)-(w-z)\cdot L_z(x,y,z).$ (3.6)

The above formula is written to suggest multiple degrees of freedom, but from now on we specialize to the single-degree-of-freedom case for simplicity. Note that $ L(x,y,z)+(w-z)L_z(x,y,z)$ is the first-order Taylor approximation of $ L(x,y,w)$ , viewed as a function of $ w$ , around $ w=z$ . This gives the geometric interpretation of the $ E$ -function as the difference between the Lagrangian and its linear approximation around $ w=z$ ; see Figure 3.3.

Figure 3.3: Weierstrass excess function
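
For example, if $ L(x,y,z)=z^2$ (a Lagrangian chosen here purely for illustration), then $ L_z(x,y,z)=2z$ and

$\displaystyle E(x,y,z,w)=w^2-z^2-2z(w-z)=(w-z)^2\ge 0,$

so in this case the excess is exactly the quadratic error of the linear approximation, nonnegative for every $ w$ .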

The Weierstrass necessary condition for a strong minimum states that if $ y(\cdot)$ is a strong minimum, then

$\displaystyle E(x,y(x),y'(x),w)\ge 0$ (3.7)

for all noncorner points $ x\in [a,b]$ and all $ w\in\mathbb{R}$ .
The geometric meaning of this condition is that for each $ x$ , the graph of the function $ L(x,y(x),\cdot)$ lies above its tangent line at $ y'(x)$ , which can be interpreted as a local convexity property of this function.
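
To see how this condition can rule out candidates, consider as an example the Lagrangian $ L(x,y,z)=z^3$ . A direct computation gives

$\displaystyle E(x,y,z,w)=w^3-z^3-3z^2(w-z)=(w-z)^2(w+2z),$

which is negative whenever $ w<-2z$ . Hence no curve satisfies (3.7) for this Lagrangian, and the functional $ J(y)=\int_a^b (y'(x))^3dx$ has no strong minima.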

The Weierstrass necessary condition can be proved as follows. Suppose that a curve $ y$ is a strong minimum. Let $ \bar x\in[a,b]$ be a noncorner point of $ y$ , let $ d\in(\bar x,b]$ be such that the interval $ [\bar x,d]$ contains no corner points of $ y$ , and pick some $ w\in\mathbb{R}$ . We construct a family of perturbed curves $ y(\cdot ,\varepsilon )$ , parameterized by $ \varepsilon \in[0,d-\bar x)$ , which are continuous, coincide with $ y$ on the complement of $ [\bar x,d]$ , are linear with derivative $ w$ on $ [\bar x,\bar x+\varepsilon ]$ , and differ from $ y$ by a linear function on $ [\bar x+\varepsilon , d]$ . The precise definition is

$\displaystyle y(x,\varepsilon ):=\begin{cases}y(x) \quad&\text{ if }\ a\le x\le \bar x \ \text{ or }\ d\le x\le b\\ y(\bar x)+w(x-\bar x) \quad&\text{ if }\ \bar x\le x\le \bar x+\varepsilon \\ y(x)+\dfrac{d-x}{d-\bar x-\varepsilon }\Big(y(\bar x)+w\varepsilon -y(\bar x+\varepsilon )\Big)\quad&\text{ if }\ \bar x+\varepsilon \le x\le d \end{cases}$ (3.8)

Such a perturbed curve $ y(\cdot ,\varepsilon )$ is shown in Figure 3.4.

Figure 3.4: The graphs of $ y$ and $ y(\cdot ,\varepsilon )$
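
For readers who wish to experiment, here is a brief numerical sketch of this construction; the specific curve, interval, and slope below are hypothetical choices made only for illustration. It confirms that the perturbed curve is continuous and converges to $ y$ in the 0-norm as $ \varepsilon \to 0$ .

\begin{verbatim}
import numpy as np

# Hypothetical data for illustrating the perturbation (3.8)
a, b = 0.0, 1.0
xbar, d = 0.3, 0.8
w = 2.0

def y(x):
    return np.sin(x)  # hypothetical smooth test curve

def y_pert(x, eps):
    """Perturbed curve y(x, eps), defined piecewise as in (3.8)."""
    if x <= xbar or x >= d:
        return y(x)                      # unchanged outside [xbar, d]
    if x <= xbar + eps:
        return y(xbar) + w * (x - xbar)  # linear piece with slope w
    # y plus a linear correction chosen to vanish at x = d
    k = (y(xbar) + w * eps - y(xbar + eps)) / (d - xbar - eps)
    return y(x) + k * (d - x)

eps, h = 0.1, 1e-9
# continuity across the junction at x = xbar + eps (difference ~ 0)
print(y_pert(xbar + eps - h, eps) - y_pert(xbar + eps + h, eps))
# 0-norm closeness: the maximum deviation from y shrinks like eps
xs = np.linspace(a, b, 2001)
for e in (0.1, 0.01, 0.001):
    print(e, max(abs(y_pert(x, e) - y(x)) for x in xs))
\end{verbatim}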

It is clear that

$\displaystyle y(\cdot,0)=y$ (3.9)

and that for $ \varepsilon $ close to 0 the perturbed curve $ y(\cdot ,\varepsilon )$ is close to the original curve $ y$ in the sense of the 0-norm. Therefore, the function $ \varepsilon \mapsto J(y(\cdot,\varepsilon ))$ must have a minimum at $ \varepsilon =0$ . We will now show that the right-sided derivative of this function at $ \varepsilon =0$ exists and equals $ E(\bar x,y(\bar x),y'(\bar x),w)$ . Then, since this derivative must be nonnegative and $ \bar x$ and $ w$ are arbitrary, the proof will be complete.

Noting that the behavior of $ y(\cdot ,\varepsilon )$ outside the interval $ [\bar x,d]$ does not depend on $ \varepsilon $ , we have

$\displaystyle \frac d{d\varepsilon } J(y(\cdot,\varepsilon ))=\frac d{d\varepsilon }\left(\int_{\bar x}^{\bar x+\varepsilon }L(x,y(x,\varepsilon ),{y}_{x}(x,\varepsilon ))dx+\int_{\bar x+\varepsilon }^{d}L(x,y(x,\varepsilon ),{y}_{x}(x,\varepsilon ))dx\right).$ (3.10)

By (3.8), the first integral in (3.10) is $ \int_{\bar x}^{\bar x+\varepsilon }L(x,y(\bar x)+w(x-\bar x),w)dx$ ; since its integrand does not depend on $ \varepsilon $ , its derivative with respect to $ \varepsilon $ is simply the integrand evaluated at the upper limit, namely

$\displaystyle L(\bar x+\varepsilon ,y(\bar x)+w\varepsilon ,w).$ (3.11)

The differentiation of the second integral in (3.10) gives

$\displaystyle -L(\bar x+\varepsilon , y(\bar x+\varepsilon ,\varepsilon ),{y}_{x}(\bar x+\varepsilon ,\varepsilon ))+\int_{\bar x+\varepsilon }^{d}{L}_{y}(x,y(x,\varepsilon ),{y}_{x}(x,\varepsilon )){y}_{\varepsilon } (x,\varepsilon )dx$ (3.12)
$\displaystyle +\int_{\bar x+\varepsilon }^{d}{L}_{y'}(x,y(x,\varepsilon ),{y}_{x}(x,\varepsilon )){y}_{{x}{\varepsilon }} (x,\varepsilon )dx$ (3.13)

(it is straightforward to check that the partial derivatives $ {y}_{x}(x,\varepsilon )$ , $ {y}_{\varepsilon } (x,\varepsilon )$ , $ {y}_{{x}{\varepsilon }} (x,\varepsilon )$ exist inside the relevant intervals). Since $ {y}_{{x}{\varepsilon }} (x,\varepsilon )={y}_{{\varepsilon }{x}}(x,\varepsilon )=\frac d{dx}{y}_{\varepsilon } (x,\varepsilon )$ , we can use integration by parts to bring the integral in (3.13) to the form

$\displaystyle \left.{L}_{y'}(x,y(x,\varepsilon ),{y}_{x}(x,\varepsilon )){y}_{\varepsilon } (x,\varepsilon )\right\vert _{x=\bar x+\varepsilon }^{x=d}-\int_{\bar x+\varepsilon }^{d}\Big(\frac d{dx}{L}_{y'}(x,y(x,\varepsilon ),{y}_{x}(x,\varepsilon ))\Big){y}_{\varepsilon } (x,\varepsilon )dx.$ (3.14)

In more detail, the first term in (3.14) is

$\displaystyle {L}_{y'}(d,y(d,\varepsilon ),{y}_{x}(d,\varepsilon )){y}_{\varepsilon } (d,\varepsilon )-{L}_{y'}(\bar x+\varepsilon ,y(\bar x+\varepsilon ,\varepsilon ),{y}_{x}(\bar x+\varepsilon ,\varepsilon )){y}_{\varepsilon } (\bar x+\varepsilon ,\varepsilon ).$ (3.15)

We have from (3.8) that $ y(d,\varepsilon )=y(d)$ for all $ \varepsilon $ , hence $ {y}_{\varepsilon } (d,\varepsilon )=0$ and so the first term in (3.15) is 0. Another consequence of (3.8) is the relation $ y(\bar x+\varepsilon ,\varepsilon )=y(\bar x)+w\varepsilon $ . Differentiating it with respect to $ \varepsilon $ , we obtain $ {y}_{x}(\bar x+\varepsilon ,\varepsilon )+{y}_{\varepsilon } (\bar x+\varepsilon ,\varepsilon )=w$ , or $ {y}_{\varepsilon } (\bar x+\varepsilon ,\varepsilon )=w-{y}_{x}(\bar x+\varepsilon ,\varepsilon )$ . When we substitute this expression for $ {y}_{\varepsilon } (\bar x+\varepsilon ,\varepsilon )$ into the second term in (3.15), that term becomes

$\displaystyle -{L}_{y'}(\bar x+\varepsilon ,y(\bar x+\varepsilon ,\varepsilon ),{y}_{x}(\bar x+\varepsilon ,\varepsilon ))(w-{y}_{x}(\bar x+\varepsilon ,\varepsilon )).$ (3.16)

We see that $ \frac d{d\varepsilon } J(y(\cdot,\varepsilon ))$ is given by the sum of (3.11), (3.12), the integral (with the minus sign) in (3.14), and (3.16). Setting $ \varepsilon =0$ and rearranging terms, we arrive at

$\displaystyle \left.\frac d{d\varepsilon }\right\vert _{\varepsilon =0^+} \! J(y(\cdot,\varepsilon )) =L(\bar x,y(\bar x),w)-L(\bar x,y(\bar x,0),{y}_{x}(\bar x,0))-{L}_{y'}(\bar x,y(\bar x,0),{y}_{x}(\bar x,0))(w-{y}_{x}(\bar x,0))$
$\displaystyle \qquad +\int_{\bar x}^d \Big({L}_{y}(x,y(x,0),{y}_{x}(x,0))-\frac d{dx}{L}_{y'}(x,y(x,0),{y}_{x}(x,0))\Big){y}_{\varepsilon } (x,0)dx.$

Now, recall (3.9), which also implies that $ {y}_{x}(\cdot, 0)=y'$ . The integral on the second line of the previous formula thus equals

$\displaystyle \int_{\bar x}^d \Big({L}_{y}(x,y(x),y'(x))-\frac d{dx}{L}_{y'}(x,y(x),y'(x))\Big){y}_{\varepsilon } (x,0)dx=0$

because $ y$ , as a strong (hence also weak) minimum, must satisfy the Euler-Lagrange equation. We are left with

$\displaystyle \left.\frac d{d\varepsilon }\right\vert _{\varepsilon =0^+} \! J(y(\cdot,\varepsilon )) =L(\bar x,y(\bar x),w)-L(\bar x,y(\bar x),y'(\bar x))-{L}_{y'}(\bar x,y(\bar x),y'(\bar x))(w-y'(\bar x))$
$\displaystyle \qquad =E(\bar x,y(\bar x),y'(\bar x),w)$

as desired, and the Weierstrass necessary condition is established. A necessary condition for a strong maximum is analogous but with the inequality sign reversed; this can be verified by passing from $ L$ to $ -L$ or by modifying the proof in the obvious way. The Weierstrass necessary condition can also be extended to corner points, either by refining the proof or via a limiting argument (cf. Exercise 3.3 below).
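
The identity just established lends itself to a quick numerical check. The sketch below (written for a hypothetical quadratic Lagrangian and a straight-line extremal, chosen purely for illustration) compares a forward difference quotient of $ \varepsilon \mapsto J(y(\cdot,\varepsilon ))$ against the $ E$ -function; only the portion of $ J$ over $ [\bar x,d]$ is computed, since the rest does not depend on $ \varepsilon $ .

\begin{verbatim}
from scipy.integrate import quad

# Numerical check that dJ/deps at eps = 0+ equals E(xbar, y(xbar), y'(xbar), w).
# The Lagrangian L = z^2/2, the extremal y(x) = c*x, and all numbers below are
# hypothetical choices for illustration only.
xbar, d = 0.3, 0.8
w, c = 2.0, 0.5

L = lambda x, y, z: 0.5 * z**2   # L(x, y, y'); here L_z(x, y, z) = z
y = lambda x: c * x              # straight-line extremal (a strong minimum)

def J_mid(eps):
    """Integral of L over [xbar, d] along the perturbed curve (3.8)."""
    k = (y(xbar) + w * eps - y(xbar + eps)) / (d - xbar - eps)
    # linear piece with slope w on [xbar, xbar + eps]
    I1, _ = quad(lambda x: L(x, y(xbar) + w * (x - xbar), w), xbar, xbar + eps)
    # on [xbar + eps, d] the curve is y(x) + k*(d - x), with derivative c - k
    I2, _ = quad(lambda x: L(x, y(x) + k * (d - x), c - k), xbar + eps, d)
    return I1 + I2

eps = 1e-6
dJ = (J_mid(eps) - J_mid(0.0)) / eps                      # forward difference
E = L(xbar, y(xbar), w) - L(xbar, y(xbar), c) - (w - c) * c
print(dJ, E)   # both should be close to (w - c)**2 / 2 = 1.125
\end{verbatim}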

Weierstrass introduced the above necessary condition during his 1879 lectures on calculus of variations. His original proof relied on an additional assumption (normality) which was subsequently removed by McShane in 1939. Let us now take a few moments to reflect on the perturbation used in the proof we just gave. First, it is important to observe that the perturbed curve $ y(\cdot ,\varepsilon )$ is close to the original curve $ y$ in the sense of the 0-norm, but not necessarily in the sense of the 1-norm. Indeed, it is clear that $ \Vert y(\cdot,\varepsilon )-y\Vert _0\to 0$ as $ \varepsilon \to 0$ ; on the other hand, the derivative of $ y(\cdot ,\varepsilon )$ for $ x$ immediately to the right of $ \bar x$ equals $ w$ , hence $ \Vert y(\cdot,\varepsilon )-y\Vert _1\ge \vert w-y'(\bar x)\vert$ for all $ \varepsilon >0$ , no matter how small. For this reason, the necessary condition applies only to strong minima, unless we restrict $ w$ to be sufficiently close to $ y'(\bar x)$ . Note also that the first variation was not used in the proof. Thus we have already departed significantly from the variational approach which we followed in Chapter 2. Derived using a richer class of perturbations, the Weierstrass necessary condition turns out to be powerful enough to yield as its corollaries the Weierstrass-Erdmann corner conditions from Section 3.1.1 as well as Legendre's condition from Section 2.6.1.


\begin{Exercise}
Use the Weierstrass necessary condition to prove that a piecewise $ \mathcal C^1$ curve $ y$ that is a strong minimum must satisfy the Weierstrass-Erdmann corner conditions and Legendre's condition.
\end{Exercise}

When solving this exercise, the reader should keep the following points in mind. First, we know from Exercise 3.1 that the first Weierstrass-Erdmann corner condition is necessary for weak extrema as well. This condition should thus follow directly from the fact that $ y$ is an extremal--i.e., satisfies the integral form (2.23) of the Euler-Lagrange equation--without the need to apply the Weierstrass necessary condition. The second Weierstrass-Erdmann corner condition, on the other hand, is necessary only for strong extrema, and deducing it requires the full power of the Weierstrass necessary condition (including a further analysis of what the latter implies for corner points). As for Legendre's condition, it can be derived from the local version of the Weierstrass necessary condition with $ w$ restricted to be close to $ y'(x)$ for a given $ x$ , thus confirming that Legendre's condition is also necessary for weak extrema. Finally, when $ x$ is a corner point, Legendre's condition should read $ L_{y'y'}(x,y(x),y'(x^\pm))\ge 0$ .
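
One way to see the claim about Legendre's condition, sketched here for convenience, is to expand $ L(x,y(x),w)$ to second order in $ w$ around $ y'(x)$ : the zeroth- and first-order terms cancel in (3.6), leaving

$\displaystyle E(x,y(x),y'(x),w)=\frac12 L_{y'y'}(x,y(x),y'(x))(w-y'(x))^2+o(\vert w-y'(x)\vert^2),$

and nonnegativity of this expression for all $ w$ sufficiently close to $ y'(x)$ forces $ L_{y'y'}(x,y(x),y'(x))\ge 0$ .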

The perturbation used in the above proof of the Weierstrass necessary condition is already quite close to the ones we will use later in the proof of the maximum principle. The main difference is that in the proof of the maximum principle, we will not insist on bringing the perturbed curve back to the original curve after the perturbation stops acting. Instead, we will analyze how the effect of a perturbation applied on a small interval propagates up to the terminal point.

There is a very insightful reformulation of the Weierstrass necessary condition which reveals its direct connection to the Hamiltonian maximization property discussed at the end of Section 2.6.1 (and thus to the maximum principle which we are steadily approaching). Let us write our Hamiltonian (2.63) as

$\displaystyle H(x,y,z,p)=zp-L(x,y,z).$

Then a simple manipulation of (3.6) allows us to bring the $ E$ -function and the condition (3.7) to the form

$\displaystyle E(x,y(x),y'(x),w) = \big(y'(x)L_z(x,y(x),y'(x))-L(x,y(x),y'(x))\big)$
$\displaystyle \qquad -\big(wL_z(x,y(x),y'(x))-L(x,y(x),w)\big)$
$\displaystyle \qquad =H(x,y(x),y'(x),p(x))-H(x,y(x),w,p(x))\ge 0$

where we used the formula

$\displaystyle p(x)=L_z(x,y(x),y'(x))$

consistent with our earlier definition of the momentum (see Section 2.4.1). Therefore, the Weierstrass necessary condition simply says that if $ y(\cdot)$ is an optimal trajectory and $ p(\cdot)$ is the corresponding momentum, then for every $ x$ the function $ H(x,y(x),\cdot,p(x))$ , which is the same as the function $ H^*$ defined in (2.32), has a maximum at $ y'(x)$ . This interpretation escaped Weierstrass, not just because it requires bringing in the Hamiltonian but because it demands treating $ z$ and $ p$ as independent arguments of the Hamiltonian (we already discussed this point at the end of Section 2.4.2).
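
As a concrete illustration (the Lagrangian here is an example chosen for this purpose), take $ L(x,y,z)=\frac12 z^2$ . Then $ p(x)=y'(x)$ and $ H(x,y(x),w,p(x))=wp(x)-\frac12 w^2$ , a concave quadratic in $ w$ whose unique maximum is at $ w=p(x)=y'(x)$ , in agreement with the fact that $ E(x,y(x),y'(x),w)=\frac12(w-y'(x))^2\ge 0$ .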

Combining the Weierstrass necessary condition with the sufficient condition for a weak minimum from Section 2.6.2, one can obtain a sufficient condition for a strong minimum. The precise formulation of this condition requires a new concept (that of a field), and we will not develop it here. While sufficient conditions for optimality are theoretically appealing, they tend to be less practical to apply than necessary conditions; we already saw this in Section 2.6.2 and will see it again in Chapter 5.

