To develop the first-order necessary condition for optimality, we need a notion of derivative for functionals. Let $J : V \to \mathbb{R}$ be a functional on a function space $V$, and consider some function $y \in V$. The derivative of $J$ at $y$, which will now be called the first variation, will also be a functional on $V$, and in fact this functional will be linear. To define it, we consider functions in $V$ of the form $y + \alpha\eta$, where $\eta \in V$ and $\alpha$ is a real parameter (which can be restricted to some interval around 0). The reader will recognize these functions as infinite-dimensional analogs of the points $x + \alpha d$ around a given point $x \in \mathbb{R}^n$, which we utilized earlier.
A linear functional $\delta J\big|_y : V \to \mathbb{R}$ is called the first variation of $J$ at $y$ if for all $\eta$ and all $\alpha$ we have
$$J(y + \alpha\eta) = J(y) + \delta J\big|_y(\eta)\,\alpha + o(\alpha). \tag{1.33}$$
The first variation as defined above corresponds to the Gateaux derivative of $J$, which is just the usual derivative of $J(y + \alpha\eta)$ with respect to $\alpha$ (for fixed $y$ and $\eta$) evaluated at $\alpha = 0$:
$$\delta J\big|_y(\eta) = \left.\frac{d}{d\alpha}\right|_{\alpha=0} J(y + \alpha\eta).$$
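As a concrete numerical illustration (not from the text), one can check the Gateaux-derivative characterization for the functional $J(y) = \int_0^1 y^2(x)\,dx$, whose first variation works out to $\delta J\big|_y(\eta) = 2\int_0^1 y(x)\eta(x)\,dx$. The helper names below are illustrative, and the integrals are approximated by a midpoint rule:

```python
import math

def integrate(f, a=0.0, b=1.0, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def J(y):
    """The functional J(y) = integral of y(x)^2 over [0, 1]."""
    return integrate(lambda x: y(x) ** 2)

y = lambda x: x                        # base function y(x) = x
eta = lambda x: math.sin(math.pi * x)  # perturbation direction

# Difference quotient (J(y + a*eta) - J(y)) / a for small a
# approximates the derivative of J(y + a*eta) with respect to a at 0.
a = 1e-6
numeric = (J(lambda x: y(x) + a * eta(x)) - J(y)) / a

# Analytic first variation: 2 * integral of y(x)*eta(x) dx, here 2/pi.
analytic = 2 * integrate(lambda x: y(x) * eta(x))

print(numeric, analytic)  # both close to 2/pi ≈ 0.6366
```

The agreement of the two numbers reflects exactly the expansion (1.33): the linear-in-$\alpha$ term dominates as $\alpha \to 0$.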
Now, suppose that $y$ is a local minimum of $J$ over some subset $A$ of $V$. We call a perturbation $\eta \in V$ admissible (with respect to the subset $A$) if $y + \alpha\eta \in A$ for all $\alpha$ sufficiently close to 0. It follows from our definitions of a local minimum and an admissible perturbation that $J(y + \alpha\eta)$, as a function of $\alpha$, has a local minimum at $\alpha = 0$ for each admissible $\eta$. Let us assume that the first variation $\delta J\big|_y$ exists (which is of course not always the case), so that we have (1.33). Applying the same reasoning that we used to derive the necessary condition (1.7) on the basis of (1.5), we quickly arrive at the first-order necessary condition for optimality: for all admissible perturbations $\eta$, we must have
$$\delta J\big|_y(\eta) = 0. \tag{1.37}$$
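The vanishing of the first variation at a minimum can be observed numerically. As a sketch (not from the text), take $J(y) = \int_0^1 (y(x) - x)^2\,dx$, which is globally minimized at $y^*(x) = x$; all names below are illustrative, and a symmetric difference quotient stands in for the derivative at $\alpha = 0$:

```python
import math

def integrate(f, a=0.0, b=1.0, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def J(y):
    """J(y) = integral of (y(x) - x)^2 dx, globally minimized at y*(x) = x."""
    return integrate(lambda x: (y(x) - x) ** 2)

y_star = lambda x: x  # the minimizer

def gateaux(J, y, eta, a=1e-6):
    """Symmetric difference quotient approximating the first variation."""
    Jp = J(lambda x: y(x) + a * eta(x))
    Jm = J(lambda x: y(x) - a * eta(x))
    return (Jp - Jm) / (2 * a)

# At the minimizer, the first variation vanishes for every perturbation.
for eta in (lambda x: 1.0,
            lambda x: math.sin(math.pi * x),
            lambda x: x ** 2):
    print(gateaux(J, y_star, eta))  # each prints a value ≈ 0
```

Here every perturbation is admissible (the minimum is over all of $V$), so the condition $\delta J\big|_{y^*}(\eta) = 0$ must hold for each direction tried.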
When we were studying a minimum of $f$ with the help of the function $g(\alpha) := f(x^* + \alpha d)$, it was easy to translate the equality $g'(0) = 0$ via the formula (1.10) into the necessary condition $\nabla f(x^*) = 0$. The necessary condition (1.37), while conceptually very similar, is much less constructive. To be able to apply it, we need to learn how to compute the first variation of some useful functionals. This subject will be further discussed in the next chapter; for now, we offer an example for the reader to work out.
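To indicate the kind of computation involved (a sketch under the assumption that $L$, $y$, and $\eta$ are continuously differentiable, not the text's own example), consider an integral functional and differentiate $J(y + \alpha\eta)$ with respect to $\alpha$ at $\alpha = 0$ under the integral sign:

```latex
J(y) = \int_a^b L\big(x,\, y(x),\, y'(x)\big)\,dx,
\qquad
\delta J\big|_y(\eta)
  = \frac{d}{d\alpha}\bigg|_{\alpha=0} J(y + \alpha\eta)
  = \int_a^b \Big( L_y\,\eta(x) + L_{y'}\,\eta'(x) \Big)\,dx,
```

where $L_y$ and $L_{y'}$ denote the partial derivatives of $L$ evaluated along $(x, y(x), y'(x))$. Note that the result is linear in $\eta$, as the definition of the first variation requires.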
Observe that our notion of the first variation, defined via the expansion (1.33), is independent of the choice of the norm on $V$. This means that the first-order necessary condition (1.37) is valid for every norm. To obtain a necessary condition better tailored to a particular norm, we could define $\delta J\big|_y$ differently, by using the following expansion instead of (1.33):
$$J(y + \eta) = J(y) + \delta J\big|_y(\eta) + o(\|\eta\|). \tag{1.38}$$
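As a side remark (a standard example, not from the text), the expansion (1.38) is genuinely stronger than (1.33) even in finite dimensions. On $V = \mathbb{R}^2$ with the Euclidean norm, consider

```latex
f(x_1, x_2) =
\begin{cases}
1, & \text{if } x_2 = x_1^2 \text{ and } x_1 \neq 0, \\
0, & \text{otherwise.}
\end{cases}
```

Every line through the origin meets the punctured parabola $\{x_2 = x_1^2,\ x_1 \neq 0\}$ in at most one point, so $f(\alpha\eta) = 0$ for all sufficiently small $\alpha$ and every direction $\eta$; thus (1.33) holds at the origin with first variation identically zero. But $f = 1$ along the parabola, so $f$ is not even continuous at the origin, and no expansion of the form (1.38) can hold there.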
In what follows, we retain our original definition of the first variation in terms of (1.33). It is somewhat simpler to work with and is adequate for our needs (at least through Chapter 2). While the norm-dependent formulation (1.38) could potentially provide sharper conditions for optimality, verifying (1.38) for all $\eta$ at once takes more work than verifying (1.33) one $\eta$ at a time. Besides, we will eventually abandon the analysis based on the first variation altogether in favor of more powerful tools. However, it is useful to be aware of the alternative formulation (1.38), and we will occasionally make some side remarks related to it. This issue will resurface in Chapter 3 where, although the alternative definition (1.38) of the first variation will not be specifically needed, we will use more general perturbations along the lines of the preceding discussion.