To develop the first-order necessary condition for optimality, we need a notion of derivative for functionals. Let $J : V \to \mathbb{R}$ be a functional on a function space $V$, and consider some function $y \in V$. The derivative of $J$ at $y$, which will now be called the first variation, will also be a functional on $V$, and in fact this functional will be linear. To define it, we consider functions in $V$ of the form $y + \alpha\eta$, where $\eta \in V$ and $\alpha$ is a real parameter (which can be restricted to some interval around 0). The reader will recognize these functions as infinite-dimensional analogs of the points $x + \alpha d$ around a given point $x \in \mathbb{R}^n$, which we utilized earlier.
A linear functional $\delta J\big|_y : V \to \mathbb{R}$ is called the first variation of $J$ at $y$ if for all $\eta$ and all $\alpha$ we have
$$J(y + \alpha\eta) = J(y) + \delta J\big|_y(\eta)\,\alpha + o(\alpha). \tag{1.33}$$
The first variation as defined above corresponds to the Gateaux derivative of $J$, which is just the usual derivative of $J(y + \alpha\eta)$ with respect to $\alpha$ (for fixed $y$ and $\eta$) evaluated at $\alpha = 0$:
$$\delta J\big|_y(\eta) = \left.\frac{d}{d\alpha}\right|_{\alpha=0} J(y + \alpha\eta).$$
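As a concrete numerical illustration (not from the text), one can check the Gateaux-derivative characterization for the functional $J(y) = \int_0^1 y^2(x)\,dx$, whose first variation works out to $\delta J\big|_y(\eta) = 2\int_0^1 y(x)\eta(x)\,dx$. The helper names below are illustrative, and the integrals are approximated by a midpoint rule:

```python
import math

def integrate(f, a=0.0, b=1.0, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def J(y):
    """The functional J(y) = integral of y(x)^2 over [0, 1]."""
    return integrate(lambda x: y(x) ** 2)

y = lambda x: x                        # base function y(x) = x
eta = lambda x: math.sin(math.pi * x)  # perturbation direction

# Difference quotient (J(y + a*eta) - J(y)) / a for small a
# approximates the derivative of J(y + a*eta) with respect to a at 0.
a = 1e-6
numeric = (J(lambda x: y(x) + a * eta(x)) - J(y)) / a

# Analytic first variation: 2 * integral of y(x)*eta(x) dx, here 2/pi.
analytic = 2 * integrate(lambda x: y(x) * eta(x))

print(numeric, analytic)  # both close to 2/pi ≈ 0.6366
```

The agreement of the two numbers reflects exactly the expansion (1.33): the linear-in-$\alpha$ term dominates as $\alpha \to 0$.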
Now, suppose that $y$ is a local minimum of $J$ over some subset $A$ of $V$. We call a perturbation $\eta \in V$ admissible (with respect to the subset $A$) if $y + \alpha\eta \in A$ for all $\alpha$ sufficiently close to 0. It follows from our definitions of a local minimum and an admissible perturbation that $J(y + \alpha\eta)$, as a function of $\alpha$, has a local minimum at $\alpha = 0$ for each admissible $\eta$. Let us assume that the first variation $\delta J\big|_y$ exists (which is of course not always the case), so that we have (1.33). Applying the same reasoning that we used to derive the necessary condition (1.7) on the basis of (1.5), we quickly arrive at the first-order necessary condition for optimality: for all admissible perturbations $\eta$, we must have
$$\delta J\big|_y(\eta) = 0. \tag{1.37}$$
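The vanishing of the first variation at a minimum can be observed numerically. As a sketch (not from the text), take $J(y) = \int_0^1 (y(x) - x)^2\,dx$, which is globally minimized at $y^*(x) = x$; all names below are illustrative, and a symmetric difference quotient stands in for the derivative at $\alpha = 0$:

```python
import math

def integrate(f, a=0.0, b=1.0, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def J(y):
    """J(y) = integral of (y(x) - x)^2 dx, globally minimized at y*(x) = x."""
    return integrate(lambda x: (y(x) - x) ** 2)

y_star = lambda x: x  # the minimizer

def gateaux(J, y, eta, a=1e-6):
    """Symmetric difference quotient approximating the first variation."""
    Jp = J(lambda x: y(x) + a * eta(x))
    Jm = J(lambda x: y(x) - a * eta(x))
    return (Jp - Jm) / (2 * a)

# At the minimizer, the first variation vanishes for every perturbation.
for eta in (lambda x: 1.0,
            lambda x: math.sin(math.pi * x),
            lambda x: x ** 2):
    print(gateaux(J, y_star, eta))  # each prints a value ≈ 0
```

Here every perturbation is admissible (the minimum is over all of $V$), so the condition $\delta J\big|_{y^*}(\eta) = 0$ must hold for each direction tried.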
When we were studying a minimum of $f$ with the help of the function $g(\alpha) := f(x^* + \alpha d)$, it was easy to translate the equality $g'(0) = 0$ via the formula (1.10) into the necessary condition $\nabla f(x^*) = 0$. The necessary condition (1.37), while conceptually very similar, is much less constructive. To be able to apply it, we need to learn how to compute the first variation of some useful functionals. This subject will be further discussed in the next chapter; for now, we offer an example for the reader to work out.
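To indicate the kind of computation involved (a sketch under the assumption that $L$, $y$, and $\eta$ are continuously differentiable, not the text's own example), consider an integral functional and differentiate $J(y + \alpha\eta)$ with respect to $\alpha$ at $\alpha = 0$ under the integral sign:

```latex
J(y) = \int_a^b L\big(x,\, y(x),\, y'(x)\big)\,dx,
\qquad
\delta J\big|_y(\eta)
  = \frac{d}{d\alpha}\bigg|_{\alpha=0} J(y + \alpha\eta)
  = \int_a^b \Big( L_y\,\eta(x) + L_{y'}\,\eta'(x) \Big)\,dx,
```

where $L_y$ and $L_{y'}$ denote the partial derivatives of $L$ evaluated along $(x, y(x), y'(x))$. Note that the result is linear in $\eta$, as the definition of the first variation requires.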
Observe that our notion of the first variation, defined via the expansion (1.33), is independent of the choice of the norm on $V$. This means that the first-order necessary condition (1.37) is valid for every norm. To obtain a necessary condition better tailored to a particular norm, we could define $\delta J\big|_y$ differently, by using the following expansion instead of (1.33):
$$J(y + \eta) = J(y) + \delta J\big|_y(\eta) + o(\|\eta\|). \tag{1.38}$$
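As a side remark (a standard example, not from the text), the expansion (1.38) is genuinely stronger than (1.33) even in finite dimensions. On $V = \mathbb{R}^2$ with the Euclidean norm, consider

```latex
f(x_1, x_2) =
\begin{cases}
1, & \text{if } x_2 = x_1^2 \text{ and } x_1 \neq 0, \\
0, & \text{otherwise.}
\end{cases}
```

Every line through the origin meets the punctured parabola $\{x_2 = x_1^2,\ x_1 \neq 0\}$ in at most one point, so $f(\alpha\eta) = 0$ for all sufficiently small $\alpha$ and every direction $\eta$; thus (1.33) holds at the origin with first variation identically zero. But $f = 1$ along the parabola, so $f$ is not even continuous at the origin, and no expansion of the form (1.38) can hold there.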
In what follows, we retain our original definition of the first variation in terms of (1.33). It is somewhat simpler to work with and is adequate for our needs (at least through Chapter 2). While the norm-dependent formulation (1.38) could potentially provide sharper conditions for optimality, verifying (1.38) for all $\eta$ at once takes more work than verifying (1.33) one $\eta$ at a time. Besides, we will eventually abandon the analysis based on the first variation altogether in favor of more powerful tools. However, it is useful to be aware of the alternative formulation (1.38), and we will occasionally make some side remarks related to it. This issue will resurface in Chapter 3 where, although the alternative definition (1.38) of the first variation will not be specifically needed, we will use more general perturbations along the lines of the preceding discussion.