
### 1.2.2.1 First-order necessary condition (Lagrange multipliers)

Let $x^*$ be a local minimum of $f$ over $D$. We assume that $x^*$ is a regular point of $D$ in the sense that the gradients $\nabla h_i(x^*)$, $i = 1, \dots, m$, are linearly independent. This is a technical assumption needed to rule out degenerate situations; see Exercise 1.2 below.

Instead of line segments containing $x^*$ which we used in the unconstrained case, we now consider curves in $D$ passing through $x^*$. Such a curve is a family of points $x(\alpha) \in D$ parameterized by $\alpha \in \mathbb{R}$, with $x(0) = x^*$. We require the function $x(\cdot)$ to be $C^1$, at least for $\alpha$ near 0. Given an arbitrary curve of this kind, we can consider the function

$$g(\alpha) := f(x(\alpha))$$

Note that when there are no equality constraints, functions of the form (1.4) considered previously can be viewed as special cases of this more general construction. From the fact that 0 is a minimum of $g$, we derive exactly as before that (1.7) holds, i.e., $g'(0) = 0$. To interpret this result in terms of $f$, note that

$$g'(\alpha) = \nabla f(x(\alpha)) \cdot x'(\alpha)$$

which for $\alpha = 0$ gives

$$\nabla f(x^*) \cdot x'(0) = 0 \qquad (1.19)$$

The vector $x'(0)$ is an important object for us here. From the first-order Taylor expansion $x(\alpha) = x^* + \alpha x'(0) + o(\alpha)$ we see that $x'(0)$ defines a linear approximation of the curve $x(\cdot)$ at $x^*$. Geometrically, it specifies the infinitesimal direction of the curve (see Figure 1.4). The vector $x'(0)$ is a tangent vector to $D$ at $x^*$. It lives in the tangent space to $D$ at $x^*$, which is denoted by $T_{x^*}D$. (We can think of this space as having its origin at $x^*$.)
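As a quick sanity check on these notions, one can approximate the tangent vector $x'(0)$ of a concrete curve numerically and verify the first-order Taylor expansion; the surface, the curve, and all names below are our own illustration, not from the text:

```python
import numpy as np

# Hypothetical example: D is the unit circle h(x) = x1^2 + x2^2 - 1 = 0
# in R^2, and x(alpha) = (cos(alpha), sin(alpha)) is a C^1 curve in D
# passing through x* = x(0) = (1, 0).
def x(alpha):
    return np.array([np.cos(alpha), np.sin(alpha)])

x_star = x(0.0)

# Approximate the tangent vector x'(0) by a central finite difference.
eps = 1e-6
tangent = (x(eps) - x(-eps)) / (2 * eps)   # close to the exact value (0, 1)

# First-order Taylor check: x(alpha) - (x* + alpha * x'(0)) is o(alpha).
alpha = 1e-3
error = np.linalg.norm(x(alpha) - (x_star + alpha * tangent))
print(tangent, error)   # tangent ~ (0, 1); error ~ alpha^2 / 2, far below alpha
```

The linear approximation error shrinks quadratically in $\alpha$, which is exactly what "infinitesimal direction of the curve" means here.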

We want to have a more explicit characterization of the tangent space $T_{x^*}D$, which will help us understand it better. Since $D$ was defined as the set of points satisfying the equalities (1.18), and since the points $x(\alpha)$ lie in $D$ by construction, we must have $h_i(x(\alpha)) = 0$ for all $\alpha$ and all $i \in \{1, \dots, m\}$. Differentiating this formula gives

$$0 = \frac{d}{d\alpha}\, h_i(x(\alpha)) = \nabla h_i(x(\alpha)) \cdot x'(\alpha)$$

for all $\alpha$ (close enough to 0). Setting $\alpha = 0$ and remembering that $x(0) = x^*$, we obtain

$$\nabla h_i(x^*) \cdot x'(0) = 0, \qquad i = 1, \dots, m.$$

We have shown that for an arbitrary curve in $D$ with $x(0) = x^*$, its tangent vector $x'(0)$ must satisfy $\nabla h_i(x^*) \cdot x'(0) = 0$ for each $i$. Actually, one can show that the converse is also true, namely, every vector $d \in \mathbb{R}^n$ satisfying

$$\nabla h_i(x^*) \cdot d = 0, \qquad i = 1, \dots, m \qquad (1.20)$$

is a tangent vector to $D$ at $x^*$ corresponding to some curve. (We do not give a proof of this fact but note that it relies on $x^*$ being a regular point of $D$.) In other words, the tangent vectors to $D$ at $x^*$ are exactly the vectors $d$ for which (1.20) holds. This is the characterization of the tangent space $T_{x^*}D$ that we were looking for. It is clear from (1.20) that $T_{x^*}D$ is a subspace of $\mathbb{R}^n$; in particular, if $d$ is a tangent vector, then so is $-d$ (going from $d$ to $-d$ corresponds to reversing the direction of the curve).
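In computational terms, (1.20) identifies $T_{x^*}D$ with the null space of the $m \times n$ matrix whose rows are the constraint gradients. A minimal sketch, assuming a made-up pair of constraints in $\mathbb{R}^3$ (none of the specific choices below come from the text):

```python
import numpy as np

# Hypothetical constraints in R^3 (m = 2):
#   h1(x) = x1^2 + x2^2 + x3^2 - 1   (unit sphere)
#   h2(x) = x3                        (horizontal plane)
# Their zero set D is the equator; at x* = (1, 0, 0) the tangent space
# should be the line spanned by (0, 1, 0).
x_star = np.array([1.0, 0.0, 0.0])
grad_h1 = 2 * x_star                   # gradient of h1 at x*
grad_h2 = np.array([0.0, 0.0, 1.0])    # gradient of h2 at x*

# Rows of J are the constraint gradients; full row rank <=> x* is regular.
J = np.vstack([grad_h1, grad_h2])

# By (1.20), T_{x*}D is the null space of J; compute it from the SVD
# (the rows of Vt beyond the rank span the null space).
_, s, Vt = np.linalg.svd(J)
rank = int((s > 1e-12).sum())
tangent_basis = Vt[rank:]
print(tangent_basis)   # one basis vector, proportional to (0, 1, 0)
```

The dimension of the tangent space, $n - m = 1$ here, matches the intuition that each independent constraint removes one degree of freedom.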

Now let us go back to (1.19), which says that $\nabla f(x^*) \cdot d = 0$ for all $d \in T_{x^*}D$ (since the curve, and thus the tangent vector $d = x'(0)$, was arbitrary). In view of the characterization of $T_{x^*}D$ given by (1.20), we can rewrite this condition as follows:

$$\nabla f(x^*) \cdot d = 0 \qquad \forall\, d \ \text{such that}\ \nabla h_i(x^*) \cdot d = 0, \quad i = 1, \dots, m \qquad (1.21)$$

The relation between $\nabla f(x^*)$ and the gradients $\nabla h_i(x^*)$ expressed by (1.21) looks somewhat clumsy, since checking it involves a search over $d$. Can we eliminate $d$ from this relation and make it more explicit? A careful look at (1.21) should quickly lead the reader to the following statement.

Claim: The gradient of $f$ at $x^*$ is a linear combination of the gradients of the constraint functions $h_1, \dots, h_m$ at $x^*$:

$$\nabla f(x^*) \in \operatorname{span}\{\nabla h_i(x^*),\ i = 1, \dots, m\} \qquad (1.22)$$

Indeed, if the claim is not true, then $\nabla f(x^*)$ has a component orthogonal to $\operatorname{span}\{\nabla h_i(x^*),\ i = 1, \dots, m\}$, i.e., there exists a $d \neq 0$ such that

$$\nabla h_i(x^*) \cdot d = 0, \qquad i = 1, \dots, m \qquad (1.23)$$

and $\nabla f(x^*)$ can be written in the form

$$\nabla f(x^*) = d + \sum_{i=1}^m \lambda_i \nabla h_i(x^*) \qquad (1.24)$$

for some $\lambda_1, \dots, \lambda_m \in \mathbb{R}$. Taking the inner product with $d$ on both sides of (1.24) and using (1.23) gives

$$\nabla f(x^*) \cdot d = |d|^2 \neq 0$$

and we reach a contradiction with (1.21).

Geometrically, the claim says that $\nabla f(x^*)$ is normal to $D$ at $x^*$. This situation is illustrated in Figure 1.5 for the case of two constraints in $\mathbb{R}^3$. Note that if there is only one constraint, say $h_1(x) = 0$, then $D$ is a two-dimensional surface and $\nabla f(x^*)$ must be proportional to $\nabla h_1(x^*)$, the normal direction to $D$ at $x^*$. When the second constraint $h_2(x) = 0$ is added, $D$ becomes a curve (the thick curve in the figure) and $\nabla f(x^*)$ is allowed to live in the plane spanned by $\nabla h_1(x^*)$ and $\nabla h_2(x^*)$, i.e., the normal plane to $D$ at $x^*$. In general, the intuition behind the claim is that unless $\nabla f(x^*)$ is normal to $D$, there are curves in $D$ passing through $x^*$ whose tangent vectors at $x^*$ make both positive and negative inner products with $\nabla f(x^*)$, hence in particular $f$ can be decreased by moving away from $x^*$ while staying in $D$.

The condition (1.22) means that there exist real numbers $\lambda_1^*, \dots, \lambda_m^*$ such that

$$\nabla f(x^*) + \lambda_1^* \nabla h_1(x^*) + \cdots + \lambda_m^* \nabla h_m(x^*) = 0 \qquad (1.25)$$

This is the first-order necessary condition for constrained optimality. The coefficients $\lambda_i^*$, $i = 1, \dots, m$, are called Lagrange multipliers.
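To see how (1.25) together with the constraints pins down candidate extrema in practice, here is a small worked example; the specific cost and constraint are our own illustration, solved symbolically with sympy:

```python
import sympy as sp

# Illustrative problem (not from the text): minimize f(x) = x1 + x2
# subject to the single constraint h(x) = x1^2 + x2^2 - 2 = 0.
x1, x2, lam = sp.symbols('x1 x2 lambda', real=True)
f = x1 + x2
h = x1**2 + x2**2 - 2

# First-order condition (1.25) plus the constraint:
#   grad f + lambda * grad h = 0,   h = 0
eqs = [sp.diff(f, x1) + lam * sp.diff(h, x1),
       sp.diff(f, x2) + lam * sp.diff(h, x2),
       h]
candidates = sp.solve(eqs, [x1, x2, lam], dict=True)
print(candidates)
# Two candidates: (x1, x2) = (-1, -1) with lambda = 1/2 (the minimum)
# and (x1, x2) = (1, 1) with lambda = -1/2 (the maximum).
```

Note that the necessary condition alone does not distinguish the minimum from the maximum; that is the job of second-order conditions.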

The above proof of the first-order necessary condition for constrained optimality involves geometric concepts. We also left a gap in it because we did not prove the converse implication in the equivalent characterization of the tangent space given by (1.20). We now give a shorter alternative proof which is purely analytic, and which will be useful when we study problems with constraints in calculus of variations. However, the geometric intuition behind the previous proof will be helpful for us later as well. We invite the reader to study both proofs as a way of testing the mathematical background knowledge that will be required in the subsequent chapters.

Let us start again by assuming that $x^*$ is a local minimum of $f$ over $D$, where $D$ is a surface in $\mathbb{R}^n$ defined by the equality constraints (1.18) and $x^*$ is a regular point of $D$. Our goal is to rederive the necessary condition expressed by (1.25). For simplicity, we only give the argument for the case of a single constraint $h(x) = 0$, i.e., $m = 1$; the extension to $m > 1$ is straightforward (see Exercise 1.3 below). Given two arbitrary vectors $d_1, d_2 \in \mathbb{R}^n$, we can consider the following map from $\mathbb{R}^2$ to itself:

$$F : (\alpha_1, \alpha_2) \mapsto \big( f(x^* + \alpha_1 d_1 + \alpha_2 d_2),\ h(x^* + \alpha_1 d_1 + \alpha_2 d_2) \big)$$

The Jacobian matrix of $F$ at $(\alpha_1, \alpha_2) = (0, 0)$ is

$$\begin{pmatrix} \nabla f(x^*) \cdot d_1 & \nabla f(x^*) \cdot d_2 \\ \nabla h(x^*) \cdot d_1 & \nabla h(x^*) \cdot d_2 \end{pmatrix} \qquad (1.26)$$

If this Jacobian matrix were nonsingular, then we could apply the Inverse Function Theorem (see, e.g., [Rud76, Theorem 9.24]) and conclude that there are neighborhoods of $(0, 0)$ and $F(0, 0) = (f(x^*), 0)$ on which the map $F$ is a bijection (has an inverse). This would imply, in particular, that there are points $x = x^* + \alpha_1 d_1 + \alpha_2 d_2$ arbitrarily close to $x^*$ such that $f(x) < f(x^*)$ and $h(x) = 0$; such points would be obtained by taking preimages of points on the ray directed to the left from $F(0, 0)$ in Figure 1.6. But this cannot be true, since $h(x) = 0$ means that $x \in D$ and we know that $x^*$ is a local minimum of $f$ over $D$. Therefore, the matrix (1.26) is singular.

Regularity of $x^*$ in the present case just means that the gradient $\nabla h(x^*)$ is nonzero. Choose a $d_1$ such that $\nabla h(x^*) \cdot d_1 \neq 0$. With this $d_1$ fixed, let $\lambda^* := -\dfrac{\nabla f(x^*) \cdot d_1}{\nabla h(x^*) \cdot d_1}$, so that $\nabla f(x^*) \cdot d_1 = -\lambda^* \nabla h(x^*) \cdot d_1$. Since the matrix (1.26) must be singular for all choices of $d_2$, its first row must be a constant multiple of its second row (the second row being nonzero by our choice of $d_1$). Thus we have $\nabla f(x^*) \cdot d_2 = -\lambda^* \nabla h(x^*) \cdot d_2$, or $(\nabla f(x^*) + \lambda^* \nabla h(x^*)) \cdot d_2 = 0$, and this must be true for all $d_2$. It follows that $\nabla f(x^*) + \lambda^* \nabla h(x^*) = 0$, which proves (1.25) for the case when $m = 1$.
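The singularity of the matrix (1.26) can be checked numerically on a concrete instance; all the specific choices below (the cost, constraint, and minimizer) are our own illustration, not from the text:

```python
import numpy as np

# Illustrative instance: f(x) = x1 + x2, h(x) = x1^2 + x2^2 - 2,
# with constrained minimum x* = (-1, -1), where (1.25) holds with
# lambda* = 1/2.  The matrix (1.26) should be singular for every d1, d2.
x_star = np.array([-1.0, -1.0])
grad_f = np.array([1.0, 1.0])          # gradient of f (constant here)
grad_h = 2 * x_star                    # gradient of h at x*

rng = np.random.default_rng(0)
for _ in range(5):
    d1 = rng.standard_normal(2)
    d2 = rng.standard_normal(2)
    M = np.array([[grad_f @ d1, grad_f @ d2],
                  [grad_h @ d1, grad_h @ d2]])   # the matrix (1.26)
    print(abs(np.linalg.det(M)))   # ~ 0 for every choice of d1, d2
```

The rows come out proportional precisely because $\nabla f(x^*) = -\lambda^* \nabla h(x^*)$ here, which is (1.25) for $m = 1$.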

The first-order necessary condition for constrained optimality generalizes the corresponding result we derived earlier for the unconstrained case. The condition (1.25) together with the constraints (1.18) is a system of $n + m$ equations in $n + m$ unknowns: the $n$ components of $x^*$ plus the $m$ components of the Lagrange multiplier vector $\lambda^* = (\lambda_1^*, \dots, \lambda_m^*)^T$. For $m = 0$, we recover the condition (1.11), which consists of $n$ equations in $n$ unknowns. To make this relation even more explicit, consider the function $\ell : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ defined by

$$\ell(x, \lambda) := f(x) + \sum_{i=1}^m \lambda_i h_i(x) \qquad (1.27)$$

which we call the augmented cost function. If $x^*$ is a local constrained minimum of $f$ and $\lambda^*$ is the corresponding vector of Lagrange multipliers for which (1.25) holds, then the gradient of $\ell$ at $(x^*, \lambda^*)$ satisfies

$$\ell_x(x^*, \lambda^*) = \nabla f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla h_i(x^*) = 0, \qquad \ell_\lambda(x^*, \lambda^*) = h(x^*) = 0 \qquad (1.28)$$

where $\ell_x$, $\ell_\lambda$ are the vectors of partial derivatives of $\ell$ with respect to the components of $x$ and $\lambda$, respectively, and $h = (h_1, \dots, h_m)^T$ is the vector of constraint functions. We conclude that $(x^*, \lambda^*)$ is a usual (unconstrained) stationary point of the augmented cost $\ell$. In other words, adding Lagrange multipliers basically converts a constrained problem into an unconstrained one, and the first-order necessary condition (1.25) for constrained optimality is recovered from the first-order necessary condition for unconstrained optimality applied to $\ell$.
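The stationarity condition (1.28) is easy to verify symbolically on an illustrative instance of our own (the specific $f$, $h$, $x^*$, and $\lambda^*$ below are not from the text):

```python
import sympy as sp

# Illustrative instance: f = x1 + x2, h = x1^2 + x2^2 - 2, with
# x* = (-1, -1) and lambda* = 1/2 satisfying (1.25).
x1, x2, lam = sp.symbols('x1 x2 lambda', real=True)
ell = (x1 + x2) + lam * (x1**2 + x2**2 - 2)   # augmented cost, as in (1.27)

# Gradient of ell with respect to (x, lambda), evaluated at (x*, lambda*).
grad = [sp.diff(ell, v) for v in (x1, x2, lam)]
at_point = {x1: -1, x2: -1, lam: sp.Rational(1, 2)}
values = [g.subs(at_point) for g in grad]
print(values)   # [0, 0, 0]: (x*, lambda*) is an unconstrained stationary point
```

The $\lambda$-component of the gradient is just the constraint itself, so stationarity in $\lambda$ enforces feasibility automatically.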

The idea of passing from constrained minimization of the original cost function to unconstrained minimization of the augmented cost function is due to Lagrange. If $(x^*, \lambda^*)$ is a minimum of $\ell$, then we must have $h(x^*) = 0$ (because otherwise we could decrease $\ell$ by changing $\lambda^*$), and subject to these constraints $x^*$ should minimize $f$ (because otherwise it would not minimize $\ell$). Also, it is clear that (1.28) must hold. However, it does not follow that (1.28) is a necessary condition for $x^*$ to be a constrained minimum of $f$: a constrained minimum of $f$ need not correspond to an unconstrained minimum of $\ell$; typically, $(x^*, \lambda^*)$ is a saddle point of $\ell$. Unfortunately, there is no quick way to derive the first-order necessary condition for constrained optimality by working with the augmented cost, something that Lagrange originally attempted to do. Nevertheless, the basic form of the augmented cost function (1.27) is fundamental in constrained optimization theory, and will reappear in various forms several times in this book.

Even though the condition in terms of Lagrange multipliers is only necessary and not sufficient for constrained optimality, it is very useful for narrowing down candidates for local extrema. The next exercise illustrates this point for a well-known optimization problem arising in optics.

Daniel 2010-12-20