Let $x^*$ be a local minimum of $f$ over $D$. We assume
that $x^*$ is a *regular point* of $D$ in the sense that
the gradients $\nabla h_i$, $i = 1,\dots,m$, are linearly independent
at $x^*$. This is a technical assumption needed to rule out degenerate
situations; see Exercise 1.2 below.

Instead of line segments containing $x^*$ which we used in the
unconstrained case, we now
consider *curves* in $D$ passing through $x^*$. Such a curve is
a family of points $x(\alpha) \in D$ parameterized by
$\alpha \in \mathbb{R}$, with $x(0) = x^*$.
We require the function $x(\cdot)$ to be $\mathcal{C}^1$, at least for $\alpha$
near 0.
Given an arbitrary curve of this kind, we can consider the function

$$g(\alpha) := f(x(\alpha)).$$
Note that when there are no equality constraints, functions of the form (1.4) considered previously can be viewed as special cases of this more general construction. From the fact that 0 is a minimum of $g$, we derive exactly as before that (1.7) holds, i.e., $g'(0) = 0$. To interpret this result in terms of $f$, note that

$$g'(\alpha) = \nabla f(x(\alpha)) \cdot x'(\alpha),$$

which for $\alpha = 0$ gives

$$g'(0) = \nabla f(x^*) \cdot x'(0) = 0. \tag{1.19}$$

The vector $x'(0)$ is an important object for us here. From the first-order Taylor expansion $x(\alpha) = x^* + \alpha\, x'(0) + o(\alpha)$ we see that $x'(0)$ defines a linear approximation of $x(\cdot)$ at $\alpha = 0$. Geometrically, it specifies the infinitesimal direction of the curve (see Figure 1.4). The vector $x'(0)$ is a *tangent vector* to $D$ at $x^*$; it lives in the *tangent space* to $D$ at $x^*$, which we denote by $T_{x^*}D$.

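The identity $g'(0) = \nabla f(x^*) \cdot x'(0)$ can be checked numerically. The following is a minimal sketch with a hypothetical function and curve (not taken from the text), comparing a finite-difference estimate of $g'(0)$ with the inner product of the gradient and the tangent vector:

```python
import math

# Hypothetical example: f(x1, x2) = x1^2 + 2*x2^2 and the C^1 curve
# x(alpha) = (cos(alpha), sin(alpha)), so x(0) = (1, 0) and x'(0) = (0, 1).
def f(x1, x2):
    return x1**2 + 2 * x2**2

def curve(alpha):
    return (math.cos(alpha), math.sin(alpha))

def g(alpha):                          # g(alpha) := f(x(alpha))
    return f(*curve(alpha))

# Central finite difference approximating g'(0).
eps = 1e-6
g_prime_0 = (g(eps) - g(-eps)) / (2 * eps)

# Chain rule: g'(0) = grad f(x*) . x'(0), with grad f(1, 0) = (2, 0)
# and tangent vector x'(0) = (0, 1).
grad_f = (2.0, 0.0)
tangent = (0.0, 1.0)
dot = grad_f[0] * tangent[0] + grad_f[1] * tangent[1]
print(abs(g_prime_0 - dot) < 1e-6)
```

Here the inner product vanishes because the curve happens to run along a level set of $f$ at $\alpha = 0$; for a general curve the two sides still agree but need not be zero.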
We want to have a more explicit characterization of the tangent space $T_{x^*}D$, which will help us understand it better. Since $D$ was defined as the set of points satisfying the equalities (1.18), and since the points $x(\alpha)$ lie in $D$ by construction, we must have $h_i(x(\alpha)) = 0$ for all $\alpha$ and all $i \in \{1,\dots,m\}$. Differentiating this formula gives

$$0 = \frac{d}{d\alpha}\, h_i(x(\alpha)) = \nabla h_i(x(\alpha)) \cdot x'(\alpha)$$

for all $\alpha$ (close enough to 0). Setting $\alpha = 0$ and remembering that $x(0) = x^*$, we obtain

$$\nabla h_i(x^*) \cdot x'(0) = 0, \qquad i = 1,\dots,m.$$

We have shown that for an arbitrary curve $x(\cdot)$ in $D$ with $x(0) = x^*$, its tangent vector $x'(0)$ must satisfy $\nabla h_i(x^*) \cdot x'(0) = 0$ for each $i$. Actually, one can show that the converse is also true, namely, every vector $d \in \mathbb{R}^n$ satisfying

$$\nabla h_i(x^*) \cdot d = 0, \qquad i = 1,\dots,m \tag{1.20}$$

is a tangent vector to $D$ at $x^*$ corresponding to some curve. (We do not give a proof of this fact but note that it relies on $x^*$ being a regular point of $D$.) In other words, the tangent vectors to $D$ at $x^*$ are exactly the vectors $d$ for which (1.20) holds. This is the characterization of the tangent space $T_{x^*}D$ that we were looking for. It is clear from (1.20) that $T_{x^*}D$ is a subspace of $\mathbb{R}^n$; in particular, if $d$ is a tangent vector, then so is $-d$ (going from $d$ to $-d$ corresponds to reversing the direction of the curve).
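The orthogonality condition (1.20) can be illustrated on a concrete surface. Below is a small sketch with a hypothetical single constraint (the unit circle in $\mathbb{R}^2$, not an example from the text): the tangent vector of a curve lying in $D$ is computed by finite differences and checked against the gradient of the constraint.

```python
import math

# Hypothetical constraint h(x) = x1^2 + x2^2 - 1 (the unit circle),
# with the curve x(alpha) = (cos(alpha), sin(alpha)) lying in D
# and x(0) = x* = (1, 0).
def grad_h(x1, x2):
    return (2 * x1, 2 * x2)

x_star = (1.0, 0.0)

# Tangent vector x'(0) of the curve, by central finite differences.
eps = 1e-6
xp = ((math.cos(eps) - math.cos(-eps)) / (2 * eps),
      (math.sin(eps) - math.sin(-eps)) / (2 * eps))   # approximately (0, 1)

g = grad_h(*x_star)                                    # grad h(x*) = (2, 0)
inner = g[0] * xp[0] + g[1] * xp[1]
print(abs(inner) < 1e-9)   # (1.20): x'(0) is orthogonal to grad h(x*)
```

Here $T_{x^*}D$ is the one-dimensional subspace spanned by $(0,1)$, exactly the set of vectors orthogonal to $\nabla h(x^*) = (2,0)$.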

Now let us go back to (1.19), which says that $\nabla f(x^*) \cdot d = 0$ for all $d \in T_{x^*}D$ (since the curve $x(\cdot)$, and thus the tangent vector $x'(0)$, was arbitrary). In view of the characterization of $T_{x^*}D$ given by (1.20), we can rewrite this condition as follows:

$$\nabla h_i(x^*) \cdot d = 0 \ \ \forall\, i \in \{1,\dots,m\} \quad\Longrightarrow\quad \nabla f(x^*) \cdot d = 0. \tag{1.21}$$

The relation between $\nabla f(x^*)$ and the gradients $\nabla h_i(x^*)$ expressed by (1.21) looks somewhat clumsy, since checking it involves a search over $d \in \mathbb{R}^n$. Can we eliminate $d$ from this relation and make it more explicit? A careful look at (1.21) should quickly lead the reader to the following statement.

*Claim*: The gradient of $f$ at $x^*$ is a linear combination of the
gradients of the constraint functions $h_1, \dots, h_m$ at $x^*$:

$$\nabla f(x^*) \in \operatorname{span}\{\nabla h_i(x^*),\ i = 1,\dots,m\}. \tag{1.22}$$

Indeed, if the claim is not true, then $\nabla f(x^*)$ has a component orthogonal to $\operatorname{span}\{\nabla h_i(x^*),\ i = 1,\dots,m\}$, i.e., there exists a $d \ne 0$ such that

$$\nabla h_i(x^*) \cdot d = 0, \qquad i = 1,\dots,m, \tag{1.23}$$

and $\nabla f(x^*)$ can be written in the form

$$\nabla f(x^*) = \sum_{i=1}^m \lambda_i^* \nabla h_i(x^*) + d \tag{1.24}$$

for some $\lambda_1^*, \dots, \lambda_m^* \in \mathbb{R}$. Taking the inner product with $d$ on both sides of (1.24) and using (1.23) gives

$$\nabla f(x^*) \cdot d = |d|^2 \ne 0,$$

and we reach a contradiction with (1.21).

Geometrically, the claim says that $\nabla f(x^*)$ is normal to $D$ at $x^*$. This situation is illustrated in Figure 1.5 for the case of two constraints in $\mathbb{R}^3$. Note that if there is only one constraint, say $h_1(x) = 0$, then $D$ is a two-dimensional surface and $\nabla f(x^*)$ must be proportional to $\nabla h_1(x^*)$, the normal direction to $D$ at $x^*$. When the second constraint $h_2(x) = 0$ is added, $D$ becomes a curve (the thick curve in the figure) and $\nabla f(x^*)$ is allowed to live in the plane spanned by $\nabla h_1(x^*)$ and $\nabla h_2(x^*)$, i.e., the normal plane to $D$ at $x^*$. In general, the intuition behind the claim is that unless $\nabla f(x^*)$ is normal to $D$, there are curves in $D$ passing through $x^*$ whose tangent vectors at $x^*$ make both positive and negative inner products with $\nabla f(x^*)$, hence in particular $f$ can be decreased by moving away from $x^*$ while staying in $D$.

The condition (1.22) means that there exist real numbers $\lambda_1^*, \dots, \lambda_m^*$ such that

$$\nabla f(x^*) + \lambda_1^* \nabla h_1(x^*) + \dots + \lambda_m^* \nabla h_m(x^*) = 0. \tag{1.25}$$

This is the *first-order necessary condition for constrained optimality*, and the coefficients $\lambda_1^*, \dots, \lambda_m^*$ are called *Lagrange multipliers*.
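Condition (1.25) is easy to verify on a concrete problem. The following is a minimal numerical sketch with a hypothetical cost and constraint (not an example from the text): minimize $f(x) = x_1 + x_2$ subject to $h(x) = x_1^2 + x_2^2 - 2 = 0$, whose constrained minimum is $x^* = (-1,-1)$.

```python
# Hypothetical illustration of (1.25): minimize f(x) = x1 + x2 subject to
# h(x) = x1^2 + x2^2 - 2 = 0. The constrained minimum is x* = (-1, -1),
# and (1.25) reads grad f(x*) + lam * grad h(x*) = 0.
x_star = (-1.0, -1.0)

grad_f = (1.0, 1.0)                          # gradient of x1 + x2
grad_h = (2 * x_star[0], 2 * x_star[1])      # gradient of x1^2 + x2^2 - 2

# Solve the first component of (1.25) for the Lagrange multiplier ...
lam = -grad_f[0] / grad_h[0]                 # lam = 1/2

# ... and check that the full vector equation (1.25) then holds.
residual = (grad_f[0] + lam * grad_h[0],
            grad_f[1] + lam * grad_h[1])
print(lam, residual)
```

Solving one scalar component of (1.25) determines $\lambda^*$, and the remaining components then hold automatically, exactly as the claim predicts: $\nabla f(x^*)$ is a multiple of $\nabla h(x^*)$.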

The above proof of the first-order necessary condition for constrained optimality involves geometric concepts. We also left a gap in it because we did not prove the converse implication in the equivalent characterization of the tangent space given by (1.20). We now give a shorter alternative proof which is purely analytic, and which will be useful when we study problems with constraints in calculus of variations. However, the geometric intuition behind the previous proof will be helpful for us later as well. We invite the reader to study both proofs as a way of testing the mathematical background knowledge that will be required in the subsequent chapters.

Let us start again by assuming that $x^*$ is a local minimum of $f$ over $D$, where $D$ is a surface in $\mathbb{R}^n$ defined by the equality constraints (1.18) and $x^*$ is a regular point of $D$. Our goal is to rederive the necessary condition expressed by (1.25). For simplicity, we only give the argument for the case of a single constraint $h(x) = 0$, i.e., $m = 1$; the extension to $m > 1$ is straightforward (see Exercise 1.3 below). Given two arbitrary vectors $d_1, d_2 \in \mathbb{R}^n$, we can consider the following map $F$ from $\mathbb{R}^2$ to itself:

$$F : (\alpha_1, \alpha_2) \mapsto \big( f(x^* + \alpha_1 d_1 + \alpha_2 d_2),\ h(x^* + \alpha_1 d_1 + \alpha_2 d_2) \big).$$

The Jacobian matrix of $F$ at $(0,0)$ is

$$\begin{pmatrix} \nabla f(x^*) \cdot d_1 & \nabla f(x^*) \cdot d_2 \\ \nabla h(x^*) \cdot d_1 & \nabla h(x^*) \cdot d_2 \end{pmatrix}. \tag{1.26}$$

If this Jacobian matrix were nonsingular, then we could apply the Inverse Function Theorem (see, e.g., [Rud76, Theorem 9.24]) and conclude that there are neighborhoods of $(0,0)$ and $F(0,0) = (f(x^*), 0)$ on which the map $F$ is a bijection (has an inverse). This would imply, in particular, that there are points $x$ arbitrarily close to $x^*$ such that $f(x) < f(x^*)$ and $h(x) = 0$; such points would be obtained by taking preimages of points on the ray directed to the left from $F(0,0)$ in Figure 1.6. But this cannot be true, since $h(x) = 0$ means that $x \in D$ and we know that $x^*$ is a local minimum of $f$ over $D$. Therefore, the matrix (1.26) is singular.

Regularity of $x^*$ in the present case just means that the gradient $\nabla h(x^*)$ is nonzero. Choose a $d_1$ such that $\nabla h(x^*) \cdot d_1 \ne 0$. With this $d_1$ fixed, let $\lambda^* := -\,\nabla f(x^*) \cdot d_1 \,/\, \nabla h(x^*) \cdot d_1$, so that $\nabla f(x^*) \cdot d_1 + \lambda^* \nabla h(x^*) \cdot d_1 = 0$. Since the matrix (1.26) must be singular for all choices of $d_2$, its first row must be a constant multiple of its second row (the second row being nonzero by our choice of $d_1$). Thus we have $\nabla f(x^*) \cdot d_2 + \lambda^* \nabla h(x^*) \cdot d_2 = 0$, or $(\nabla f(x^*) + \lambda^* \nabla h(x^*)) \cdot d_2 = 0$, and this must be true for all $d_2$. It follows that $\nabla f(x^*) + \lambda^* \nabla h(x^*) = 0$, which proves (1.25) for the case when $m = 1$.
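The singularity of (1.26) can also be observed numerically. Below is a small sketch (with a hypothetical $f$, $h$, and minimizer, not from the text): since $\nabla f(x^*)$ and $\nabla h(x^*)$ are proportional at the constrained minimum, the determinant of (1.26) vanishes for every choice of $d_1, d_2$.

```python
import random

# Check that the Jacobian matrix (1.26) is singular at a constrained minimum.
# Hypothetical example (m = 1): f(x) = x1 + x2, h(x) = x1^2 + x2^2 - 2, with
# constrained minimum x* = (-1, -1); there grad f = (1, 1) and grad h = (-2, -2).
grad_f = (1.0, 1.0)
grad_h = (-2.0, -2.0)

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

random.seed(0)
dets = []
for _ in range(5):
    d1 = (random.uniform(-1, 1), random.uniform(-1, 1))
    d2 = (random.uniform(-1, 1), random.uniform(-1, 1))
    # determinant of the 2x2 matrix (1.26) for this choice of d1, d2
    dets.append(dot(grad_f, d1) * dot(grad_h, d2)
                - dot(grad_f, d2) * dot(grad_h, d1))

print(all(abs(det) < 1e-12 for det in dets))
```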

The first-order necessary condition for constrained optimality generalizes the corresponding result we derived earlier for the unconstrained case. The condition (1.25) together with the constraints (1.18) is a system of $n + m$ equations in $n + m$ unknowns: the $n$ components of $x^*$ plus the $m$ components of the Lagrange multiplier vector $\lambda^* = (\lambda_1^*, \dots, \lambda_m^*)^T$. For $m = 0$, we recover the condition (1.11) which consists of $n$ equations in $n$ unknowns. To make this relation even more explicit, consider the function $\ell : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ defined by

$$\ell(x, \lambda) := f(x) + \sum_{i=1}^m \lambda_i h_i(x), \tag{1.27}$$

which we call the *augmented cost function*. If $x^*$ is a local constrained minimum of $f$ and $\lambda^*$ is the corresponding vector of Lagrange multipliers for which (1.25) holds, then

$$\nabla \ell(x^*, \lambda^*) = \begin{pmatrix} \ell_x(x^*, \lambda^*) \\ \ell_\lambda(x^*, \lambda^*) \end{pmatrix} = \begin{pmatrix} \nabla f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla h_i(x^*) \\ h(x^*) \end{pmatrix} = 0, \tag{1.28}$$

where $\ell_x$, $\ell_\lambda$ are the vectors of partial derivatives of $\ell$ with respect to the components of $x$ and $\lambda$, respectively, and $h = (h_1, \dots, h_m)^T$ is the vector of constraint functions. We conclude that $(x^*, \lambda^*)$ is a usual (unconstrained) stationary point of the augmented cost $\ell$. In other words, adding Lagrange multipliers basically converts a constrained problem into an unconstrained one, and the first-order necessary condition (1.25) for constrained optimality is recovered from the first-order necessary condition for unconstrained optimality applied to $\ell$.
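The stationarity condition (1.28) can be checked by differentiating the augmented cost numerically. The following minimal sketch reuses a hypothetical example (not from the text): $f(x) = x_1 + x_2$, $h(x) = x_1^2 + x_2^2 - 2$, with $x^* = (-1,-1)$ and $\lambda^* = 1/2$.

```python
# Check that (x*, lambda*) is an unconstrained stationary point of the
# augmented cost (1.27). Hypothetical example: f(x) = x1 + x2,
# h(x) = x1^2 + x2^2 - 2, with x* = (-1, -1) and lambda* = 1/2.
def ell(x1, x2, lam):
    return (x1 + x2) + lam * (x1**2 + x2**2 - 2)

point = (-1.0, -1.0, 0.5)    # (x*, lambda*)

# Central-difference gradient of ell at (x*, lambda*).
eps = 1e-6
grad = []
for i in range(3):
    hi = list(point); hi[i] += eps
    lo = list(point); lo[i] -= eps
    grad.append((ell(*hi) - ell(*lo)) / (2 * eps))

print(all(abs(gi) < 1e-6 for gi in grad))
```

The first two components of the gradient reproduce (1.25), and the third reproduces the constraint $h(x^*) = 0$, matching the block structure of (1.28).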

The idea of passing from constrained minimization of the original cost function to
unconstrained minimization of the augmented cost function is due to Lagrange.
If $(x^*, \lambda^*)$ is a minimum of $\ell$, then
we must
have $h(x^*) = 0$ (because otherwise we could decrease $\ell$
by changing $\lambda^*$), and subject to these constraints $x^*$
should minimize $f$ (because otherwise it would not minimize
$\ell$). Also, it is clear that (1.28) must hold. However,
it does *not* follow that (1.28) is a necessary condition
for $x^*$ to be a constrained minimum of $f$. Unfortunately, there is no quick way to
derive the first-order necessary condition for constrained optimality by
working with the augmented cost--something that Lagrange originally attempted to
do. Nevertheless, the basic form of the augmented cost function (1.27)
is fundamental in constrained optimization theory, and will
reappear in various forms several times in this book.

Even though the condition in terms of Lagrange multipliers is only necessary and not sufficient for constrained optimality, it is very useful for narrowing down candidates for local extrema. The next exercise illustrates this point for a well-known optimization problem arising in optics.