1.2.1.1 First-order necessary condition for optimality

Suppose that $f$ is a $\mathcal C^1$ (continuously differentiable) function and $x^*$ is a local minimum of $f$. Pick an arbitrary vector $d\in\mathbb{R}^n$. Since we are in the unconstrained case, moving away from $x^*$ in the direction of $d$ or $-d$ cannot immediately take us outside $D$. In other words, we have $x^*+\alpha d\in D$ for all $\alpha\in\mathbb{R}$ close enough to 0.

For a fixed $ d$ , we can consider $ f(x^*+\alpha d)$ as a function of the real parameter $ \alpha$ , whose domain is some interval containing 0. Let us call this new function $ g$ :

$\displaystyle g(\alpha):=f(x^*+\alpha d).$ (1.4)
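For example, if $f(x)=x_1^2+2x_2^2$ and $x^*=(0,0)^T$, then for any direction $d=(d_1,d_2)^T$ we obtain

$\displaystyle g(\alpha)=f(x^*+\alpha d)=\alpha^2(d_1^2+2d_2^2),$

an ordinary scalar function of $\alpha$.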

Since $x^*$ is a local minimum of $f$, it is clear that $\alpha=0$ is a local minimum of $g$. Passing from $f$ to $g$ is useful because $g$ is a function of a scalar variable and so its minima can be studied using ordinary calculus. In particular, we can write down the first-order Taylor expansion for $g$ around $\alpha=0$:

$\displaystyle g(\alpha)=g(0)+g'(0)\alpha+o(\alpha)$ (1.5)

where $o(\alpha)$ represents ``higher-order terms'' which go to 0 faster than $\alpha$ as $\alpha$ approaches 0, i.e.,

$\displaystyle \lim_{\alpha\to 0}\frac{o(\alpha)}{\alpha}=0.$ (1.6)
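For example, any term bounded by a constant multiple of $\alpha^2$ qualifies as $o(\alpha)$, since then $\vert o(\alpha)/\alpha\vert\le C\vert\alpha\vert\to 0$ as $\alpha\to 0$.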

We claim that

$\displaystyle g'(0)= 0.$ (1.7)

To show this, suppose that $g'(0)\ne 0$. Then, in view of (1.6), there exists an $\varepsilon >0$ small enough so that for all nonzero $\alpha$ with $\vert\alpha\vert<\varepsilon$, the absolute value of the fraction in (1.6) is less than $\vert g'(0)\vert$. We can write this as

$\displaystyle 0<\vert\alpha\vert<\varepsilon \quad\Rightarrow\quad \vert o(\alpha)\vert< \vert g'(0)\alpha\vert.$

For these values of $ \alpha$ , (1.5) gives

$\displaystyle g(\alpha)-g(0)<g'(0)\alpha+\vert g'(0)\alpha\vert.$ (1.8)

If we further restrict $\alpha$ to have the sign opposite to that of $g'(0)$, then $g'(0)\alpha=-\vert g'(0)\alpha\vert$, so the right-hand side of (1.8) becomes 0 and we obtain $g(\alpha)-g(0)<0$. But this contradicts the fact that $g$ has a minimum at 0. We have thus shown that (1.7) is indeed true.
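To see the contradiction on a concrete function, take $g(\alpha)=\alpha+\alpha^2$, so that $g(0)=0$, $g'(0)=1\ne 0$, and $o(\alpha)=\alpha^2$. For every $\alpha\in(-1,0)$ we have

$\displaystyle g(\alpha)=\alpha(1+\alpha)<0=g(0),$

so 0 is indeed not a minimum of $g$, exactly as the argument above predicts.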

We now need to re-express this result in terms of the original function $ f$ . A simple application of the chain rule from vector calculus yields the formula

$\displaystyle g'(\alpha)=\nabla f(x^*+\alpha d)\cdot d$ (1.9)

where

$\displaystyle \nabla f:=\left({f}_{x_1},\dots,{f}_{x_n}\right)^T$

is the gradient of $f$ and $\cdot$ denotes the inner product. Whenever there is no danger of confusion, we use subscripts as a shorthand notation for partial derivatives: ${f}_{x_i}:={\partial f}/{\partial x_i}$. Setting $\alpha=0$ in (1.9), we have

$\displaystyle g'(0)=\nabla f(x^*)\cdot d$ (1.10)

and this equals 0 by (1.7). Since $ d$ was arbitrary, we conclude that

$\displaystyle \fbox{$\nabla f(x^*)=0$}$ (1.11)

This is the first-order necessary condition for optimality.
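Note that the condition is necessary but not sufficient: for example, $f(x)=x^3$ on $\mathbb{R}$ satisfies $f'(0)=0$, yet 0 is neither a local minimum nor a local maximum.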

A point $x^*$ satisfying this condition is called a stationary point. The condition is ``first-order'' because it is derived using the first-order expansion (1.5). We emphasize that the result is valid when $f\in\mathcal C^1$ and $x^*$ is an interior point of $D$.
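A quick numerical sanity check can make these formulas concrete. The short Python sketch below is an illustration only: the quadratic $f$ and the direction $d$ are arbitrary choices, not taken from the text. It compares the chain-rule formula (1.9) with a finite-difference approximation of $g'$, and confirms that the gradient vanishes at the minimizer, as required by (1.11).

    # A small numerical check of (1.9) and (1.11) on a hand-picked quadratic;
    # the function f and the direction d below are arbitrary illustrative choices.
    import numpy as np

    def f(x):
        # f(x) = (x1 - 1)^2 + 3 (x2 + 2)^2, whose unique minimum is x* = (1, -2)
        return (x[0] - 1.0)**2 + 3.0*(x[1] + 2.0)**2

    def grad_f(x):
        # gradient computed by hand: (2(x1 - 1), 6(x2 + 2))^T
        return np.array([2.0*(x[0] - 1.0), 6.0*(x[1] + 2.0)])

    x_star = np.array([1.0, -2.0])   # the known unconstrained minimizer
    d = np.array([0.7, -0.3])        # an arbitrary direction

    # Check (1.9): g'(alpha) = grad f(x* + alpha d) . d, via a central difference.
    g = lambda a: f(x_star + a*d)
    alpha, h = 0.5, 1e-6
    g_prime_numeric = (g(alpha + h) - g(alpha - h)) / (2.0*h)
    g_prime_formula = grad_f(x_star + alpha*d) @ d
    print(g_prime_numeric, g_prime_formula)   # both equal 0.76 up to rounding

    # Check (1.11): the gradient vanishes at the minimizer.
    print(grad_f(x_star))                     # [0. 0.]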

