
### 1.2.1.2 Second-order conditions for optimality

We now derive another necessary condition and also a sufficient condition for optimality, under the stronger hypothesis that $f$ is a $\mathcal{C}^2$ function (twice continuously differentiable).

First, we assume as before that $x^*$ is a local minimum and derive a necessary condition. For an arbitrary fixed $d \in \mathbb{R}^n$, let us consider a Taylor expansion of $g(\alpha) = f(x^* + \alpha d)$ again, but this time include second-order terms:

$$g(\alpha) = g(0) + g'(0)\alpha + \frac{1}{2}g''(0)\alpha^2 + o(\alpha^2) \tag{1.12}$$

where

$$\lim_{\alpha \to 0} \frac{o(\alpha^2)}{\alpha^2} = 0. \tag{1.13}$$

By the first-order necessary condition, $g'(0)$ must be 0 since it is given by (1.10). We claim that

$$g''(0) \ge 0. \tag{1.14}$$

Indeed, suppose that $g''(0) < 0$. By (1.13), there exists an $\varepsilon > 0$ such that

$$|o(\alpha^2)| < \frac{1}{2}|g''(0)|\alpha^2 \qquad \forall\, \alpha \in (0, \varepsilon).$$

For these values of $\alpha$, (1.12) reduces to $g(\alpha) < g(0)$, contradicting the fact that $0$ is a minimum of $g$. Therefore, (1.14) must hold.
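As a concrete numerical illustration of this contradiction (a hypothetical example, not from the text), the function $g(\alpha) = -\alpha^2 + \alpha^4$ has $g'(0) = 0$ but $g''(0) = -2 < 0$, and indeed it takes values strictly below $g(0)$ arbitrarily close to $0$:

```python
# Hypothetical example: if g''(0) < 0, then g(alpha) < g(0) for all
# sufficiently small alpha > 0, so 0 cannot be a local minimum.
# Here g(alpha) = -alpha**2 + alpha**4, with g(0) = 0 and g''(0) = -2.

def g(alpha):
    return -alpha**2 + alpha**4

# Central finite difference confirms g''(0) is approximately -2.
h = 1e-4
g2 = (g(h) - 2 * g(0) + g(-h)) / h**2
print(round(g2))  # -2

# g dips strictly below g(0) = 0 at points arbitrarily close to 0,
# contradicting minimality of g at 0.
print(all(g(a) < g(0) for a in [0.1, 0.01, 0.001]))  # True
```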

What does this result imply about the original function $f$? To see what $g''(0)$ is in terms of $f$, we need to differentiate the formula (1.9). The reader may find it helpful to first rewrite (1.9) more explicitly as

$$g(\alpha) = f(x_1^* + \alpha d_1, \dots, x_n^* + \alpha d_n).$$

Differentiating both sides with respect to $\alpha$, we have

$$g'(\alpha) = \sum_{i=1}^n f_{x_i}(x^* + \alpha d)\, d_i,$$

and differentiating again,

$$g''(\alpha) = \sum_{i,j=1}^n f_{x_i x_j}(x^* + \alpha d)\, d_i d_j,$$

where double subscripts are used to denote second-order partial derivatives. For $\alpha = 0$ this gives

$$g''(0) = \sum_{i,j=1}^n f_{x_i x_j}(x^*)\, d_i d_j$$

or, in matrix notation,

$$g''(0) = d^T \nabla^2 f(x^*)\, d \tag{1.15}$$
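The identity (1.15) can be checked numerically on a hypothetical example. For $f(x, y) = x^2 + xy + 2y^2$, whose Hessian is the constant matrix $\begin{pmatrix} 2 & 1 \\ 1 & 4 \end{pmatrix}$, a finite-difference estimate of $g''(0)$ along $d = (1, 1)$ should match $d^T \nabla^2 f(x^*) d = 8$:

```python
# Numerical check (hypothetical example, not from the text) that
# g''(0) = d^T * Hessian(f)(x*) * d, as in (1.15), with x* = (0, 0).
# f(x, y) = x**2 + x*y + 2*y**2 has constant Hessian [[2, 1], [1, 4]].

def f(x, y):
    return x**2 + x * y + 2 * y**2

d = (1.0, 1.0)
H = [[2.0, 1.0], [1.0, 4.0]]  # exact Hessian of f

# Quadratic form d^T H d
quad = sum(d[i] * H[i][j] * d[j] for i in range(2) for j in range(2))

# g(alpha) = f(x* + alpha * d) with x* = (0, 0)
def g(alpha):
    return f(alpha * d[0], alpha * d[1])

# Central finite difference for g''(0)
h = 1e-4
g2 = (g(h) - 2 * g(0) + g(-h)) / h**2

print(quad, round(g2, 6))  # both are 8.0
```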

where $\nabla^2 f(x^*)$ is the Hessian matrix of $f$ at $x^*$. In view of (1.14), (1.15), and the fact that $d$ was arbitrary, we conclude that the matrix $\nabla^2 f(x^*)$ must be positive semidefinite:

$$\nabla^2 f(x^*) \ge 0 \qquad \text{(positive semidefinite).}$$
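A small sketch, using a hypothetical function, of the necessary condition in action: at the minimum $x^* = (1, 0)$ of $f(x, y) = (x-1)^2 + 3y^2$, a finite-difference Hessian should come out positive semidefinite, which for a symmetric $2 \times 2$ matrix is equivalent to having nonnegative trace and determinant:

```python
# Sketch (hypothetical example): verifying the second-order necessary
# condition at a known local minimum x* = (1, 0) of
# f(x, y) = (x - 1)**2 + 3*y**2, whose exact Hessian is [[2, 0], [0, 6]].

def f(x, y):
    return (x - 1)**2 + 3 * y**2

def hessian(f, x, y, h=1e-4):
    # Central finite differences for the four second partials.
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return [[fxx, fxy], [fxy, fyy]]

H = hessian(f, 1.0, 0.0)
trace = H[0][0] + H[1][1]
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]

# A symmetric 2x2 matrix is positive semidefinite iff trace >= 0
# and det >= 0 (both eigenvalues nonnegative).
print(trace >= 0 and det >= 0)  # True
```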

This is the second-order necessary condition for optimality.

Like the previous first-order necessary condition, this second-order condition only applies to the unconstrained case. But, unlike the first-order condition, it requires $f$ to be $\mathcal{C}^2$ and not just $\mathcal{C}^1$. Another difference from the first-order condition is that the second-order condition distinguishes minima from maxima: at a local maximum, the Hessian must be negative semidefinite, while the first-order condition applies to any extremum (a minimum or a maximum).
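This distinguishing power can be seen on the standard saddle example (hypothetical, not from the text): $f(x, y) = x^2 - y^2$ has a stationary point at the origin, so the first-order condition alone cannot rule it out, but its Hessian is indefinite, so the second-order condition excludes both a minimum and a maximum there:

```python
# Hypothetical illustration: f(x, y) = x**2 - y**2 is stationary at the
# origin, but its Hessian [[2, 0], [0, -2]] is indefinite, so the
# second-order necessary conditions rule out both a local minimum and a
# local maximum (the origin is a saddle point).

H = [[2.0, 0.0], [0.0, -2.0]]  # exact Hessian of x**2 - y**2

def quad_form(H, d):
    # d^T H d for a 2x2 matrix H and direction d
    return sum(d[i] * H[i][j] * d[j] for i in range(2) for j in range(2))

# The quadratic form changes sign with the direction d.
print(quad_form(H, (1.0, 0.0)))  # 2.0  -> H is not negative semidefinite
print(quad_form(H, (0.0, 1.0)))  # -2.0 -> H is not positive semidefinite
```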

Strengthening the second-order necessary condition and combining it with the first-order necessary condition, we can obtain the following second-order sufficient condition for optimality: if a $\mathcal{C}^2$ function $f$ satisfies

$$\nabla f(x^*) = 0 \quad \text{and} \quad \nabla^2 f(x^*) > 0 \qquad \text{(positive definite)} \tag{1.16}$$

at an interior point $x^*$ of its domain, then $x^*$ is a strict local minimum of $f$. To see why this is true, take an arbitrary $d \in \mathbb{R}^n$ and consider again the second-order expansion (1.12) for $g(\alpha) = f(x^* + \alpha d)$. We know that $g'(0)$ is given by (1.10), thus it is 0 because $\nabla f(x^*) = 0$. Next, $g''(0)$ is given by (1.15), and so we have

$$f(x^* + \alpha d) = f(x^*) + \frac{1}{2} d^T \nabla^2 f(x^*) d\, \alpha^2 + o(\alpha^2). \tag{1.17}$$

The intuition is that since the Hessian $\nabla^2 f(x^*)$ is a positive definite matrix, the second-order term dominates the higher-order term $o(\alpha^2)$. To establish this fact rigorously, note that by the definition of $o(\alpha^2)$ we can pick an $\varepsilon > 0$ small enough so that

$$|o(\alpha^2)| < \frac{1}{2} d^T \nabla^2 f(x^*) d\, \alpha^2 \qquad \forall\, \alpha \in (0, \varepsilon),$$

and for these values of $\alpha$ we deduce from (1.17) that

$$f(x^* + \alpha d) > f(x^*).$$

To conclude that $x^*$ is a (strict) local minimum, one more technical detail is needed. According to the definition of a local minimum, we must show that $f(x^*)$ is the lowest value of $f$ in some ball around $x^*$. But the term $o(\alpha^2)$, and hence the value of $\varepsilon$ in the above construction, depends on the choice of the direction $d$. It is clear that this dependence is continuous, since all the other terms in (1.17) are continuous in $d$. Also, without loss of generality we can restrict $d$ to be of unit length, and then we can take the minimum of $\varepsilon$ over all such $d$. Since the unit sphere in $\mathbb{R}^n$ is compact, this minimum is well defined (thanks to the Weierstrass Theorem, which is discussed below). This minimal value of $\varepsilon$ provides the radius of the desired ball around $x^*$ in which the lowest value of $f$ is achieved at $x^*$.
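The conclusion of the sufficient condition can be probed numerically on a hypothetical example. The function $f(x, y) = x^2 + xy + 2y^2$ has gradient zero at the origin and positive definite Hessian $\begin{pmatrix} 2 & 1 \\ 1 & 4 \end{pmatrix}$ (trace $6 > 0$, determinant $7 > 0$), so the origin should be a strict local minimum; the sketch below samples unit directions and small step sizes and checks that $f$ strictly increases away from the origin:

```python
import math

# Sketch (hypothetical example) of the sufficient condition at work:
# f(x, y) = x**2 + x*y + 2*y**2 is stationary at the origin with
# positive definite Hessian [[2, 1], [1, 4]], so the origin should be a
# strict local minimum. Sample unit directions d and small alpha > 0
# and confirm f(alpha * d) > f(0, 0).

def f(x, y):
    return x**2 + x * y + 2 * y**2

directions = [(math.cos(t), math.sin(t))
              for t in [k * 2 * math.pi / 16 for k in range(16)]]
alphas = [0.1, 0.01, 0.001]

strict = all(f(a * d[0], a * d[1]) > f(0.0, 0.0)
             for d in directions for a in alphas)
print(strict)  # True
```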
Daniel 2010-12-20