

5.2 HJB equation versus the maximum principle

Here we focus on the necessary conditions for optimality provided by the HJB equation (5.10) together with the Hamiltonian maximization condition (5.14) on the one hand, and by the maximum principle on the other. There is a notable difference in how these two sets of necessary conditions characterize optimal controls. To see this point more clearly, assume that the system and the cost are time-invariant. The maximum principle is formulated in terms of the canonical equations

$\displaystyle \dot x^*=\left.{H}_{p}\right\vert _{*},\qquad \dot p^*=\left.-{H}_{x}\right\vert _{*}$ (5.21)

and says that at each time $ t$ , the value $ u^*(t)$ of the optimal control must maximize $ H(x^*(t),u,p^*(t))$ with respect to $ u$ :

$\displaystyle u^*(t)=\arg\max_{u\in U}H(x^*(t),u,p^*(t)).$ (5.22)

This is an open-loop specification, because $ u^*(t)$ depends not only on the state $ x^*(t)$ but also on the costate $ p^*(t)$ , which has to be computed by solving the adjoint differential equation. Now, in the context of the HJB equation, the optimal control must satisfy

$\displaystyle u^*(t)=\arg\max_{u\in U}H(x^*(t),u,-{V}_{x}(t,x^*(t))).$ (5.23)

This is a closed-loop (feedback) specification; indeed, assuming that we know the value function $ V$ everywhere, $ u^*(t)$ is completely determined by the current state $ x^*(t)$ . The ability to generate an optimal control policy in the form of a state feedback law is an important feature of the dynamic programming approach, as we in fact already knew from Section 5.1.1. Clearly, we cannot implement this feedback law unless we can first find the value function by solving the HJB partial differential equation, and we have seen that this is in general a very difficult task. Therefore, from the computational point of view the maximum principle has an advantage in that it involves only ordinary and not partial differential equations. In principle, the dynamic programming approach provides more information (including sufficiency), but in reality, the maximum principle is often easier to use and allows one to solve many optimal control problems for which the HJB equation is intractable.
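To make this contrast concrete, here is a sketch of the familiar linear-quadratic special case; the notation $ A,B,Q,R,M,P$ , $ t_0$ , $ x_0$ is introduced only for this illustration and none of it is needed for what follows. Consider the system $ \dot x=Ax+Bu$ , $ u\in\mathbb{R}^m$ , with the cost $ J(u)=\int_{t_0}^{t_1}\left(x^{\top}Qx+u^{\top}Ru\right)dt+x^{\top}(t_1)Mx(t_1)$ , where $ Q$ and $ M$ are symmetric positive semidefinite and $ R$ is symmetric positive definite. One can check that the value function is quadratic, $ V(t,x)=x^{\top}P(t)x$ , where the symmetric matrix $ P(\cdot)$ solves the Riccati differential equation

$\displaystyle \dot P=-PA-A^{\top}P+PBR^{-1}B^{\top}P-Q,\qquad P(t_1)=M,$

and the closed-loop condition (5.23) then yields the state feedback law $ u^*(t)=-R^{-1}B^{\top}P(t)x^*(t)$ . The maximum principle applied to the same problem gives instead $ u^*(t)=\frac{1}{2}R^{-1}B^{\top}p^*(t)$ , where $ p^*$ must be found, together with $ x^*$ , by solving the two-point boundary value problem formed by the canonical equations (5.21) and the boundary conditions $ x^*(t_0)=x_0$ , $ p^*(t_1)=-2Mx^*(t_1)$ . The two answers of course agree, since $ p^*(t)=-{V}_{x}(t,x^*(t))=-2P(t)x^*(t)$ , but only the first one is in feedback form.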

As another point of comparison, it is interesting to recall how much longer and more complicated our proof of the maximum principle was compared with our derivation of the necessary conditions based on the HJB equation. This difference is especially perplexing in view of the striking similarity between the two Hamiltonian maximization conditions (5.22) and (5.23). We may wonder whether it might actually be possible to give an easier proof of the maximum principle starting from the HJB equation. Suppose that $ u^*$ is an optimal control and $ x^*$ is the corresponding state trajectory. Still assuming for simplicity that $ f$ and $ L$ are time-independent, we know that (5.23) must hold, where $ V$ is the value function satisfying

$\displaystyle -{V}_{t}(t,x^*(t))=L(x^*(t),u^*(t))+\left\langle {V}_{x}(t,x^*(t)),f(x^*(t),u^*(t))\right\rangle .$ (5.24)
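This is just the HJB equation (5.10) evaluated along the optimal trajectory: for each $ t$ we have

$\displaystyle -{V}_{t}(t,x^*(t))=\inf_{u\in U}\left\{L(x^*(t),u)+\left\langle {V}_{x}(t,x^*(t)),f(x^*(t),u)\right\rangle \right\},$

and the infimum is attained at $ u=u^*(t)$ , which is just a restatement of (5.23).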

To establish the maximum principle, we need to prove the existence of a costate $ p^*$ with the required properties. The formulas (5.22) and (5.23) strongly suggest that we should try to define it via

$\displaystyle p^*(t):=-{V}_{x}(t,x^*(t)).$ (5.25)

Then, the desired Hamiltonian maximization condition (5.22) automatically follows from (5.23). We note also that if $ V$ satisfies the boundary condition $ V(t_1,x)=K(x)$ as in (5.3), then (5.25) gives the boundary condition $ p^*(t_1)=-{K}_{x}(x^*(t_1))$ for the costate, and this matches the boundary condition (4.43) that we had in the maximum principle for problems with terminal cost. Thus far the situation looks quite promising, but we do not have any apparent reason to expect that $ p^*$ defined by (5.25) will satisfy the second differential equation in (5.21). However, this turns out to be true as well!
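Written out explicitly in terms of the value function, the property to be verified is

$\displaystyle \frac{d}{dt}\left(-{V}_{x}(t,x^*(t))\right)=-{H}_{x}\big(x^*(t),u^*(t),-{V}_{x}(t,x^*(t))\big),$

which, provided that $ V$ is smooth enough for the left-hand side to make sense, is exactly the statement of the next exercise.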


Exercise 5.5   Let $ p^*$ be defined by (5.25), with $ V$ further assumed to be of class $ \mathcal C^2$ . Show that $ p^*$ satisfies the second of the canonical equations (5.21), i.e., $ \dot p^*=-{H}_{x}(x^*,u^*,p^*)$ , where as usual $ H(x,u,p)=\langle p,f(x,u)\rangle -L(x,u)$ .

In the proof of the maximum principle, the adjoint vector $ p^*$ was defined as the normal to a suitable hyperplane. In our earlier discussions in Section 3.4, it was also related to the momentum and to the vector of Lagrange multipliers. From (5.25) we now have another interpretation of the adjoint vector: up to a sign, it is the gradient of the value function, i.e., the sensitivity of the optimal cost with respect to the state $ x$ . In economic terms, this quantity corresponds to the ``marginal value'' or ``shadow price''; it tells us by how much we can increase the benefit by increasing resources/spending, or how much we would be willing to pay someone else for additional resources and still make a profit.

At this point, the reader may be puzzled as to why we cannot indeed deduce the maximum principle from the HJB equation via the reasoning just given. Upon careful inspection, however, we can identify one gap in the above argument: it assumes that the value function has a well-defined gradient and, moreover, that this gradient can be further differentiated with respect to time (to obtain the adjoint equation as in Exercise 5.5). In other words, we need the existence of second-order partial derivatives of $ V$ . At the very least, we need $ V$ to be a $ \mathcal C^1$ function--a property that we have in fact assumed all along, starting with the Taylor expansion (5.7). The next example demonstrates that, unfortunately, we cannot expect this to be true in general.


