

5.1.3 HJB equation

In the principle of optimality (5.4) the value function $ V$ appears on both sides with different arguments. We can thus think of (5.4) as describing a dynamic relationship among the optimal values of the costs (5.1) for different $ t$ and $ x$ , which we declared earlier to be our goal. However, this relationship is rather clumsy and not very convenient to use in its present form. What we will now do is pass to its more compact infinitesimal version, which will take the form of a partial differential equation (PDE). The steps that follow rely on first-order Taylor expansions; the reader will recall that we used somewhat similar calculations when deriving the maximum principle. First, write $ x(t+{\scriptstyle\Delta}t)$ appearing on the right-hand side of (5.4) as

$\displaystyle x(t+{\scriptstyle\Delta}t)=x+f(t,x,u(t)){\scriptstyle\Delta}t+o({\scriptstyle\Delta}t)$

where we remembered that $ x(t)=x$ . This allows us to express $ V(t+{\scriptstyle\Delta}t,x(t+{\scriptstyle\Delta}t))$ as

$\displaystyle V(t+{\scriptstyle\Delta}t,x(t+{\scriptstyle\Delta}t))= V(t,x)+{V}_{t}(t,x){\scriptstyle\Delta}t+\left\langle {V}_{x}(t,x),f(t,x,u(t)){\scriptstyle\Delta}t\right\rangle +o({\scriptstyle\Delta}t)$ (5.7)

(For now we proceed under the assumption--whose validity we will examine later--that $ V$ is $ \mathcal C^1$ .) We also have

$\displaystyle \int_t^{t+{\scriptstyle\Delta}t}L( s,x(s),u(s))d s= L(t,x,u(t)){\scriptstyle\Delta}t+o({\scriptstyle\Delta}t).$ (5.8)

Substituting the expressions given by (5.7) and (5.8) into the right-hand side of (5.4), we obtain

$\displaystyle V(t,x)=\inf_{u_{[t,t+\Delta t]}}\left\{L(t,x,u(t)){\scriptstyle\Delta}t+V(t,x)+{V}_{t}(t,x){\scriptstyle\Delta}t+\left\langle {V}_{x}(t,x),f(t,x,u(t)){\scriptstyle\Delta}t\right\rangle +o({\scriptstyle\Delta}t)\right\}.$

The two $ V(t,x)$ terms cancel out (because the one inside the infimum does not depend on $ u$ and can be pulled outside), which leaves us with

$\displaystyle 0=\inf_{u_{[t,t+\Delta t]}}\left\{L(t,x,u(t)){\scriptstyle\Delta}t+{V}_{t}(t,x){\scriptstyle\Delta}t+\left\langle {V}_{x}(t,x),f(t,x,u(t)){\scriptstyle\Delta}t\right\rangle +o({\scriptstyle\Delta}t)\right\}.$ (5.9)

Let us now divide (5.9) by $ {\scriptstyle\Delta}t$ and let $ {\scriptstyle\Delta}t\to 0$ . In the limit the higher-order term $ o({\scriptstyle\Delta}t)/{\scriptstyle\Delta}t$ vanishes, and the infimum is taken over the instantaneous value of $ u$ at time $ t$ (in fact, already in (5.9) the control values $ u(s)$ for $ s>t$ affect the expression inside the infimum only through the $ o({\scriptstyle\Delta}t)$ term). Pulling $ {V}_{t}(t,x)$ outside the infimum, since it does not depend on $ u$ , we conclude that the equation

$\displaystyle \fbox{$-{V}_{t}(t,x)=\inf\limits_{u\in U} \Big\{L(t,x,u)+\big\langle {V}_{x}(t,x),f(t,x,u)\big\rangle \Big\}$}$ (5.10)

must hold for all $ t\in[t_0,t_1)$ and all $ x\in\mathbb{R}^n$ . This equation for the value function is called the Hamilton-Jacobi-Bellman (HJB) equation. It is a PDE since it contains partial derivatives of $ V$ with respect to $ t$ and $ x$ . The accompanying boundary condition is (5.3).
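
To make (5.10) concrete, here is a minimal numerical sketch (not from the text) that integrates the equation backward in time on a grid, starting from the boundary condition at the terminal time. The problem data $ f$ , $ L$ , the terminal cost, the control set $ U$ , and all grid parameters are illustrative assumptions chosen for demonstration.

\begin{verbatim}
import numpy as np

# Backward-in-time finite-difference sketch for the scalar HJB equation
#   -V_t(t,x) = min_{u in U} { L(t,x,u) + V_x(t,x) f(t,x,u) }
# with boundary condition V(t1,x) = K(x).  All problem data and grid
# choices below are illustrative assumptions, not taken from the text.

f = lambda t, x, u: u                      # dynamics: x' = u
L = lambda t, x, u: 0.5 * (x**2 + u**2)    # running cost
K = lambda x: 0.5 * x**2                   # terminal cost

t0, t1 = 0.0, 1.0
xs = np.linspace(-2.0, 2.0, 201)           # state grid
us = np.linspace(-2.0, 2.0, 81)            # discretized control set U
dx = xs[1] - xs[0]
# CFL-type restriction dt <= dx / max|f| for the explicit upwind scheme
n_steps = int(np.ceil((t1 - t0) / (0.4 * dx / np.abs(us).max())))
dt = (t1 - t0) / n_steps

V = K(xs)                                  # boundary condition at t = t1
t = t1
for _ in range(n_steps):
    # Upwind estimates of V_x: forward difference where f > 0, backward
    # difference where f < 0 (information flows along x' = f); the
    # one-sided copies at the two edge points are crude but serviceable.
    dV = (V[1:] - V[:-1]) / dx
    Vx_fwd = np.append(dV, dV[-1])[:, None]
    Vx_bwd = np.insert(dV, 0, dV[0])[:, None]
    drift = f(t, xs[:, None], us[None, :])           # control-dependent drift
    Vx = np.where(drift > 0, Vx_fwd, Vx_bwd)
    rhs = (L(t, xs[:, None], us[None, :]) + Vx * drift).min(axis=1)
    V = V + dt * rhs                                 # backward Euler step
    t -= dt
# V now approximates the value function V(t0, .) on the grid xs.
\end{verbatim}

With these particular data one can check by substitution that $ V(t,x)=\frac{1}{2}x^2$ satisfies both (5.10) and the boundary condition, so the computed values should stay close to $ \frac{1}{2}x^2$ up to first-order discretization error.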

Note that the terminal cost appears only in the boundary condition and not in the HJB equation itself. In fact, the specific form of the terminal cost and of the terminal time played no role in our derivation of the HJB equation. For a different target set, the boundary condition changes (as we already discussed) but the HJB equation remains the same. However, the HJB equation does not hold for $ (t,x)\in S$ , just as it does not hold at $ t=t_1$ in the fixed-time case, because the principle of optimality is not valid there.

We can apply one more transformation in order to rewrite the HJB equation in a simpler--and also more insightful--way. It is easy to check that (5.10) is equivalent to

$\displaystyle {V}_{t}(t,x)=\sup_{u\in U} \left\{\left\langle -{V}_{x}(t,x),f(t,x,u)\right\rangle -L(t,x,u)\right\}.$ (5.11)
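
To verify the equivalence, multiply both sides of (5.10) by $ -1$ and use the identity $ -\inf_{u\in U}a(u)=\sup_{u\in U}\{-a(u)\}$ :

$\displaystyle {V}_{t}(t,x)=-\inf_{u\in U}\left\{L(t,x,u)+\left\langle {V}_{x}(t,x),f(t,x,u)\right\rangle\right\}=\sup_{u\in U}\left\{\left\langle -{V}_{x}(t,x),f(t,x,u)\right\rangle -L(t,x,u)\right\}.$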

Let us now recall our earlier definition (3.29) of the Hamiltonian, reproduced here:

$\displaystyle H(t,x,u,p):=\langle p,f(t,x,u)\rangle -L(t,x,u).$

We see that the expression inside the supremum in (5.11) is nothing but the Hamiltonian, with $ -{V}_{x}$ playing the role of the costate. This brings us to the Hamiltonian form of the HJB equation:

$\displaystyle {V}_{t}(t,x)=\displaystyle\sup_{u\in U} H\left(t,x,u,-{V}_{x}(t,x)\right).$ (5.12)

So far, the existence of an optimal control has not been assumed. When an optimal (in the global sense) control $ u^*$ does exist, the infimum in the previous calculations can be replaced by a minimum and this minimum is achieved when $ u^*$ is plugged in. In particular, the principle of optimality (5.4) yields

$\displaystyle V(t,x^*(t))$ $\displaystyle =\min_{u_{[t,t+\Delta t]}}\left\{\int_t^{t+{\scriptstyle\Delta}t} L( s,x(s),u(s))d s+V(t+{\scriptstyle\Delta}t,x(t+{\scriptstyle\Delta}t))\right\}$    
  $\displaystyle =\int_t^{t+{\scriptstyle\Delta}t} L( s,x^*(s),u^*(s))d s+V(t+{\scriptstyle\Delta}t,x^*(t+{\scriptstyle\Delta}t))$    

where $ x^*(\cdot)$ and $ x(\cdot)$ are trajectories corresponding to $ u^*(\cdot)$ and $ u(\cdot)$ , respectively, both passing through the same point $ x^*(t)$ at time $ t$ . From this, repeating the same steps that led us earlier to the HJB equation (5.10), we obtain

\begin{displaymath}\begin{split}-{V}_{t}(t,x^*(t))&=\min_{u\in U}\left\{L(t,x^*(t),u)+\left\langle {V}_{x}(t,x^*(t)),f(t,x^*(t),u)\right\rangle\right\}\\ &=L(t,x^*(t),u^*(t))+\left\langle {V}_{x}(t,x^*(t)),f(t,x^*(t),u^*(t))\right\rangle .\end{split}\end{displaymath} (5.13)

Expressed in terms of the Hamiltonian, the second equation in (5.13) becomes

$\displaystyle H(t,x^*(t),u^*(t),-{V}_{x}(t,x^*(t)))=\max_{u\in U} H(t,x^*(t),u,-{V}_{x}(t,x^*(t))).$ (5.14)

This Hamiltonian maximization condition is analogous to the one we had in the maximum principle. We see that if we can find a closed-form expression for the control that maximizes the Hamiltonian--or, equivalently, the control that achieves the infimum in the HJB equation (5.10)--then the HJB equation becomes simpler and more explicit.
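
As a simple illustration (with made-up problem data, not one of the examples from the text), suppose $ f(t,x,u)=u$ , $ L(t,x,u)=q(x)+\frac{1}{2}u^2$ for some continuous function $ q$ , and $ U=\mathbb{R}$ . The expression inside the infimum in (5.10) is then quadratic in $ u$ and is minimized at $ u=-{V}_{x}(t,x)$ , so the HJB equation reduces to the explicit PDE

$\displaystyle -{V}_{t}(t,x)=q(x)-\frac{1}{2}\left({V}_{x}(t,x)\right)^2$

and the minimizing control is recovered in feedback form as $ u=-{V}_{x}(t,x)$ once $ V$ is known.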


Example 5.1   Consider the standard integrator $ \dot x=u$ (with $ x,u\in\mathbb{R}$ ) [...]. The body of this example is truncated in the source; its point is that actually solving the HJB equation can be difficult. $ \qed$


Example   Consider again the minimal-time parking problem [...]. The body of this example is truncated in the source; it closes by inviting the reader to play more with the problem in the next exercise. $ \qed$


Exercise   Do you see a way to further simplify [...] earlier results to obtain more information about the value function?



Subsections

5.1.3.1 Infinite-horizon problem