
5.1.2 Principle of optimality

We now return to the continuous-time optimal control problem that we have been studying since Section 3.3, defined by the control system (3.18) and the Bolza cost functional (3.21). For concreteness, assume that we are dealing with a fixed-time, free-endpoint problem, i.e., the target set is $ S= \{t_1\}\times\mathbb{R}^n$ . (Handling other target sets requires some modifications, on which we briefly comment in what follows.) We can then write the cost functional as

$\displaystyle J(u)=\int_{t_0}^{t_1} L(t,x(t),u(t))dt+K(x(t_1)).$

As we already remarked in Section 3.3.2, a more accurate notation for this functional would be $ J(t_0,x_0,u)$ as it depends on the initial data.

The basic idea of dynamic programming is to consider, instead of the problem of minimizing $ J(t_0,x_0,u)$ for given $ t_0$ and $ x_0$ , the family of minimization problems associated with the cost functionals

$\displaystyle J(t,x,u)=\int_{t}^{t_1}L(s,x(s),u(s))\,ds+K(x(t_1))$ (5.1)

where $ t$ ranges over $ [t_0,t_1)$ and $ x$ ranges over $ \mathbb{R}^n$ ; here $ x(\cdot)$ on the right-hand side denotes the state trajectory corresponding to the control $ u(\cdot)$ and satisfying $ x(t)=x$ . (There is a slight abuse of notation here; the second argument $ x$ of $ J$ in (5.1) is a fixed point, and only the third argument $ u$ is a function of time.) In accordance with Bellman's roadmap, our goal is to derive a dynamic relationship among these problems, and ultimately to solve all of them.
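To see concretely how (5.1) depends on the initial data, the following sketch approximates $ J(t,x,u)$ by forward Euler simulation and a left-endpoint Riemann sum. The scalar plant $ \dot x=u$ , the running cost $ L=x^2+u^2$ , and $ K\equiv 0$ are my own illustrative choices, not part of the text:

```python
def cost(t, x, u, t1=1.0, dt=1e-3,
         L=lambda s, x, u: x**2 + u**2,   # running cost (illustrative choice)
         K=lambda x: 0.0):                # terminal cost (illustrative choice)
    """Approximate J(t, x, u) of (5.1) for the hypothetical scalar plant
    xdot = u, via forward Euler and a left-endpoint Riemann sum."""
    n = int(round((t1 - t) / dt))
    J = 0.0
    for k in range(n):
        s = t + k * dt
        us = u(s)
        J += L(s, x, us) * dt   # accumulate running cost
        x += us * dt            # Euler step of the dynamics
    return J + K(x)

# The same control (here u = 0) gives different costs from different
# initial data, which is why the functional is written J(t, x, u):
J_a = cost(0.0, 2.0, lambda s: 0.0)   # x stays at 2 on [0, 1]: J = 4
J_b = cost(0.5, 2.0, lambda s: 0.0)   # shorter horizon [0.5, 1]: J = 2
```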

To this end, let us introduce the value function

$\displaystyle V(t,x):=\inf_{u_{[t,t_1]}} J(t,x,u)$ (5.2)

where the notation $ u_{[t,t_1]}$ indicates that the control $ u$ is restricted to the interval $ [t,t_1]$ . Loosely speaking, we can think of $ V(t,x)$ as the optimal cost (cost-to-go) from $ (t,x)$ . It is important to note, however, that the existence of an optimal control--and hence of the optimal cost--is not actually assumed, which is why we work with an infimum rather than a minimum in (5.2). If an optimal control exists, then the infimum turns into a minimum and $ V$ coincides with the optimal cost-to-go. In general, the infimum need not be achieved, and might even equal $ -\infty$ for some $ (t,x)$ .
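For intuition, the value function (5.2) can be approximated by discretizing time and applying the discrete recursion of Section 5.1.1 backward from $ t_1$ on a state grid. The sketch below uses the same hypothetical example as above ($ \dot x=u$ , $ L=x^2+u^2$ , $ K\equiv 0$ on $ [0,1]$ , my choices); for this problem one can verify that $ V(t,x)=\tanh(t_1-t)\,x^2$ , which the grid approximation should roughly reproduce:

```python
import numpy as np

# Hypothetical test problem (not from the text): xdot = u, L = x^2 + u^2,
# K = 0, horizon [0, 1]. Its value function is V(t, x) = tanh(t1 - t) x^2,
# so V(0, 1) = tanh(1) ~ 0.7616.
t1, dt = 1.0, 0.005
xs = np.linspace(-3.0, 3.0, 601)           # state grid
us = np.linspace(-3.0, 3.0, 241)           # control grid
X, U = np.meshgrid(xs, us, indexing="ij")  # all (x, u) pairs

V = np.zeros_like(xs)                      # boundary condition: V(t1, .) = K = 0
for _ in range(int(round(t1 / dt))):
    # one backward step: V(t, x) = min_u { L(x, u) dt + V(t + dt, x + u dt) }
    Vnext = np.interp(X + U * dt, xs, V)   # V(t + dt, .) off-grid (clamped at edges)
    V = np.min((X**2 + U**2) * dt + Vnext, axis=1)

V0_at_1 = np.interp(1.0, xs, V)            # approximate V(0, 1)
```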

It is clear that the value function must satisfy the boundary condition

$\displaystyle V(t_1,x)=K(x)\qquad \forall\,x\in\mathbb{R}^n.$ (5.3)

In particular, if there is no terminal cost ($ K\equiv0$ ) then we have $ V(t_1,x)=0$ . The boundary condition (5.3) is of course a consequence of our specific problem formulation. If the problem involved a more general target set $ S\subset[t_0,\infty)\times\mathbb{R}^n$ , then the boundary condition would read $ V(t,x)=K(x)$ for $ (t,x)\in S$ .

The basic principle of dynamic programming for the present case is a continuous-time counterpart of the principle of optimality formulated in Section 5.1.1, already familiar to us from Chapter 4. Here we can state this property as follows, calling it again the principle of optimality: For every $ (t,x)\in[t_0,t_1)\times\mathbb{R}^n$ and every $ {\scriptstyle\Delta}t\in(0,t_1-t]$ , the value function $ V$ defined in (5.2) satisfies the relation

$\displaystyle V(t,x)=\inf_{u_{[t,t+\Delta t]}}\left\{\int_t^{t+\Delta t} L(s,x(s),u(s))\,ds+V(t+\Delta t,x(t+\Delta t))\right\}$ (5.4)

where $ x(\cdot)$ on the right-hand side is the state trajectory corresponding to the control $ u_{[t,t+{\scriptstyle\Delta}t]}$ and satisfying $ x(t)=x$ . The intuition behind this statement is that to search for an optimal control, we can search over a small time interval for a control that minimizes the cost over this interval plus the subsequent optimal cost-to-go. Thus the minimization problem on the interval $ [t,t_1]$ is split into two, one on $ [t,t+{\scriptstyle\Delta}t]$ and the other on $ [t+{\scriptstyle\Delta}t,t_1]$ ; see Figure 5.3.
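The splitting expressed by (5.4) can be checked exactly in a discrete toy model, where the infimum is a minimum over finitely many control sequences. The sketch below uses a hypothetical integer-state example of my own construction ($ x_{k+1}=x_k+u_k$ , $ u_k\in\{-1,0,1\}$ , quadratic costs): brute-force enumeration over the whole horizon agrees with minimizing the first-step cost plus the subsequent cost-to-go:

```python
from itertools import product

N = 4                             # horizon: N discrete steps
U = (-1, 0, 1)                    # finite control set (illustrative choice)
L = lambda x, u: x * x + u * u    # running cost
K = lambda x: x * x               # terminal cost

def J(x, seq):
    """Total cost of applying the control sequence seq from state x."""
    total = 0
    for u in seq:
        total += L(x, u)
        x += u                    # dynamics: x_{k+1} = x_k + u_k
    return total + K(x)

def V(x, k):
    """Cost-to-go from state x with N - k steps remaining (brute force)."""
    return min(J(x, seq) for seq in product(U, repeat=N - k))

x0 = 3
# Principle of optimality with "Delta t" equal to one step: minimizing over
# the whole horizon equals minimizing first-step cost plus cost-to-go.
lhs = V(x0, 0)
rhs = min(L(x0, u) + V(x0 + u, 1) for u in U)
# lhs and rhs agree exactly.
```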

Figure: Continuous time: principle of optimality

The above principle of optimality may seem obvious. However, it is important to justify it rigorously, especially since we are using an infimum and not assuming existence of optimal controls. We give one half of the proof by verifying that

$\displaystyle V(t,x)\ge \overline V(t,x)$ (5.5)

where $ \overline V(t,x)$ denotes the right-hand side of (5.4):

$\displaystyle \overline V(t,x):=\inf_{u_{[t,t+\Delta t]}}\left\{\int_t^{t+\Delta t} L(s,x(s),u(s))\,ds+V(t+\Delta t,x(t+\Delta t))\right\}.$

By (5.2) and the definition of infimum, for every $ \varepsilon >0$ there exists a control $ u_\varepsilon $ on $ [t,t_1]$ such that

$\displaystyle V(t,x)+\varepsilon \ge J(t,x,u_\varepsilon).$ (5.6)

Writing $ x_\varepsilon $ for the corresponding state trajectory, we have

$\displaystyle J(t,x,u_\varepsilon)=\int_t^{t+\Delta t} L(s,x_\varepsilon(s),u_\varepsilon(s))\,ds+J(t+\Delta t,x_\varepsilon(t+\Delta t),u_\varepsilon)$
$\displaystyle \ge\int_t^{t+\Delta t} L(s,x_\varepsilon(s),u_\varepsilon(s))\,ds+V(t+\Delta t,x_\varepsilon(t+\Delta t))\ge\overline V(t,x)$

where the two inequalities follow directly from the definitions of $ V$ and $ \overline V$ , respectively. Since (5.6) holds with an arbitrary $ \varepsilon >0$ , the desired inequality (5.5) is established.

Exercise. Complete the proof of the principle of optimality by showing the reverse inequality $ V(t,x)\le \overline V(t,x)$ .

Daniel 2010-12-20