
## 5.1.2 Principle of optimality

We now return to the continuous-time optimal control problem that we have been studying since Section 3.3, defined by the control system (3.18) and the Bolza cost functional (3.21). For concreteness, assume that we are dealing with a fixed-time, free-endpoint problem, i.e., the target set is $S = \{t_1\} \times \mathbb{R}^n$. (Handling other target sets requires some modifications, on which we briefly comment in what follows.) We can then write the cost functional as

$$ J(u) = \int_{t_0}^{t_1} L(t, x(t), u(t))\, dt + K(x(t_1)). $$

As we already remarked in Section 3.3.2, a more accurate notation for this functional would be $J(t_0, x_0, u)$, as it depends on the initial data $(t_0, x_0)$.

The basic idea of dynamic programming is to consider, instead of the problem of minimizing $J(u)$ for given $t_0$ and $x_0$, the family of minimization problems associated with the cost functionals

$$ J(t, x, u) := \int_{t}^{t_1} L(s, x(s), u(s))\, ds + K(x(t_1)) \tag{5.1} $$

where $t$ ranges over $[t_0, t_1)$ and $x$ ranges over $\mathbb{R}^n$; here $x(\cdot)$ on the right-hand side denotes the state trajectory corresponding to the control $u$ and satisfying $x(t) = x$. (There is a slight abuse of notation here; the second argument $x$ of $J$ in (5.1) is a fixed point, and only the third argument $u$ is a function of time.) In accordance with Bellman's roadmap, our goal is to derive a dynamic relationship among these problems, and ultimately to solve all of them.
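To make the family of costs (5.1) concrete, the sketch below evaluates a cost-to-go of this form numerically for a hypothetical scalar example (dynamics $\dot x = u$, running cost $L = x^2 + u^2$, no terminal cost). The function names and the forward Euler discretization are our own illustration, not part of the text.

```python
def cost_to_go(t, x, u_func, t1, f, L, K, dt=1e-3):
    """Approximate J(t, x, u) of (5.1): simulate the state forward from
    x(t) = x under the control law u_func, accumulating the running cost L
    and adding the terminal cost K (forward Euler scheme)."""
    cost, s = 0.0, t
    while s < t1 - 1e-12:
        h = min(dt, t1 - s)          # do not step past the final time t1
        u = u_func(s, x)
        cost += L(s, x, u) * h       # running cost L(s, x(s), u(s)) ds
        x += f(s, x, u) * h          # Euler step of dx/ds = f(s, x, u)
        s += h
    return cost + K(x)

# Hypothetical example: dx/dt = u, L = x^2 + u^2, K = 0, horizon [0, 1].
f = lambda s, x, u: u
L = lambda s, x, u: x**2 + u**2
K = lambda x: 0.0

J_zero = cost_to_go(0.0, 1.0, lambda s, x: 0.0, 1.0, f, L, K)  # control u = 0
J_damp = cost_to_go(0.0, 1.0, lambda s, x: -x, 1.0, f, L, K)   # feedback u = -x
```

With the zero control the state stays at $x = 1$, so the cost is approximately $1$; the damping feedback $u = -x$ achieves a strictly smaller cost, illustrating that different controls in the family (5.1) yield different costs from the same initial data.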

To this end, let us introduce the value function

$$ V(t, x) := \inf_{u_{[t, t_1]}} J(t, x, u) \tag{5.2} $$

where the notation $u_{[t, t_1]}$ indicates that the control $u$ is restricted to the interval $[t, t_1]$. Loosely speaking, we can think of $V(t, x)$ as the optimal cost (cost-to-go) from $(t, x)$. It is important to note, however, that the existence of an optimal control--and hence of the optimal cost--is not actually assumed, which is why we work with an infimum rather than a minimum in (5.2). If an optimal control exists, then the infimum turns into a minimum and $V(t, x)$ coincides with the optimal cost-to-go. In general, the infimum need not be achieved, and $V(t, x)$ might even equal $-\infty$ for some $(t, x)$.
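Because $V$ is defined as an infimum, the cost of any particular control furnishes an upper bound on $V(t, x)$, and minimizing over a finite family of controls tightens that bound. The sketch below does this for the same hypothetical example as above ($\dot x = u$, $L = x^2 + u^2$, $K = 0$), restricting attention to constant controls; the discretization and the control family are our own choices for illustration.

```python
import numpy as np

def J_const(t, x, u_const, t1=1.0, dt=1e-3):
    """Cost of the constant control u(s) = u_const for the hypothetical
    example dx/dt = u, L = x^2 + u^2, K = 0 (forward Euler scheme)."""
    cost, s = 0.0, t
    while s < t1 - 1e-12:
        h = min(dt, t1 - s)
        cost += (x**2 + u_const**2) * h   # running cost over one step
        x += u_const * h                  # Euler step of dx/ds = u
        s += h
    return cost

# V(0, 1) = inf over all controls, so it is bounded above by the minimum
# over any finite family -- here constant controls on a grid.
bounds = [J_const(0.0, 1.0, u) for u in np.linspace(-2.0, 0.5, 26)]
upper = min(bounds)
```

The best constant control already beats the zero control (cost $\approx 1$), but since (5.2) takes the infimum over *all* admissible controls, $V(0, 1)$ may be strictly smaller than this upper bound.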

It is clear that the value function must satisfy the boundary condition

$$ V(t_1, x) = K(x) \qquad \forall\, x \in \mathbb{R}^n. \tag{5.3} $$

In particular, if there is no terminal cost ($K \equiv 0$), then we have $V(t_1, x) \equiv 0$. The boundary condition (5.3) is of course a consequence of our specific problem formulation. If the problem involved a more general target set $S \subset [t_0, \infty) \times \mathbb{R}^n$, then the boundary condition would read $V(t, x) = K(x)$ for $(t, x) \in S$.

The basic principle of dynamic programming for the present case is a continuous-time counterpart of the principle of optimality formulated in Section 5.1.1, already familiar to us from Chapter 4. Here we can state this property as follows, calling it again the principle of optimality: For every $(t, x) \in [t_0, t_1) \times \mathbb{R}^n$ and every $\Delta t \in (0, t_1 - t]$, the value function defined in (5.2) satisfies the relation

$$ V(t, x) = \inf_{u_{[t, t+\Delta t]}} \left\{ \int_{t}^{t+\Delta t} L(s, x(s), u(s))\, ds + V\big(t+\Delta t,\, x(t+\Delta t)\big) \right\} \tag{5.4} $$

where $x(\cdot)$ on the right-hand side is the state trajectory corresponding to the control $u$ and satisfying $x(t) = x$. The intuition behind this statement is that to search for an optimal control, we can search over the small time interval $[t, t+\Delta t]$ for a control that minimizes the cost over this interval plus the subsequent optimal cost-to-go $V(t+\Delta t, x(t+\Delta t))$. Thus the minimization problem on the interval $[t, t_1]$ is split into two, one on $[t, t+\Delta t]$ and the other on $[t+\Delta t, t_1]$; see Figure 5.3.
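Applied repeatedly with a fixed step $\Delta t$ and initialized with the boundary condition (5.3), relation (5.4) yields a backward recursion for $V$ on a grid; this is the basis of numerical dynamic programming. A minimal sketch for the same hypothetical example as above ($\dot x = u$, $L = x^2 + u^2$, $K = 0$), with Euler dynamics and linear interpolation between grid points, both our own choices:

```python
import numpy as np

t1, N = 1.0, 50                       # horizon [0, t1] split into N steps
dt = t1 / N
xs = np.linspace(-2.0, 2.0, 81)       # state grid
us = np.linspace(-2.0, 2.0, 41)       # control grid

V = np.zeros((N + 1, xs.size))
V[N] = 0.0                            # boundary condition (5.3): V(t1, x) = K(x) = 0

for k in range(N - 1, -1, -1):        # march backward in time
    for i, x in enumerate(xs):
        # One step of (5.4): V(t, x) = min_u [ L(x, u) dt + V(t + dt, x + f(x, u) dt) ]
        x_next = x + us * dt                       # Euler step for every candidate u
        V_next = np.interp(x_next, xs, V[k + 1])   # interpolate cost-to-go on the grid
        V[k, i] = np.min((x**2 + us**2) * dt + V_next)

V0 = np.interp(1.0, xs, V[0])         # approximate value V(0, 1)
```

Since every running cost in this example is nonnegative and the terminal cost is zero, the computed $V$ is nonnegative everywhere, and $V(0, 1)$ comes out below the cost $\approx 1$ of the zero control, consistent with $V$ being an infimum over all controls.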

The above principle of optimality may seem obvious. However, it is important to justify it rigorously, especially since we are using an infimum and not assuming existence of optimal controls. We give one "half" of the proof by verifying that

$$ V(t, x) \ge \overline{V}(t, x) \tag{5.5} $$

where $\overline{V}(t, x)$ denotes the right-hand side of (5.4):

By (5.2) and the definition of infimum, for every $\varepsilon > 0$ there exists a control $u_\varepsilon$ on $[t, t_1]$ such that

$$ J(t, x, u_\varepsilon) \le V(t, x) + \varepsilon. \tag{5.6} $$

Writing $x_\varepsilon(\cdot)$ for the corresponding state trajectory, we have

$$ \overline{V}(t, x) \le \int_{t}^{t+\Delta t} L(s, x_\varepsilon(s), u_\varepsilon(s))\, ds + V\big(t+\Delta t,\, x_\varepsilon(t+\Delta t)\big) \le \int_{t}^{t+\Delta t} L(s, x_\varepsilon(s), u_\varepsilon(s))\, ds + J\big(t+\Delta t,\, x_\varepsilon(t+\Delta t),\, u_\varepsilon\big) = J(t, x, u_\varepsilon) $$

where the two inequalities follow directly from the definitions of $\overline{V}$ and $V$, respectively, and the equality holds because the cost from $t$ splits at $t + \Delta t$. Since (5.6) holds with an arbitrary $\varepsilon > 0$, the desired inequality (5.5) is established.
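The infimum-based argument above can be checked exactly in a toy setting where enumeration is possible: with finitely many controls and finitely many time steps, the infimum in (5.2) is an attained minimum, and the split in (5.4) holds with equality simply by associativity of minimization. The dynamics and costs below are a hypothetical discrete example, not from the text.

```python
import itertools

US = (-1.0, 0.0, 1.0)                 # finite control set
N = 4                                 # number of time steps
step = lambda x, u: x + 0.25 * u      # discrete dynamics x_{k+1} = x_k + f(x_k, u_k) dt
stage = lambda x, u: 0.25 * (x**2 + u**2)   # running cost L(x, u) dt

def V(k, x):
    """Cost-to-go at step k: brute-force minimum over all remaining
    control sequences (the discrete analogue of (5.2))."""
    if k == N:
        return 0.0                    # boundary condition: no terminal cost
    best = float("inf")
    for seq in itertools.product(US, repeat=N - k):
        xk, c = x, 0.0
        for u in seq:
            c += stage(xk, u)
            xk = step(xk, u)
        best = min(best, c)
    return best

x0 = 1.0
lhs = V(0, x0)                                            # direct definition, as in (5.2)
rhs = min(stage(x0, u) + V(1, step(x0, u)) for u in US)   # one-step split, as in (5.4)
```

Here `lhs` and `rhs` agree (up to floating-point round-off), mirroring the principle of optimality: minimizing over the whole horizon equals minimizing the first stage cost plus the optimal cost-to-go from the resulting state.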

Daniel 2010-12-20