5.1.2 Principle of optimality

We now return to the continuous-time optimal control problem that we have been studying since Section 3.3, defined by the control system (3.18) and the Bolza cost functional (3.21). For concreteness, assume that we are dealing with a fixed-time, free-endpoint problem, i.e., the target set is $S=\{t_1\}\times\mathbb{R}^n$. (Handling other target sets requires some modifications, on which we briefly comment in what follows.) We can then write the cost functional as
$$J(u)=\int_{t_0}^{t_1} L(t,x(t),u(t))\,dt+K(x(t_1)).$$

As we already remarked in Section 3.3.2, a more accurate notation for this functional would be $J(t_0,x_0,u)$, as it depends on the initial data.

The basic idea of dynamic programming is to consider, instead of the problem of minimizing $J(u)$ for given $t_0$ and $x_0$, the *family* of minimization problems associated with the cost functionals
$$J(t,x,u):=\int_{t}^{t_1} L(s,x(s),u(s))\,ds+K(x(t_1))\tag{5.1}$$
where $t$ ranges over $[t_0,t_1)$ and $x$ ranges over $\mathbb{R}^n$; here $x(\cdot)$ on the right-hand side denotes the state trajectory corresponding to the control $u$ and satisfying $x(t)=x$. (There is a slight abuse of notation here; the second argument of $J$ in (5.1) is a fixed point, and only the third argument is a function of time.) In accordance with Bellman's roadmap, our goal is to derive a dynamic relationship among these problems.
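For readers who want to experiment, the cost functional (5.1) can be approximated numerically by forward Euler integration. The sketch below uses a hypothetical scalar example (dynamics $\dot x=u$, running cost $L=x^2+u^2$, terminal cost $K(x)=x^2$); none of these particular choices come from the text.

```python
# Numerical evaluation of the cost functional J(t, x, u) from (5.1),
# sketched for a hypothetical scalar example (not from the text):
# dynamics x' = u, running cost L = x^2 + u^2, terminal cost K(x) = x^2.

def eval_cost(t, x, u, t1, n_steps=1000):
    """Euler-integrate the state from x(t) = x under the control u(s)
    and accumulate the running cost, then add the terminal cost."""
    dt = (t1 - t) / n_steps
    cost = 0.0
    s = t
    for _ in range(n_steps):
        us = u(s)
        cost += (x**2 + us**2) * dt   # running cost L(s, x(s), u(s)) ds
        x += us * dt                  # Euler step for x' = u
        s += dt
    return cost + x**2                # terminal cost K(x(t1))

# The zero control leaves the state at x, so J(t, x, 0) = x^2*(t1 - t) + x^2.
print(eval_cost(0.0, 1.0, lambda s: 0.0, 1.0))  # approximately 2.0
```

Refining the step size (larger `n_steps`) brings the Euler approximation closer to the integral in (5.1).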

To this end, let us introduce the *value function*

where the notation indicates that the control is restricted to the interval . Loosely speaking, we can think of as the optimal cost (cost-to-go) from . It is important to note, however, that the existence of an optimal control--and hence of the optimal cost--is not actually assumed, which is why we work with an infimum rather than a minimum in (5.2). If an optimal control exists, then the infimum turns into a minimum and coincides with the optimal cost-to-go. In general, the infimum need not be achieved, and might even equal for some .

It is clear that the value function must satisfy the boundary condition
$$V(t_1,x)=K(x)\qquad \forall\,x\in\mathbb{R}^n.\tag{5.3}$$
In particular, if there is no terminal cost ($K\equiv 0$) then we have $V(t_1,x)\equiv 0$. The boundary condition (5.3) is of course a consequence of our specific problem formulation. If the problem involved a more general target set $S$, then the boundary condition would read $V(t,x)=K(x)$ for $(t,x)\in S$.
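As a simple illustration (not taken from the text), consider the scalar system $\dot x=u$ with $|u|\le 1$, no running cost ($L\equiv 0$), and terminal cost $K(x)=x^2$. From $(t,x)$ the states reachable at time $t_1$ form the interval $[x-(t_1-t),\,x+(t_1-t)]$, so minimizing $K$ over this interval gives
$$V(t,x)=\bigl(\max\{|x|-(t_1-t),\,0\}\bigr)^2.$$
Here the infimum is attained, and setting $t=t_1$ recovers $V(t_1,x)=x^2=K(x)$, in agreement with (5.3).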

The basic principle of dynamic programming for the present case is
a continuous-time counterpart of the principle of optimality
formulated in Section 5.1.1, already familiar to us
from Chapter 4. Here we can state this property as
follows, calling it again the **principle of optimality**:
*For every
$(t,x)\in[t_0,t_1)\times\mathbb{R}^n$
and every
$\Delta t\in(0,t_1-t]$, the value function $V$
defined in (5.2) satisfies the relation*
$$V(t,x)=\inf_{u_{[t,t+\Delta t]}}\left\{\int_t^{t+\Delta t} L(s,x(s),u(s))\,ds+V(t+\Delta t,x(t+\Delta t))\right\}\tag{5.4}$$
*where $x(\cdot)$ on the right-hand side is the state trajectory corresponding to the control $u_{[t,t+\Delta t]}$ and satisfying $x(t)=x$.*
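In discrete time the principle of optimality can be checked by direct enumeration. The sketch below (an illustration, not from the text) Euler-discretizes a hypothetical scalar example, computes the value function by brute force over all control sequences, and confirms that it satisfies a one-step recursion analogous to (5.4).

```python
# Discrete-time sketch of the principle of optimality (an illustration,
# not part of the text): Euler-discretize x' = u with running cost
# L = x^2 + u^2 and terminal cost K(x) = x^2, over N steps of size dt,
# with controls restricted to a finite set U.

from itertools import product

dt, N = 0.5, 3
U = (-1.0, 0.0, 1.0)

def brute_force_V(k, x):
    """V(k, x): minimize total cost over all control sequences on steps k..N-1."""
    best = float("inf")
    for seq in product(U, repeat=N - k):
        xs, cost = x, 0.0
        for u in seq:
            cost += (xs**2 + u**2) * dt   # running cost
            xs += u * dt                  # Euler dynamics step
        best = min(best, cost + xs**2)    # add terminal cost K
    return best

def one_step_V(k, x):
    """Right-hand side of the (discrete) principle of optimality:
    optimize the first control only, then use V at the next step."""
    return min((x**2 + u**2) * dt + brute_force_V(k + 1, x + u * dt) for u in U)

# The two computations agree at every initial state, as (5.4) asserts.
for x0 in (-1.0, 0.0, 0.7):
    assert abs(brute_force_V(0, x0) - one_step_V(0, x0)) < 1e-12
print("principle of optimality verified")
```

The agreement is exact here because the brute-force minimization over full control sequences factors into a first-step minimization plus the cost-to-go, which is precisely what (5.4) expresses in continuous time.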

The above principle of optimality may seem obvious. However, it is important to justify it rigorously, especially since we are using an infimum and not assuming existence of optimal controls. We give "one half" of the proof by verifying that
$$V(t,x)\le \overline V(t,x)\tag{5.5}$$
where $\overline V(t,x)$ denotes the right-hand side of (5.4):
$$\overline V(t,x):=\inf_{u_{[t,t+\Delta t]}}\left\{\int_t^{t+\Delta t} L(s,x(s),u(s))\,ds+V(t+\Delta t,x(t+\Delta t))\right\}.$$

Fix an arbitrary control $u$ on $[t,t+\Delta t]$ and let $x(\cdot)$ be the corresponding state trajectory with $x(t)=x$. By (5.2) and the definition of infimum, for every $\varepsilon>0$ there exists a control $u_\varepsilon$ on $[t+\Delta t,t_1]$ such that
$$J(t+\Delta t,x(t+\Delta t),u_\varepsilon)\le V(t+\Delta t,x(t+\Delta t))+\varepsilon.$$
Writing $x_\varepsilon$ for the corresponding state trajectory (so that $x_\varepsilon(t+\Delta t)=x(t+\Delta t)$), we have
$$V(t,x)\le \int_t^{t+\Delta t} L(s,x(s),u(s))\,ds+J(t+\Delta t,x(t+\Delta t),u_\varepsilon)\le \int_t^{t+\Delta t} L(s,x(s),u(s))\,ds+V(t+\Delta t,x(t+\Delta t))+\varepsilon\tag{5.6}$$
where the two inequalities follow directly from the definitions of $V$ and $u_\varepsilon$, respectively. Since (5.6) holds for an arbitrary control $u$ on $[t,t+\Delta t]$ and an arbitrary $\varepsilon>0$, taking the infimum over $u$ establishes the desired inequality (5.5).