Observe the difference between the law that the derivative of the momentum equals the force and the principle that the action integral is minimized. The former condition holds pointwise in time, while the latter is a statement about the entire trajectory. However, their relation is not surprising, because if the action integral is minimized then every small piece of the trajectory must also deliver minimal action.
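To see the two statements side by side, consider a minimal worked case. The setting is our assumption for illustration: a single particle of mass m with coordinate q moving in a potential U, with Lagrangian equal to kinetic minus potential energy. The Euler-Lagrange equation, which comes from the statement about the whole trajectory, then reduces to the pointwise law with force -U'(q):

```latex
L(q, \dot q) = \tfrac{1}{2} m \dot q^{2} - U(q),
\qquad
\frac{d}{dt} \frac{\partial L}{\partial \dot q} = \frac{\partial L}{\partial q}
\;\Longrightarrow\;
\frac{d}{dt} \bigl( m \dot q \bigr) = -U'(q).
```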
We will turn our attention to control problems later in the book, and the present lack of depth in our treatment of non-integral constraints will be amply compensated for. In particular, the “distributed” Lagrange multiplier will correspond to the adjoint vector, or costate, in the maximum principle.
This issue, which escaped Legendre's attention, was pointed out by Lagrange in 1797. However, it was only in 1837, some 50 years after Legendre's investigation, that Jacobi closed the gap by providing a missing ingredient which we now describe.
The perturbation used in the above proof of the Weierstrass necessary condition is already quite close to the ones we will use later in the proof of the maximum principle. The main difference is that in the proof of the maximum principle, we will not insist on bringing the perturbed curve back to the original curve after the perturbation stops acting. Instead, we will analyze how the effect of a perturbation applied on a small interval propagates up to the terminal point.
The set in which the controls take values might also be constrained by some practical considerations, such as inherent bounds on physical quantities (velocities, forces, and so on). In the optimal control formulation, such constraints are incorporated very naturally by working with an appropriate control set. In calculus of variations, on the other hand, they would make the description of the space of admissible curves quite cumbersome.
The new problem formulation is equivalent to the original one, except that it is more general in one aspect: as we already mentioned, it avoids the (sometimes tacit) assumption made in calculus of variations that admissible curves are graphs of functions. Observe that in terms of complexity of the problem description, the burden has shifted from the cost functional to the right-hand side of the control system.
In summary, while the basic form of the necessary conditions provided by the maximum principle will be similar to what we obtained using the variational approach, several shortcomings of the variational approach must be overcome in order to obtain a more satisfactory result. Specifically, we need to accommodate constraints on the control set, constraints on the final state, and weaker differentiability assumptions. A less restrictive notion of “closeness” of controls will be the key to achieving these goals.
Combining the effects of temporal and spatial control perturbations, we will construct a convex cone, with vertex at the terminal state of the optimal trajectory, which describes infinitesimal directions of all possible perturbations of the terminal state.
When the optimal control is perturbed, the state trajectory deviates from the optimal one in a direction that makes a nonpositive inner product with the augmented adjoint vector (at the time when the perturbation stops acting). Therefore, such control perturbations can only decrease the Hamiltonian, regardless of the value of the perturbed control during the perturbation interval.
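In symbols, the conclusion can be sketched as follows. The notation here is an assumption on our part: f denotes the right-hand side of the control system, L the running cost, p^* the adjoint vector, p_0^* the (nonpositive) scalar component of the augmented adjoint vector, U the control set, and w an arbitrary value of the perturbed control.

```latex
H(x, u, p, p_0) := \langle p, f(x, u) \rangle + p_0 L(x, u),
\qquad
H\bigl(x^*(t), w, p^*(t), p_0^*\bigr) \le H\bigl(x^*(t), u^*(t), p^*(t), p_0^*\bigr)
\quad \forall\, w \in U .
```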
It is worth reflecting that the developments we have covered so far in this book – starting from the Euler-Lagrange equation, continuing to the Hamiltonian formulation, and culminating in the maximum principle – span more than 200 years. The progress made during this time period is quite remarkable, yet the origins of the maximum principle are clearly traceable to the early work in calculus of variations.
Time-optimal control problems not only provide a useful illustration of the maximum principle, but also have several important features that make them interesting in their own right. Among these are the bang-bang principle and connections with Lie brackets, to be discussed in this section. We organize the material so as to progress from more specific to more general settings, and we begin by revisiting Example 1.1 from Section 1.1.
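As a small illustration of a bang-bang control, the following sketch treats the double integrator x'' = u with |u| <= 1. We assume this is the kind of system meant by the example being revisited, and we restrict attention to an initial state at rest. The control takes one extreme value up to a single switching time and the opposite extreme value afterwards. The function names are ours, introduced only for illustration.

```python
import math


def bang_bang_from_rest(x0):
    """Bang-bang control steering x'' = u, |u| <= 1, from (x0, 0) to (0, 0).

    Returns the first control value, the switching time, and the total time.
    Assumes the initial velocity is zero, so the switch occurs at the midpoint.
    """
    t_switch = math.sqrt(abs(x0))
    u_first = -1.0 if x0 > 0 else 1.0
    return u_first, t_switch, 2.0 * t_switch


def propagate(x, v, u, t):
    """Exact solution of x'' = u for a constant control u over duration t."""
    return x + v * t + 0.5 * u * t * t, v + u * t


def final_state(x0):
    """Apply the two constant-control arcs and return the terminal (x, v)."""
    u_first, t_switch, t_total = bang_bang_from_rest(x0)
    x, v = propagate(x0, 0.0, u_first, t_switch)
    return propagate(x, v, -u_first, t_total - t_switch)
```

For instance, starting at rest from x0 = 4, the control is -1 for two time units and +1 for two more, and the state arrives at the origin with zero velocity.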
Let us now investigate whether the preceding results can be extended beyond the class of linear control systems. Regarding the bang-bang principle cited in the previous paragraph, the hope that it might be true for general nonlinear systems is quickly shattered by the following example.
What the principle of optimality does for us here is guarantee that the paths we discard going backward cannot be portions of optimal trajectories. On the other hand, in the previous approach (going forward) we are not able to discard any paths until we reach the terminal time and finish the calculations.
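The contrast between the two approaches can be sketched on a small discrete problem. The stage costs and terminal costs below are hypothetical numbers chosen only for illustration. Going backward, at each stage we keep only the best continuation from every node and discard the rest. Going forward, we must carry every complete path to the terminal time before comparing costs.

```python
from itertools import product

# Hypothetical data: COST[k][i][j] is the cost of moving from node i at
# stage k to node j at stage k + 1; TERMINAL[j] is the cost of ending at j.
COST = [
    [[1, 4], [2, 1]],
    [[3, 1], [1, 5]],
    [[2, 2], [4, 1]],
]
TERMINAL = [0, 3]


def backward_dp(cost, terminal):
    """Backward recursion: returns optimal cost-to-go at stage 0 and the policy."""
    value = list(terminal)
    policy = []
    for stage in reversed(cost):
        new_value, choice = [], []
        for row in stage:
            # Cost of each move plus the optimal cost-to-go from where it lands.
            totals = [c + value[j] for j, c in enumerate(row)]
            best = min(range(len(totals)), key=totals.__getitem__)
            new_value.append(totals[best])
            choice.append(best)  # all other continuations are discarded here
        value = new_value
        policy.insert(0, choice)
    return value, policy


def forward_enumeration(cost, terminal, start):
    """Forward approach: evaluate every complete path before comparing."""
    best = float("inf")
    for path in product(range(len(terminal)), repeat=len(cost)):
        total, node = 0, start
        for k, nxt in enumerate(path):
            total += cost[k][node][nxt]
            node = nxt
        best = min(best, total + terminal[node])
    return best
```

Both functions return the same optimal costs, but the backward recursion examines a number of moves that grows linearly with the number of stages, while the forward enumeration grows exponentially.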
Quite remarkably, the maximum principle was being developed in the Soviet Union independently around the same time as Bellman's and Kalman's work on dynamic programming was appearing in the United States. We thus find it natural at this point to compare the maximum principle with the HJB equation and discuss the relationship between the two approaches.
In this chapter we will focus on the special case when the system dynamics are linear and the cost is quadratic. While this additional structure certainly makes the optimal control problem more tractable, our goal is not merely to specialize our earlier results to this simpler setting. Rather, we want to go deeper and develop a more complete understanding of optimal solutions compared with what we were able to achieve for the general scenarios treated in the previous chapters.
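For concreteness, one common way to write such a problem is sketched below. The symbols are assumptions on our part and not necessarily the notation adopted later: A and B are the system matrices (possibly time-varying), Q and M are symmetric positive semidefinite weights, R is a symmetric positive definite weight, and [t_0, t_1] is the time horizon.

```latex
\dot x = A x + B u,
\qquad
J(u) = \int_{t_0}^{t_1} \bigl( x^{T} Q x + u^{T} R u \bigr)\, dt + x(t_1)^{T} M \, x(t_1).
```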
It is useful to reflect on how we found the optimal control. First, we singled out a candidate optimal control by using the maximum principle. Second, we identified a candidate value function and verified that this function and the candidate control satisfy the sufficient condition for optimality.
In our discussion of Hamiltonian mechanics in Section 2.4, the Hamiltonian was given a clear physical interpretation as the total energy of the system. Hamilton's canonical differential equations, on the other hand, were derived formally from the Euler-Lagrange equation and we never paused to consider their intrinsic meaning. We fill this gap here by exposing the important connection between symplectic geometry and Hamiltonian flows, which provides one further insight into the geometric formulation of the maximum principle.
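As a brief reminder of the structure involved, the canonical equations can be written in a compact matrix form. The stacked variable z and the matrix J below are notation we introduce here for illustration, with x the state, p the momentum (or costate), and I the identity matrix.

```latex
\dot x = \frac{\partial H}{\partial p}, \quad
\dot p = -\frac{\partial H}{\partial x}
\qquad \Longleftrightarrow \qquad
\dot z = J \, \nabla H(z), \quad
z = \begin{pmatrix} x \\ p \end{pmatrix}, \quad
J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}.
```

The skew-symmetric matrix J is the one that defines the standard symplectic form on the (x, p) space.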