Suppose that we are given a (time-invariant) control system

$$\dot x = f(x,u) \tag{7.2}$$

whose state $x$ takes values in some $k$-dimensional manifold $M$ and whose control $u$ takes values in some control set $U$. For its solution $x(\cdot)$ to stay in $M$, the velocity vector $f(x,u)$ must be tangent to $M$ at $x$ for all $x \in M$ and $u \in U$. Thus we see that there is an important difference between the states and the velocity vectors: the former live in $M$ while the latter live in the tangent bundle $TM$. When we worked over $\mathbb{R}^n$, which is its own tangent space, we never explicitly made this distinction. In local coordinates, the system description takes the form

$$\dot x_i = f_i(x_1,\dots,x_k,u), \qquad i = 1,\dots,k$$

and the difference between states and tangent vectors becomes "hidden" once again.
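To make the tangency requirement concrete, here is a minimal numerical sketch. It is not taken from the text: we assume that the manifold is the unit sphere in $\mathbb{R}^3$ and we pick the hypothetical dynamics $f(x,u) = u \times x$ with $u \in \mathbb{R}^3$. For the sphere, a vector is tangent at $x$ exactly when it is orthogonal to $x$, so the check below confirms that $\langle x, f(x,u)\rangle = 0$ and that a numerically integrated solution stays (approximately) on the sphere.

```python
import numpy as np

def f(x, u):
    """Hypothetical dynamics on the unit sphere: f(x, u) = u x x (cross product)."""
    return np.cross(u, x)

def rk4_step(x, u, h):
    """One classical Runge-Kutta step for x' = f(x, u) with constant control u."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * h * k1, u)
    k3 = f(x + 0.5 * h * k2, u)
    k4 = f(x + h * k3, u)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

rng = np.random.default_rng(0)
x = rng.normal(size=3)
x /= np.linalg.norm(x)          # a point on the unit sphere
u = rng.normal(size=3)          # an arbitrary control value

# Tangency to the sphere at x means orthogonality to x.
tangency_residual = abs(np.dot(x, f(x, u)))

# Integrate with the control held constant; the norm of the state should stay near 1.
xt = x.copy()
for _ in range(1000):
    xt = rk4_step(xt, u, 1e-3)
norm_drift = abs(np.linalg.norm(xt) - 1.0)

print(tangency_residual, norm_drift)
```

If $f$ had a component along $x$, the first residual would be nonzero and the trajectory would leave the sphere, which is the situation that the tangency condition rules out.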

Let us assume for simplicity that an optimal control problem is formulated in the Mayer form (i.e., with terminal cost only). We know that problems with running cost can always be converted to this form by appending an additional state $x^0$, which would yield a system on the augmented manifold $\mathbb{R} \times M$ (cf. Sections 3.3.2 and 4.2.1). The basic ingredients of the maximum principle are the costate $p$ and the Hamiltonian $H$. In the case when $M = \mathbb{R}^n$, the Hamiltonian for the Mayer problem took the form $H(x,u,p) = \langle p, f(x,u)\rangle$. For a general manifold $M$, we need to ask ourselves which space $p$ should belong to and how $H$ should be (re)defined. Our first natural guess might be that $p$, like $\dot x$, should be a tangent vector to $M$. However, in contrast with $\dot x$, there is no clear geometric reason why $p$ should be a tangent vector. Also, taking $p$ to be a tangent vector, we cannot assign a new meaning to our earlier definition of $H$ unless we equip the tangent space with an inner product. (Introducing an inner product on each tangent space $T_xM$--called a *Riemannian metric* on $M$--is possible but, as we will see, is neither necessary nor relevant for our present purposes.) Another option that might come to mind is that $p$ should live in $M$ itself; however, this choice offers even fewer clues towards any natural interpretation of the Hamiltonian.

Can we perhaps take more direct guidance from the fact that in our old definition of the Hamiltonian, $p$ appears in an inner product with the velocity vector $f(x,u)$? In fact, we already remarked in Section 3.4.2 that $p$ never appears by itself but always inside inner products such as $\langle p, f(x,u)\rangle$; in other words, it *acts* on velocity vectors. This observation suggests that the intrinsic role of the costate $p$ is not that of a tangent vector, but that of a *covector*. To better understand the difference between these two types of objects and why the latter one correctly captures the notion of a costate, let us look at how they propagate along a flow induced by a dynamical system on $M$.

Fix a number $\tau > 0$ and let $\Phi : M \to M$ be a map. While the construction that we are about to describe is valid for every such map, the map that we have in mind here is the one obtained by flowing forward for $\tau$ units of time along the trajectory of the system (7.2) corresponding to some fixed control $u$ (which, ultimately, is taken to be an optimal control for a given initial condition). Let us first discuss the transformation that $\Phi$ induces on tangent vectors. Pick a point $x \in M$ and a tangent vector $\xi \in T_xM$. We know that $\xi$ is tangent to some curve in $M$ passing through $x$, namely, $\xi = \gamma'(0)$ where $\gamma(s) \in M$ for real $s$ (around 0) and $\gamma(0) = x$. The image of this curve under the map $\Phi$ is a curve $\Phi(\gamma(\cdot))$ in $M$ which passes through $\Phi(x)$, as illustrated in Figure 7.1. Denote the tangent vector at $\Phi(x)$ associated with this new curve by $d\Phi|_x(\xi)$; in other words, define

$$d\Phi|_x(\xi) := \frac{d}{ds}\bigg|_{s=0} \Phi(\gamma(s)).$$

The above quantity depends only on the vector $\xi$ and not on the choice of a particular curve $\gamma$ with this tangent vector. In this way we obtain a natural definition of a linear map

$$d\Phi|_x : T_xM \to T_{\Phi(x)}M$$

called the *derivative map* (see Section 4.2.4). The derivative map $d\Phi$ pushes the tangent vectors forward in the direction of action of the original map $\Phi$ on $M$. Objects such as tangent vectors, which propagate forward along a flow on $M$ in this sense, are called

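Now suppose that we are given a *covector* at $x$, i.e., a linear function on $T_xM$. Let us denote it by $p|_x$ so as to have $\langle p|_x, \xi\rangle \in \mathbb{R}$ for each $\xi \in T_xM$. For the same map $\Phi$ as before, can we define in a natural way a linear function $p|_{\Phi(x)}$ on $T_{\Phi(x)}M$? We must decide what the value $\langle p|_{\Phi(x)}, \eta\rangle$ should be for every $\eta \in T_{\Phi(x)}M$. While it is tempting to say that $\langle p|_{\Phi(x)}, \eta\rangle$ should equal the value of $p|_x$ on the preimage of $\eta$ under the map $d\Phi|_x$, this preimage is not well defined unless the map $d\Phi|_x$ is invertible. In fact, the reader will quickly realize that there is no apparent candidate map for propagating covectors along $\Phi$ similarly to how the derivative map $d\Phi$ acts on tangent vectors. The reason is that, instead of trying to push covectors forward, we should *pull them back*. This revised objective is readily accomplished as follows: given a covector $p|_{\Phi(x)}$ on $T_{\Phi(x)}M$, define a covector $p|_x$ on $T_xM$ by

$$\langle p|_x, \xi\rangle := \langle p|_{\Phi(x)}, d\Phi|_x(\xi)\rangle, \qquad \xi \in T_xM. \tag{7.4}$$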

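The push-forward of tangent vectors and the pull-back of covectors can be checked numerically in local coordinates. The sketch below is only an illustration under assumptions of our own: the map `phi` is a hypothetical smooth map on $\mathbb{R}^2$ standing in for the flow map, the curve through $x$ is taken to be the straight line $x + s\xi$, and covectors are represented by their coordinate vectors. In coordinates, the derivative map is multiplication by the Jacobian matrix $J$, and the pull-back defined above is multiplication by $J^T$.

```python
import numpy as np

def phi(x):
    """Hypothetical smooth map in local coordinates (a stand-in for the flow map)."""
    return np.array([x[0] + 0.5 * np.sin(x[1]), x[1] + 0.3 * x[0] ** 2])

def jacobian(x):
    """Analytic Jacobian matrix of phi at x."""
    return np.array([[1.0, 0.5 * np.cos(x[1])],
                     [0.6 * x[0], 1.0]])

def pushforward(x, xi, eps=1e-6):
    """Approximate d/ds phi(gamma(s)) at s = 0 for the curve gamma(s) = x + s*xi,
    using a central difference."""
    return (phi(x + eps * xi) - phi(x - eps * xi)) / (2.0 * eps)

def pullback(x, q):
    """Covector at x defined by <p_x, xi> = <q, dPhi_x(xi)>; in coordinates, J^T q."""
    return jacobian(x).T @ q

x = np.array([0.4, -1.1])     # base point
xi = np.array([1.0, 2.0])     # tangent vector at x
q = np.array([-0.7, 0.3])     # covector at phi(x)

eta = pushforward(x, xi)      # tangent vector at phi(x)
p_x = pullback(x, q)          # covector at x

lhs = p_x @ xi                # value of the pulled-back covector on xi
rhs = q @ eta                 # value of q on the pushed-forward vector
print(lhs, rhs)
```

Note that `pullback` never inverts the Jacobian, which mirrors the point made above: pulling covectors back requires no invertibility of the map, whereas pushing them forward would.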
As we indicated earlier, the intended meaning of $\Phi$ is that of flowing forward for $\tau$ units of time along an optimal trajectory of (7.2), and the infinitesimal (as $\tau \to 0$) transformation induced by the derivative map $d\Phi$ is the variational equation (7.3). We can now recognize the formula (7.4) as expressing--in an intrinsic, coordinate-free fashion--the adjoint property from Section 4.2.8; indeed, it guarantees that the pairing $\langle p|_x, \xi\rangle$ stays constant along the trajectory. The familiar adjoint equation is nothing but the infinitesimal version of (7.4) written in local coordinates. A fact not really revealed by this differential equation is that covectors are
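The adjoint property can also be observed numerically. The following sketch rests on assumptions that are not part of the text: we take a hypothetical pendulum-like system in local coordinates with a constant control, propagate a tangent vector $\xi$ by the variational equation $\dot\xi = f_x\,\xi$, propagate a covector $p$ by the standard coordinate form of the adjoint equation $\dot p = -f_x^T p$, and monitor the pairing $\langle p, \xi\rangle$ along the trajectory. Up to integration error, the pairing should not change.

```python
import numpy as np

def f(x, u):
    """Hypothetical example dynamics: a pendulum with torque u, in local coordinates."""
    return np.array([x[1], -np.sin(x[0]) + u])

def f_x(x):
    """Jacobian of f with respect to the state x."""
    return np.array([[0.0, 1.0], [-np.cos(x[0]), 0.0]])

def rhs(z, u):
    """Joint dynamics of the state x, a tangent vector xi, and a covector p."""
    x, xi, p = z[0:2], z[2:4], z[4:6]
    A = f_x(x)
    return np.concatenate([f(x, u), A @ xi, -A.T @ p])

def rk4_step(z, u, h):
    """One classical Runge-Kutta step for the joint system."""
    k1 = rhs(z, u)
    k2 = rhs(z + 0.5 * h * k1, u)
    k3 = rhs(z + 0.5 * h * k2, u)
    k4 = rhs(z + h * k3, u)
    return z + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# Arbitrary initial data: x = (0.3, 0), xi = (1, -0.5), p = (0.2, 0.8).
z = np.array([0.3, 0.0, 1.0, -0.5, 0.2, 0.8])
u, h = 0.1, 1e-3

pairings = [z[4:6] @ z[2:4]]
for _ in range(2000):
    z = rk4_step(z, u, h)
    pairings.append(z[4:6] @ z[2:4])

max_drift = max(abs(v - pairings[0]) for v in pairings)
print(pairings[0], max_drift)
```

For convenience the covector is integrated forward from an initial value here; in the maximum principle it is the terminal value that is prescribed, but the invariance of the pairing does not depend on the direction of integration.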

Now everything is beginning to fall into place. The Hamiltonian for our Mayer problem on a manifold should be defined as

$$H(x,u,p) := \langle p, f(x,u)\rangle$$

where the costate $p$ is a covector at $x$ (strictly speaking, it would be more accurate to write it as $p|_x$). The maximum principle postulates the existence of a costate $p^*(t)$, a covector at $x^*(t)$, for each $t \in [t_0, t_f]$, where $x^*$ is the optimal trajectory being analyzed. The terminal value $p^*(t_f)$ uniquely specifies $p^*(t)$ for all $t$ as explained above. The Hamiltonian maximization condition takes the same form as in Chapter 4. For a formal statement of the maximum principle on manifolds along these lines, see the references listed in Section 7.5. There is, however, one more concept that is usually involved when such results are stated in the literature, and we examine it briefly in the next subsection.
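As a small illustration of the Hamiltonian maximization condition in this setting, consider again a hypothetical example that is not from the text: the unit sphere in $\mathbb{R}^3$, the dynamics $f(x,u) = u \times x$, and the control set assumed to be the closed unit ball. We represent the covector $p$ by a vector in $\mathbb{R}^3$ acting through the ambient inner product; only its action on vectors tangent to the sphere at $x$ matters. Since $\langle p, u \times x\rangle = \langle u, x \times p\rangle$, the maximizing control is the unit vector along $x \times p$, and the sketch compares this candidate against randomly sampled controls.

```python
import numpy as np

def f(x, u):
    """Hypothetical dynamics on the unit sphere: f(x, u) = u x x (cross product)."""
    return np.cross(u, x)

def hamiltonian(x, u, p):
    """H(x, u, p) = <p, f(x, u)>: the covector p acting on the velocity vector."""
    return float(np.dot(p, f(x, u)))

def maximizing_control(x, p):
    """Maximizer of H over the unit ball (assumed control set): unit vector along x x p."""
    w = np.cross(x, p)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(1)
x = rng.normal(size=3)
x /= np.linalg.norm(x)        # a point on the unit sphere
p = rng.normal(size=3)        # coordinate representation of a covector at x

u_star = maximizing_control(x, p)
h_star = hamiltonian(x, u_star, p)

# Compare against random controls of unit norm.
samples = rng.normal(size=(2000, 3))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
h_samples = np.array([hamiltonian(x, u, p) for u in samples])

print(h_star, h_samples.max())
```

Adding any multiple of $x$ to `p` leaves every value of the Hamiltonian unchanged, because $f(x,u)$ is orthogonal to $x$. This reflects the fact that the costate is determined only by how it acts on tangent vectors.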