

3.4.1 Preliminaries

Consider the optimal control problem from Section 3.3 with the following additional specifications: the target set is $ S= \{t_1\}\times\mathbb{R}^n$ , where $ t_1$ is a fixed time (so this is a fixed-time, free-endpoint problem); $ U=\mathbb{R}^m$ (the control is unconstrained); and the terminal cost is $ K=K(x_f)$ , with no direct dependence on the final time (just for simplicity). We can rewrite the cost in terms of the fixed final time $ t_1$ as

$\displaystyle J(u)=\int_{t_0}^{t_1} L(t,x(t),u(t))\,dt+K(x(t_1)).$ (3.22)
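
As a concrete illustration (this instance is ours, not part of the original problem statement), take $n=m=1$, specialize (3.18) to $f(t,x,u)=u$, and choose the running cost $L(t,x,u)=\frac12u^2$ and the terminal cost $K(x_f)=\frac12x_f^2$. The cost (3.22) then reads

$\displaystyle J(u)=\int_{t_0}^{t_1}\tfrac12 u^2(t)\,dt+\tfrac12 x^2(t_1),$

a fixed-time, free-endpoint problem that trades control effort against the distance of the final state from the origin. We will return to this instance below.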

Our goal is to derive necessary conditions for optimality. Let $u^*(\cdot)$ be an optimal control, by which we presently mean that it provides a global minimum: $J(u^*)\le J(u)$ for all piecewise continuous controls $u$. Let $x^*(\cdot)$ be the corresponding optimal trajectory. We would like to consider nearby trajectories of the familiar form

$\displaystyle x=x^*+\alpha\eta$ (3.23)

but we must make sure that these perturbed trajectories are still solutions of the system (3.18), for suitably chosen controls. Unfortunately, the class of perturbations $ \eta $ that are admissible in this sense is difficult to characterize if we start with (3.23). Note also that the cost $ J$ , whose first variation we will be computing, is a function of $ u$ and not of $ x$ . Thus, in the optimal control context it is more natural to directly perturb the control instead, and then define perturbed state trajectories in terms of perturbed controls. To this end, we consider controls of the form

$\displaystyle u=u^*+\alpha\xi$ (3.24)

where $ \xi $ is a piecewise continuous function from $ [t_0,t_1]$ to $ \mathbb{R}^m$ and $ \alpha$ is a real parameter as usual. We now want to find (if possible) a function $ \eta:[t_0,t_1]\to \mathbb{R}^n$ for which the solutions of (3.18) corresponding to the controls (3.24), for a fixed $ \xi $ , are given by (3.23). Actually, we do not have any reason to believe that the perturbed trajectory depends linearly on $ \alpha$ . Thus we should replace (3.23) by the more general (and more realistic) expression

$\displaystyle x=x^*+\alpha\eta+o(\alpha).$ (3.25)
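
To see why the $o(\alpha)$ term is genuinely needed, here is a small worked example (again ours, chosen so that the dependence on $\alpha$ is visibly nonlinear): consider the scalar system $\dot x=u^2$ with $x(t_0)=x_0$. The solution corresponding to the control (3.24) is

$\displaystyle x(t)=x_0+\int_{t_0}^{t}(u^*(s)+\alpha\xi(s))^2ds=x^*(t)+2\alpha\int_{t_0}^{t}u^*(s)\xi(s)\,ds+\alpha^2\int_{t_0}^{t}\xi^2(s)\,ds,$

which is exactly of the form (3.25) with $\eta(t)=2\int_{t_0}^{t}u^*(s)\xi(s)\,ds$, while the remainder is quadratic in $\alpha$ and cannot be absorbed into the linear expression (3.23).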

It is obvious that $ \eta(t_0)=0$ since the initial condition does not change. Next, we derive a differential equation for $ \eta $ . Let us use the more detailed notation $ x(t,\alpha)$ for the solution of (3.18) at time $ t$ corresponding to the control (3.24). The function $ x(\cdot,\alpha)$ coincides with the right-hand side of (3.25) if and only if

$\displaystyle x_{\alpha}(t,0)=\eta(t)$ (3.26)

for all $ t$ . (We are assuming here that the partial derivative $ {x}_{\alpha}$ exists, but its existence can be shown rigorously; cf. Section 4.2.4.) Differentiating the quantity (3.26) with respect to time and interchanging the order of partial derivatives, we have

$\displaystyle \dot\eta(t)=\frac d{dt}x_{\alpha}(t,0)=x_{\alpha t}(t,0)=x_{t\alpha}(t,0)=\left.\frac d{d\alpha}\right\vert_{\alpha=0}f(t,x(t,\alpha),u^*(t)+\alpha\xi(t))$

$\displaystyle \qquad=f_x(t,x(t,0),u^*(t))\,x_{\alpha}(t,0)+f_u(t,x(t,0),u^*(t))\,\xi(t)$

$\displaystyle \qquad=f_x(t,x^*(t),u^*(t))\,\eta(t)+f_u(t,x^*(t),u^*(t))\,\xi(t)$

which we write more compactly as

$\displaystyle \dot\eta=f_x(t,x^*,u^*)\,\eta+f_u(t,x^*,u^*)\,\xi=:\left.f_x\right\vert_{*}\eta+\left.f_u\right\vert_{*}\xi.$ (3.27)

Here and below, we use the shorthand notation $ \left.\right\vert _{*}$ to indicate that a function is being evaluated along the optimal trajectory. The linear time-varying system (3.27) is nothing but the linearization of the original system (3.18) around the optimal trajectory. To emphasize the linearity of the system (3.27) we can introduce the notation $ A_*(t):=\left.{f}_{x}\right\vert _{*}(t)$ and $ B_*(t):=\left.{f}_{u}\right\vert _{*}(t)$ for the matrices appearing in it, bringing it to the form

$\displaystyle \dot\eta=A_*(t)\eta+B_*(t)\xi.$ (3.28)
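
As a sanity check (continuing our illustrative system $\dot x=u^2$ from above), we have $f_x=0$ and $f_u=2u$, hence $A_*(t)=0$ and $B_*(t)=2u^*(t)$, and (3.28) becomes $\dot\eta=2u^*(t)\xi(t)$ with $\eta(t_0)=0$. Its solution is $\eta(t)=2\int_{t_0}^{t}u^*(s)\xi(s)\,ds$, exactly the first-order term found earlier by direct expansion.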

The optimal control $u^*$ minimizes the cost given by (3.22), and the control system (3.18) can be viewed as imposing the pointwise-in-time (non-integral) constraint $\dot x(t)-f(t,x(t),u(t))=0$. Motivated by Lagrange's idea for treating such constraints in calculus of variations, expressed by the augmented cost (2.53), let us rewrite our cost as

$\displaystyle J(u)=\int_{t_0}^{t_1}\big(L(t,x(t),u(t))+p(t)\cdot(\dot x(t)-f(t,x(t),u(t)))\big)\,dt+K(x(t_1))$

for some $\mathcal C^1$ function $p:[t_0,t_1]\to\mathbb{R}^n$ to be selected later. Clearly, the extra term inside the integral does not change the value of the cost: it vanishes identically, since $\dot x(t)-f(t,x(t),u(t))\equiv 0$ along every trajectory of (3.18). The function $p(\cdot)$ is reminiscent of the Lagrange multiplier function $\lambda(\cdot)$ in Section 2.5.2 (the exact relationship between the two will be clarified in Exercise 3.6 below). As we will see momentarily, $p$ is also closely related to the momentum from Section 2.4. We will be working in the Hamiltonian framework, which is why we continue to use the same symbol $p$ by which we denoted the momentum earlier (while some other sources prefer $\lambda$).

We will henceforth use the more explicit notation $ \langle \cdot,\cdot\rangle $ for the inner product in $ \mathbb{R}^n$ . Let us introduce the Hamiltonian

$\displaystyle H(t,x,u,p):=\langle p,f(t,x,u)\rangle -L(t,x,u).$ (3.29)

Note that this definition matches our earlier definition of the Hamiltonian in calculus of variations, where we had $H(x,y,y',p)=\langle p, y'\rangle -L(x,y,y')$. We just need to remember that in passing from the calculus of variations notation to the optimal control notation, the independent variable $x$ became $t$, the dependent variable $y$ became $x$, and its derivative $y'$ became $\dot x$, which is given by (3.18); moreover, the third argument of $L$ is now taken to be $u$ rather than $\dot x$ (which, with the current definition of $H$, makes even more sense). We can rewrite the cost in terms of the Hamiltonian as

$\displaystyle J(u)=\int_{t_0}^{t_1}\big(\langle p(t),\dot x(t)\rangle-H(t,x(t),u(t),p(t))\big)\,dt+K(x(t_1)).$ (3.30)
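
For instance, in the illustrative problem with $f(t,x,u)=u$ and $L=\frac12u^2$ introduced after (3.22), the Hamiltonian (3.29) is $H(t,x,u,p)=pu-\frac12u^2$, and (3.30) becomes

$\displaystyle J(u)=\int_{t_0}^{t_1}\big(p(t)\dot x(t)-p(t)u(t)+\tfrac12u^2(t)\big)dt+\tfrac12x^2(t_1),$

with $p(\cdot)$ still free to be selected, as noted above.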

