6.1.1 Candidate optimal feedback law

We begin our analysis of the LQR problem by inspecting the necessary conditions for optimality provided by the maximum principle. After some further manipulation, these conditions will reveal that an optimal control must be a linear state feedback law. The Hamiltonian is given by

$\displaystyle H(t,x,u,p)=p^TA(t)x+p^TB(t)u-x^TQ(t)x-u^TR(t)u.$

Note that, compared to the general formula (4.40) for the Hamiltonian, we took the abnormal multiplier $ p_0$ to be equal to $ -1$ . This is no loss of generality because, for the present free-endpoint problem in the Bolza form, a combination of the results in Section 4.3.1 would give us the transversality condition $ 0=p^*(t_1)-p_0^*{K}_{x}(x^*(t_1))=p^*(t_1)-2p_0^*Mx^*(t_1)$ which, in light of the nontriviality condition, guarantees that $ p_0^*\ne 0$ .

It is also useful to observe that the LQR problem can be adequately treated with the variational approach of Section 3.4, which yields essentially the same necessary conditions as the maximum principle but without the abnormal multiplier appearing. Indeed, the control is unconstrained, the final state is free, and $ H$ is quadratic (hence twice differentiable) in $ u$ ; therefore, the technical issues discussed in Section 3.4.5 do not arise here. In Section 3.4 we proved that along an optimal trajectory we must have $ \left.{H}_{u}\right\vert _{*}=0$ and $ \left.{H}_{{u}{u}}\right\vert _{*}\le 0$ . In general this is weaker than the Hamiltonian maximization condition, but in the present LQR setting the difference disappears, as we will see in a moment. In fact, when solving part a) of Exercise 3.8, the reader should have already written down the necessary conditions for optimality from Section 3.4.3 for the LQR problem (with $ M=0$ ). We will now rederive these necessary conditions and examine their consequences in more detail.

The gradient of $ H$ with respect to $ u$ is $ {H}_{u}=B^T(t)p-2R(t)u$ , and along an optimal trajectory it must vanish. Using our assumption that $ R(t)$ is invertible for all $ t$ , we can solve the resulting equation for $ u$ and conclude that an optimal control $ u^*$ (if it exists) must satisfy

$\displaystyle u^*(t)=\frac12 R^{-1}(t)B^T(t)p^*(t).$ (6.3)

Moreover, since $ {H}_{{ u}{u}}=-2R(t)<0$ , the above control indeed maximizes the Hamiltonian (globally). We see that (6.3) is the unique control satisfying the necessary conditions, although we have not yet verified that it is optimal.
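To see the stationarity computation in the simplest possible setting, here is a small symbolic check of $ {H}_{u}=0$ and $ {H}_{{u}{u}}<0$ in the scalar case $ n=m=1$ (a sketch using the sympy library; the symbols below are generic and not tied to any particular system):

```python
import sympy as sp

# Scalar (n = m = 1) instance of the LQR Hamiltonian:
# H = p*A*x + p*B*u - Q*x**2 - R*u**2, with R > 0.
x, u, p, A, B, Q = sp.symbols('x u p A B Q', real=True)
R = sp.symbols('R', positive=True)

H = p*A*x + p*B*u - Q*x**2 - R*u**2

Hu = sp.diff(H, u)                     # H_u = B*p - 2*R*u
u_star = sp.solve(sp.Eq(Hu, 0), u)[0]  # B*p/(2*R): the scalar form of (6.3)
Huu = sp.diff(H, u, 2)                 # -2*R < 0, so the stationary point
print(u_star, Huu)                     # is a global maximum of H in u
```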

Since the formula (6.3) expresses $ u^*$ in terms of the costate $ p^*$ , let us look at $ p^*$ more closely. It satisfies the adjoint equation

$\displaystyle \dot p^*=\left.-{H}_{x}\right\vert _{*}=2Q(t)x^*-A^T(t)p^*$ (6.4)

with the boundary condition

$\displaystyle p^*(t_1)=-{K}_{x}(x^*(t_1))=-2Mx^*(t_1)$ (6.5)

(see Section 4.3.1), where $ x^*$ is the optimal state trajectory. Our next goal is to show that a linear relation of the form

$\displaystyle p^*(t)=-2P(t) x^*(t)$ (6.6)

holds for all $ t$ and not just for $ t=t_1$ , where $ P(\cdot)$ is some matrix-valued function to be determined. Putting together the dynamics (6.1) of the state, the control law (6.3), and the dynamics (6.4) of the costate, we can write the system of canonical equations in the following combined closed-loop form:

$\displaystyle \begin{pmatrix}\dot x^*\\ \dot p^* \\ \end{pmatrix}= \begin{pmatrix}A(t) & \frac12 B(t)R^{-1}(t)B^T(t)\\ 2Q(t) & -A^T(t)\\ \end{pmatrix} \begin{pmatrix}x^*\\ p^* \\ \end{pmatrix}=:\mathcal H(t)\begin{pmatrix}x^*\\ p^* \\ \end{pmatrix}.$ (6.7)
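For later numerical experiments it is convenient to have the matrix $ \mathcal H(t)$ in computable form. The following sketch assembles it with numpy for a time-invariant system; the matrices $ A$ , $ B$ , $ Q$ , $ R$ below are hypothetical, chosen only for illustration:

```python
import numpy as np

def hamiltonian_matrix(A, B, Q, R):
    """Assemble the 2n x 2n matrix of the canonical system (6.7):
    top row [A, (1/2) B R^{-1} B^T], bottom row [2Q, -A^T]."""
    return np.block([[A, 0.5 * B @ np.linalg.inv(R) @ B.T],
                     [2 * Q, -A.T]])

# Hypothetical time-invariant data with n = 2 states, m = 1 input
A = np.array([[0., 1.], [-1., 0.]])
B = np.array([[0.], [1.]])
Q = np.eye(2)
R = np.array([[1.]])
print(hamiltonian_matrix(A, B, Q, R))   # a 4 x 4 matrix
```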

The matrix $ \mathcal H(t)$ is sometimes called the Hamiltonian matrix. Let us denote the transition matrix for the linear time-varying system (6.7) by $ \Phi(\cdot,\cdot)$ . Then we have, in particular, $ \begin{pmatrix}x^*(t)\\ p^*(t)\end{pmatrix}=\Phi(t,t_1)\begin{pmatrix}x^*(t_1)\\ p^*(t_1)\end{pmatrix}$ ; here $ \Phi(t,t_1)=\Phi^{-1}(t_1,t)$ propagates the solutions backward in time from $ t_1$ to $ t$ . Partitioning $ \Phi$ into $ n\times n$ blocks as

$\displaystyle \Phi=\begin{pmatrix}\Phi_{11}& \Phi_{12}\\ \Phi_{21} & \Phi_{22}\\ \end{pmatrix}$

gives the more detailed relation

$\displaystyle \begin{pmatrix}x^*(t)\\ p^*(t) \\ \end{pmatrix}= \begin{pmatrix}\Phi_{11}(t,t_1) & \Phi_{12}(t,t_1)\\ \Phi_{21}(t,t_1) & \Phi_{22}(t,t_1)\\ \end{pmatrix} \begin{pmatrix}x^*(t_1)\\ p^*(t_1) \\ \end{pmatrix}$

which, in view of the terminal condition (6.5), can be written as

$\displaystyle x^*(t)=(\Phi_{11}(t,t_1)-2\Phi_{12}(t,t_1)M)x^*(t_1),$ (6.8)
$\displaystyle p^*(t)=(\Phi_{21}(t,t_1)-2\Phi_{22}(t,t_1)M)x^*(t_1).$ (6.9)
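These block relations are easy to confirm numerically: propagate $ (x^*,p^*)$ backward from $ t_1$ with $ p^*(t_1)=-2Mx^*(t_1)$ and compare against (6.8) and (6.9). The sketch below uses scipy's matrix exponential, which gives $ \Phi(t,t_1)=e^{\mathcal H(t-t_1)}$ in the time-invariant case; all data are hypothetical:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical time-invariant data (n = 2), as in the earlier sketches
A = np.array([[0., 1.], [-1., 0.]])
B = np.array([[0.], [1.]])
Q, R, M = np.eye(2), np.array([[1.]]), 0.5 * np.eye(2)
H = np.block([[A, 0.5 * B @ np.linalg.inv(R) @ B.T],
              [2 * Q, -A.T]])
n, t, t1 = 2, 0.0, 1.0

x1 = np.array([1.0, -1.0])               # x*(t1), chosen arbitrarily
z1 = np.concatenate([x1, -2 * M @ x1])   # stack p*(t1) = -2 M x*(t1), cf. (6.5)

Phi = expm(H * (t - t1))                 # Phi(t, t1) for constant H
zt = Phi @ z1                            # (x*(t), p*(t)) by backward propagation

# The same quantities via the block formulas (6.8) and (6.9)
print(np.allclose(zt[:n], (Phi[:n, :n] - 2 * Phi[:n, n:] @ M) @ x1))  # True
print(np.allclose(zt[n:], (Phi[n:, :n] - 2 * Phi[n:, n:] @ M) @ x1))  # True
```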

Solving (6.8) for $ x^*(t_1)$ and plugging the result into (6.9), we obtain

$\displaystyle p^*(t)= \big(\Phi_{21}(t,t_1)-2\Phi_{22}(t,t_1)M\big) \big(\Phi_{11}(t,t_1)-2\Phi_{12}(t,t_1)M\big)^{-1}x^*(t).$

We have thus established (6.6) with

$\displaystyle P(t):=-\frac12\big(\Phi_{21}(t,t_1)-2\Phi_{22}(t,t_1)M\big) \big(\Phi_{11}(t,t_1)-2\Phi_{12}(t,t_1)M\big)^{-1}.$ (6.10)
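Although (6.10) is rarely useful for hand computation, it is straightforward to evaluate numerically. The following sketch (time-invariant case, hypothetical data) implements (6.10) directly; evaluating at $ t=t_1$ recovers $ M$ , in agreement with (6.11) below:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical time-invariant data (n = 2), as in the earlier sketches
A = np.array([[0., 1.], [-1., 0.]])
B = np.array([[0.], [1.]])
Q, R, M = np.eye(2), np.array([[1.]]), 0.5 * np.eye(2)
H = np.block([[A, 0.5 * B @ np.linalg.inv(R) @ B.T],
              [2 * Q, -A.T]])
n, t1 = 2, 1.0

def P(t):
    """P(t) from (6.10), using Phi(t, t1) = expm(H*(t - t1))."""
    Phi = expm(H * (t - t1))
    num = Phi[n:, :n] - 2 * Phi[n:, n:] @ M    # Phi_21 - 2 Phi_22 M
    den = Phi[:n, :n] - 2 * Phi[:n, n:] @ M    # Phi_11 - 2 Phi_12 M
    return -0.5 * num @ np.linalg.inv(den)

print(P(1.0))   # equals M: at t = t1 the blocks reduce to I and 0
print(P(0.0))   # well defined as long as the inverse in (6.10) exists
```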

A couple of remarks are in order. First, we have not yet justified the existence of the inverse in the definition of $ P(t)$ . For now, we note that $ \Phi(t_1,t_1)=I_{2n\times 2n}$ , hence $ \Phi_{11}(t_1,t_1)=I_{n\times n}$ and $ \Phi_{12}(t_1,t_1)=0_{n\times n}$ , and so $ \Phi_{11}(t_1,t_1)-2\Phi_{12}(t_1,t_1)M=I_{n\times n}$ . By continuity, $ \Phi_{11}(t,t_1)-2\Phi_{12}(t,t_1)M$ remains invertible for $ t$ close enough to $ t_1$ , which means that $ P(t)$ is well defined at least for $ t$ near $ t_1$ . Second, the minus sign and the factor of $ 1/2$ in (6.10), which stem from the factor of $ -2$ in (6.6), appear somewhat arbitrary at this point. We see from (6.5) and (6.6) that

$\displaystyle P(t_1)=M.$ (6.11)

The reason for the above conventions will become clear later, when we show that $ P(t)$ is symmetric positive semidefinite for all $ t$ (not just for $ t=t_1$ ) and is directly related to the optimal cost.

Combining (6.3) and (6.6), we deduce that the optimal control must take the form

$\displaystyle \fbox{$u^*(t)=-R^{-1}(t)B^T(t)P(t)x^*(t)$}$ (6.12)

which, as we announced earlier, is a linear state feedback law. This is a remarkable conclusion, as it shows that the optimal closed-loop system must be linear. Note that the feedback gain in (6.12) is time-varying even if the system (6.1) is time-invariant, because $ P$ from (6.10) is always time-varying. We remark that we could just as easily derive an open-loop formula for $ u^*$ by writing (6.8) with $ t_0$ in place of $ t$ , i.e., $ x_0=(\Phi_{11}(t_0,t_1)-2\Phi_{12}(t_0,t_1)M)x^*(t_1)$ , solving it for $ x^*(t_1)$ , and plugging the result into (6.9) to arrive at

$\displaystyle p^*(t)=\big(\Phi_{21}(t,t_1)-2\Phi_{22}(t,t_1)M\big) \big(\Phi_{11}(t_0,t_1)-2\Phi_{12}(t_0,t_1)M\big)^{-1}x_0$

(provided that the inverse exists), and then using this expression in (6.3). However, the feedback form of $ u^*$ is theoretically revealing and leads to a more compact description of the closed-loop system.
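To close the loop on this discussion, the candidate feedback (6.12) can be simulated directly by integrating $ \dot x=(A-BR^{-1}B^TP(t))x$ . The sketch below does this with scipy, reusing the hypothetical data and the $ P(t)$ of the previous sketches:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

# Hypothetical time-invariant data (n = 2), as in the earlier sketches
A = np.array([[0., 1.], [-1., 0.]])
B = np.array([[0.], [1.]])
Q, R, M = np.eye(2), np.array([[1.]]), 0.5 * np.eye(2)
Rinv = np.linalg.inv(R)
H = np.block([[A, 0.5 * B @ Rinv @ B.T], [2 * Q, -A.T]])
n, t1 = 2, 1.0

def P(t):                                  # P(t) from (6.10)
    Phi = expm(H * (t - t1))
    return -0.5 * (Phi[n:, :n] - 2 * Phi[n:, n:] @ M) @ \
        np.linalg.inv(Phi[:n, :n] - 2 * Phi[:n, n:] @ M)

def closed_loop(t, x):
    u = -Rinv @ B.T @ P(t) @ x             # the feedback law (6.12)
    return A @ x + B @ u

sol = solve_ivp(closed_loop, (0.0, t1), np.array([1.0, 0.0]), rtol=1e-8)
print(sol.y[:, -1])                        # candidate optimal state at t1
```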

As we said, there are two things that we still need to check: optimality of the control $ u^*$ that we found, and global existence of the matrix $ P(t)$ . These issues will be tackled in Sections 6.1.3 and 6.1.4, respectively. But first, we want to obtain a nicer description for the matrix $ P(t)$ , as the formula (6.10) is rather clumsy and not very useful (since calculating the transition matrix $ \Phi$ analytically is in general impossible).

