
6.1.1 Candidate optimal feedback law

We begin our analysis of the LQR problem by inspecting the necessary conditions for optimality provided by the maximum principle. After some further manipulation, these conditions will reveal that an optimal control must be a linear state feedback law. The Hamiltonian is given by

$\displaystyle H(t,x,u,p)=p^TA(t)x+p^TB(t)u-x^TQ(t)x-u^TR(t)u.$

Note that, compared to the general formula (4.40) for the Hamiltonian, we took the abnormal multiplier $ p_0$ to be equal to $ -1$ . This is no loss of generality because, for the present free-endpoint problem in the Bolza form, a combination of the results in Section 4.3.1 would give us the transversality condition $ 0=p^*(t_1)-p_0^*{K}_{x}(x^*(t_1))=p^*(t_1)-2p_0^*Mx^*(t_1)$ which, in light of the nontriviality condition, guarantees that $ p_0^*\ne 0$ . It is also useful to observe that the LQR problem can be adequately treated with the variational approach of Section 3.4, which yields essentially the same necessary conditions as the maximum principle but without the abnormal multiplier appearing. Indeed, the control is unconstrained, the final state is free, and $ H$ is quadratic--hence twice differentiable--in $ u$ ; therefore, the technical issues discussed in Section 3.4.5 do not arise here. In Section 3.4 we proved that along an optimal trajectory we must have $ \left.{H}_{u}\right\vert _{*}=0$ and $ \left.{H}_{{u}{u}}\right\vert _{*}\le 0$ , which is in general different from the Hamiltonian maximization condition, but in the present LQR setting this difference disappears as we will see in a moment. In fact, when solving part a) of Exercise 3.8, the reader should have already written down the necessary conditions for optimality from Section 3.4.3 for the LQR problem (with $ M=0$ ). We will now rederive these necessary conditions and examine their consequences in more detail.

The gradient of $ H$ with respect to $ u$ is $ {H}_{u}=B^T(t)p-2R(t)u$ , and along an optimal trajectory it must vanish. Using our assumption that $ R(t)$ is invertible for all $ t$ , we can solve the resulting equation for $ u$ and conclude that an optimal control $ u^*$ (if it exists) must satisfy

$\displaystyle u^*(t)=\frac12 R^{-1}(t)B^T(t)p^*(t).$ (6.3)

Moreover, since $ {H}_{{ u}{u}}=-2R(t)<0$ , the above control indeed maximizes the Hamiltonian (globally). We see that (6.3) is the unique control satisfying the necessary conditions, although we have not yet verified that it is optimal.
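The maximization claim is easy to check numerically. The following sketch (with hypothetical randomly generated matrices, not data from the text) fixes $t$, $x$, and $p$, forms the candidate $u^*=\frac12 R^{-1}B^Tp$ from the stationarity condition $H_u=0$, and verifies that random perturbations of $u^*$ never increase the Hamiltonian:

```python
import numpy as np

# Hypothetical example data: H(u) = p^T A x + p^T B u - x^T Q x - u^T R u
# is a concave quadratic in u, so the stationary point is a global maximum.
rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Q = np.eye(n)                        # Q = Q^T >= 0
R = np.diag([2.0, 5.0])              # R = R^T > 0, hence invertible
x = rng.standard_normal(n)
p = rng.standard_normal(n)

def H(u):
    """Hamiltonian as a function of u, with t, x, p held fixed."""
    return p @ A @ x + p @ B @ u - x @ Q @ x - u @ R @ u

# Candidate control from H_u = B^T p - 2 R u = 0, i.e. (6.3).
u_star = 0.5 * np.linalg.solve(R, B.T @ p)

# No perturbation of u* should increase H (global maximization).
for _ in range(100):
    u = u_star + rng.standard_normal(m)
    assert H(u) <= H(u_star) + 1e-12
```

Since $H_{uu}=-2R(t)$ is negative definite, the same check succeeds for perturbations of any size, not just small ones.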

Since the formula (6.3) expresses $ u^*$ in terms of the costate $ p^*$ , let us look at $ p^*$ more closely. It satisfies the adjoint equation

$\displaystyle \dot p^*=\left.-{H}_{x}\right\vert _{*}=2Q(t)x^*-A^T(t)p^*$ (6.4)

with the boundary condition

$\displaystyle p^*(t_1)=-{K}_{x}(x^*(t_1))=-2Mx^*(t_1)$ (6.5)

(see Section 4.3.1), where $ x^*$ is the optimal state trajectory. Our next goal is to show that a linear relation of the form

$\displaystyle p^*(t)=-2P(t) x^*(t)$ (6.6)

holds for all $ t$ and not just for $ t=t_1$ , where $ P(\cdot)$ is some matrix-valued function to be determined. Putting together the dynamics (6.1) of the state, the control law (6.3), and the dynamics (6.4) of the costate, we can write the system of canonical equations in the following combined closed-loop form:

$\displaystyle \begin{pmatrix}\dot x^*\\ \dot p^* \\ \end{pmatrix}= \begin{pmatrix} A(t) & \frac12 B(t)R^{-1}(t)B^T(t)\\ 2Q(t) & -A^T(t) \\ \end{pmatrix} \begin{pmatrix}x^*\\ p^* \\ \end{pmatrix}=:\mathcal H(t)\begin{pmatrix}x^*\\ p^* \\ \end{pmatrix}.$ (6.7)
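In the time-invariant case the combined system (6.7) is easy to work with numerically, since its transition matrix is a matrix exponential. A minimal sketch, assuming hypothetical constant matrices $A$, $B$, $Q$, $R$ (not data from the text):

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical constant data for an LTI instance of (6.7).
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
n = 2

# The Hamiltonian matrix of (6.7).
Hmat = np.block([
    [A,        0.5 * B @ np.linalg.inv(R) @ B.T],
    [2.0 * Q, -A.T],
])

# For constant coefficients the transition matrix is Phi(t, s) = expm(Hmat*(t-s)).
t, t1 = 0.3, 1.0
Phi = expm(Hmat * (t - t1))      # Phi(t, t1): propagates backward from t1 to t
Phi_rev = expm(Hmat * (t1 - t))  # Phi(t1, t)

# Sanity check: Phi(t, t1) = Phi(t1, t)^{-1}.
assert np.allclose(Phi @ Phi_rev, np.eye(2 * n), atol=1e-10)
```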

The matrix $ \mathcal H(t)$ is sometimes called the Hamiltonian matrix. Let us denote the transition matrix for the linear time-varying system (6.7) by $ \Phi(\cdot,\cdot)$ . Then we have, in particular, $ \begin{pmatrix}x^*(t)\\ p^*(t)\end{pmatrix}=\Phi(t,t_1)\begin{pmatrix}x^*(t_1)\\ p^*(t_1)\end{pmatrix}$ ; here $ \Phi(t,t_1)=\Phi^{-1}(t_1,t)$ propagates the solutions backward in time from $ t_1$ to $ t$ . Partitioning $ \Phi$ into $ n\times n$ blocks as

$\displaystyle \Phi=\begin{pmatrix}
\Phi_{11}& \Phi_{12}\\
\Phi_{21} & \Phi_{22}\\
\end{pmatrix}$
gives the more detailed relation

$\displaystyle \begin{pmatrix}
x^*(t) \\
p^*(t) \\
\end{pmatrix}= \begin{pmatrix}
\Phi_{11}(t,t_1)& \Phi_{12}(t,t_1)\\
\Phi_{21}(t,t_1) & \Phi_{22}(t,t_1)\\
\end{pmatrix} \begin{pmatrix}
x^*(t_1) \\
p^*(t_1) \\
\end{pmatrix}$
which, in view of the terminal condition (6.5), can be written as

$\displaystyle x^*(t)$ $\displaystyle =(\Phi_{11}(t,t_1)-2\Phi_{12}(t,t_1)M)x^*(t_1)$ (6.8)
$\displaystyle p^*(t)$ $\displaystyle =(\Phi_{21}(t,t_1)-2\Phi_{22}(t,t_1)M)x^*(t_1)$ (6.9)

Solving (6.8) for $ x^*(t_1)$ and plugging the result into (6.9), we obtain

$\displaystyle p^*(t)= \big(\Phi_{21}(t,t_1)-2\Phi_{22}(t,t_1)M\big) \big(\Phi_{11}(t,t_1)-2\Phi_{12}(t,t_1)M\big)^{-1}x^*(t).$

We have thus established (6.6) with

$\displaystyle P(t):=-\frac12\big(\Phi_{21}(t,t_1)-2\Phi_{22}(t,t_1)M\big) \big(\Phi_{11}(t,t_1)-2\Phi_{12}(t,t_1)M\big)^{-1}.$ (6.10)

A couple of remarks are in order. First, we have not yet justified the existence of the inverse in the definition of $ P(t)$ . For now, we note that $ \Phi(t_1,t_1)=I_{2n\times 2n}$ , hence $ \Phi_{11}(t_1,t_1)=I_{n\times n}$ , $ \Phi_{12}(t_1,t_1)=0_{n\times n}$ , and so $ \Phi_{11}(t_1,t_1)-2\Phi_{12}(t_1,t_1)M=I_{n\times n}$ . By continuity, $ \Phi_{11}(t,t_1)-2\Phi_{12}(t,t_1)M$ stays invertible for $ t$ close enough to $ t_1$ , which means that $ P(t)$ is well defined at least for $ t$ near $ t_1$ . Second, the minus sign and the factor of $ 1/2$ in (6.10), which stem from the factor of $ -2$ in (6.6), appear to be somewhat arbitrary at this point. We see from (6.5) and (6.6) that

$\displaystyle P(t_1)=M.$ (6.11)

The reason for the above conventions will become clear later, when we show that $ P(t)$ is symmetric positive semidefinite for all $ t$ (not just for $ t=t_1$ ) and is directly related to the optimal cost.
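Formula (6.10), clumsy as it is, can be evaluated directly in the time-invariant case. The sketch below (hypothetical constant matrices, not data from the text) builds $P(t)$ from the blocks of $\Phi(t,t_1)=e^{\mathcal H(t-t_1)}$ and confirms the boundary value (6.11) together with the symmetry just claimed:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical constant data.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
M = np.diag([1.0, 2.0])          # terminal cost weight, M = M^T >= 0
n, t1 = 2, 1.0

Hmat = np.block([[A, 0.5 * B @ np.linalg.inv(R) @ B.T],
                 [2.0 * Q, -A.T]])

def P(t):
    """P(t) from formula (6.10), using the n x n blocks of Phi(t, t1)."""
    Phi = expm(Hmat * (t - t1))
    P11, P12 = Phi[:n, :n], Phi[:n, n:]
    P21, P22 = Phi[n:, :n], Phi[n:, n:]
    return -0.5 * (P21 - 2 * P22 @ M) @ np.linalg.inv(P11 - 2 * P12 @ M)

assert np.allclose(P(t1), M)              # boundary condition (6.11)
Pt = P(0.5)
assert np.allclose(Pt, Pt.T, atol=1e-8)   # symmetry, as claimed for all t
```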

Combining (6.3) and (6.6), we deduce that the optimal control must take the form

$\displaystyle \fbox{$u^*(t)=-R^{-1}(t)B^T(t)P(t)x^*(t)$}$ (6.12)

which, as we announced earlier, is a linear state feedback law. This is a remarkable conclusion, as it shows that the optimal closed-loop system must be linear. Note that the feedback gain in (6.12) is time-varying even if the system (6.1) is time-invariant, because $ P$ from (6.10) is always time-varying. We remark that we could just as easily derive an open-loop formula for $ u^*$ by writing (6.8) with $ t_0$ in place of $ t$ , i.e., $ x_0=(\Phi_{11}(t_0,t_1)-2\Phi_{12}(t_0,t_1)M)x^*(t_1)$ , solving it for $ x^*(t_1)$ and plugging the result into (6.9) to arrive at

$\displaystyle p^*(t)=\big(\Phi_{21}(t,t_1)-2\Phi_{22}(t,t_1)M\big) \big(\Phi_{11}(t_0,t_1)-2\Phi_{12}(t_0,t_1)M\big)^{-1}x_0$
(provided that the inverse exists), and then using this expression in (6.3). However, the feedback form of $ u^*$ is theoretically revealing and leads to a more compact description of the closed-loop system.
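The equivalence of the two forms can be checked numerically: propagate the canonical system (6.7) backward from $t_1$ with the terminal condition (6.5), then compare the open-loop control (6.3) against the feedback control (6.12) along the resulting trajectory. A sketch with hypothetical constant matrices (not data from the text):

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical constant data.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
M = np.diag([1.0, 2.0])
n, t1 = 2, 1.0
Hmat = np.block([[A, 0.5 * B @ np.linalg.inv(R) @ B.T],
                 [2.0 * Q, -A.T]])

x1 = np.array([1.0, -1.0])                 # terminal state x*(t1)
z1 = np.concatenate([x1, -2.0 * M @ x1])   # (x*(t1), p*(t1)), using (6.5)

for t in [0.2, 0.5, 0.8]:
    Phi = expm(Hmat * (t - t1))            # Phi(t, t1)
    z = Phi @ z1                           # canonical solution at time t
    x, p = z[:n], z[n:]
    # P(t) from (6.10), built from the blocks of Phi(t, t1).
    P = -0.5 * (Phi[n:, :n] - 2 * Phi[n:, n:] @ M) @ \
        np.linalg.inv(Phi[:n, :n] - 2 * Phi[:n, n:] @ M)
    assert np.allclose(p, -2.0 * P @ x, atol=1e-8)   # relation (6.6)
    u_open = 0.5 * np.linalg.solve(R, B.T @ p)       # open-loop form (6.3)
    u_fb = -np.linalg.solve(R, B.T @ P @ x)          # feedback form (6.12)
    assert np.allclose(u_open, u_fb, atol=1e-8)
```

The assertions confirm that (6.6) holds along the whole trajectory and that the feedback and open-loop expressions for $u^*$ coincide, as the derivation predicts.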

As we said, there are two things that we still need to check: optimality of the control $ u^*$ that we found, and global existence of the matrix $ P(t)$ . These issues will be tackled in Sections 6.1.3 and 6.1.4, respectively. But first, we want to obtain a nicer description for the matrix $ P(t)$ , as the formula (6.10) is rather clumsy and not very useful (since calculating the transition matrix $ \Phi$ analytically is in general impossible).

Daniel 2010-12-20