Formulations and solutions of the problems of Dido, catenary, and brachistochrone, as well as related historical remarks, are given in [GF63,You80,Mac05] and many other sources. For an enlightening discussion of light reflection and refraction, see [FLS63, Chapter I-26], where there is also an amusing (although perhaps not entirely politically correct) alternative description of refraction in terms of choosing the fastest path from the beach to the water to save a drowning girl. A comprehensive, insightful, and mathematically accurate account of the historical development of calculus of variations is given in [Gol80]; this book traces the roots of the subject to Fermat's principle of least time, which allowed the use of calculus for analyzing light refraction and later inspired Johann Bernoulli's solution of the brachistochrone problem. For an in-depth treatment of the brachistochrone problem we also recommend the paper [SW97]; it is explained there that this problem effectively marked the birth of the field of optimal control, because it started a steady research activity on time-optimal and related variational problems still studied today in optimal control theory. Regarding the catenary, it is interesting to mention that the inverted catenary shape has been used for building arches from ancient times to the present day (notable examples include several buildings designed by Gaudí in Catalonia and the Gateway Arch in St. Louis, Missouri).

Sections 2.2 and 2.3 follow Chapter 1 of [GF63], the text on which most of the present chapter is based; see also [Mac05]. Exercise 2.2 is borrowed from [Jur96, p. 341]. Example 2.3 is treated in [Vinter, p. 18], where it is followed by a discussion of existence of optimal solutions. Differentiability assumptions under which the Euler-Lagrange equation is valid are discussed, in addition to [GF63] and [Mac05], in [Sus00, Handout 2]. Invariance of the Euler-Lagrange equation under changes of coordinates is demonstrated in [GF63] for a single degree of freedom and in [Sus00, Handout 2] for multiple degrees of freedom. The treatment of variable-endpoint problems in [GF63] includes the case of both endpoints lying on given vertical lines, as well as a variable-terminal-point version of the brachistochrone problem. A more general study leading to transversality conditions can be found in [GF63, Chapter 3] (although it relies on the general formula for the variation of a functional which we do not give) and in [SW77, Chapter 3].

The material of Section 2.4 is covered in [GF63, Chapter 4]; the Hamiltonian and the canonical variables also appear in [GF63, Chapter 3] in the general formula for the variation of a functional. Other sources of relevant information include [Arn89], [Mac05], and [Sus00, Handout 3]. Section 14 of [Arn89] mentions several applications of the Legendre transformation, including an elegant derivation of Young's inequality. Convexity of the Legendre transform (which is also called the conjugate function) is used in dual optimization methods; see [BV04, Sections 3.3 and 5.1]. The symmetric way of writing the Legendre transformation via the formula (2.36) is prompted by the presentation in [YZ99, pp. 220-221]. All these references also cover the Legendre transformation for functions of several variables. For an insightful discussion of how the Hamiltonian should be interpreted and why the maximum principle was not discovered much earlier, see [Sus00, Handout 3] and [SW97]. A nice exposition of the principle of least action can be found in [FLS63, Chapter II-19], and the reader intrigued by our brief remark about Einstein's theory of gravitation is advised to check Chapter II-42 of the same book. Conservation laws are derived in [GF63, Chapter 4] with the help of Noether's theorem, which is another application of the general formula for the variation of a functional. Conservation of angular momentum is also discussed in detail in [Arn89] (see in particular Example 2 in Section 13).

Section 2.5 is based on [GF63, Section 12] and [Mac05, Chapter 5], where additional details (such as the treatment of several integral constraints and multiple degrees of freedom, as well as a derivation of the necessary condition for the case of holonomic non-integral constraints) can be found. The book [You80] examines Lagrange's naive argument in detail and criticizes "its reappearance, every so often, in so-called accounts and introductions that claim to present the calculus of variations to engineers and other supposedly uncritical persons" (see the preamble to Volume II). The pendulum example is discussed in [Mac05, pp. 37 and 83]. For more information on control systems with nonholonomic constraints, including optimal control problems, the reader can consult [Blo03] and the references therein.

Section 2.6 is largely subsumed by Chapter 5 of [GF63]. Among additional topics covered there are a detailed study of conjugate points, a derivation of Legendre's condition for multiple degrees of freedom, and a connection with Sylvester's positive definiteness criterion for quadratic forms. It is possible to prove Legendre's condition for multiple degrees of freedom differently from how it is done in [GF63], without integrating by parts and without assuming that the matrix in question is symmetric; namely, one can perturb along directions of eigenvectors of this matrix (at an arbitrary fixed point) and then invoke the scalar result, as in [LL50, pp. 94-95]. Our reasoning that the absence of conjugate points leads to the existence of a nonzero solution of the Jacobi equation is close to the argument given in [Mac05, Section 9.4], where the reader can find some missing details.