THE CLASSIFICATION OF CRITICAL POINTS - Multivariable Differential Calculus - Advanced Calculus of Several Variables

Advanced Calculus of Several Variables (1973)

Part II. Multivariable Differential Calculus

Chapter 8. THE CLASSIFICATION OF CRITICAL POINTS

Let f be a real-valued function of class Image on an open set Image, and let Image be a critical point of f. In order to apply Theorem 7.5 to determine the nature of the critical point a, we need to decide the character of the quadratic form Image defined by

Image

where the n × n matrix A = (aij) is defined by Image, and

Image

Note that the matrix A is symmetric, that is aij = aji, since Di Dj = Dj Di.

We have seen that, in order to determine whether q is positive-definite, negative-definite, or nondefinite, it suffices to determine the maximum and minimum values of q on the unit sphere Sn−1. The following result rephrases this problem in terms of the linear mapping Image defined by

Image

We shall refer to the quadratic form q, the symmetric matrix A, and the linear mapping L as being “associated” with one another.

Theorem 8.1 If the quadratic form q attains its maximum or minimum value on Sn−1 at the point Image, then there exists a real number λ such that

Image

where L is the linear mapping associated with q.

Such a nonzero vector v, which upon application of the linear mapping Image is simply multiplied by some real number λ, L(v) = λv, is called an eigenvector of L, with associated eigenvalue λ.

PROOF We apply the Lagrange multiplier method. According to Theorem 5.5, there exists a number Image such that

Image

where Image defines the sphere Sn−1.

But Dk g(x) = 2xk, and

Image

where the row vector Ak is the kth row of A, that is, Ak = (ak1 · · · akn). Consequently Eq. (1) implies that

Image

for each i = 1, . . . , n. But Ak v is the ith row of Av, while λvk is the kth element of the column vector λv, so we have

Image

as desired.

Image

Thus the problem of maximizing and minimizing q on Sn−1 involves the eigenvectors and eigenvalues of the associated linear mapping. Actually, it will turn out that if we can determine the eigenvalues without locating the eigenvectors first, then we need not be concerned with the latter.

Lemma 8.2 If Image is an eigenvector of the linear mapping Image associated with the quadratic form q, then

Image

where λ is the eigenvalue associated with v.

PROOF

Image

since Image.

Image

The fact that we need only determine the eigenvalues of L (and not the eigenvectors) is a significant advantage, because the following theorem asserts that we therefore need only solve the so-called characteristic equation

Image

Here ImageA − λIImage denotes the determinant of the matrix A − λI (where I is the n × n identity matrix).

Theorem 8.3 The real number λ is an eigenvalue of the linear mapping Image, defined by L(x) = Ax, if and only if λ satisfies the equation

Image

PROOF Suppose first that λ is an eigenvalue of L, that is,

Image

for some v ≠ 0. Then (A − λI)v = 0.

Enumerating the rows of the matrix A − λI, we obtain

Image

where e1, . . . , en are the standard unit vectors in Imagen. Hence the n vectors A1λe1, . . . , Anλen all lie in the (n − 1)-dimensional hyperplane through 0 perpendicular to v, and are therefore linearly dependent, so it follows from Theorem I.6.1 that

Image

as desired.

Conversely, if λ is a solution of ImageA − λIImage = 0, then the row vectors A1λe1, . . . , Anλen of A − λI are linearly dependent, and therefore all lie in some (n − 1)-dimensional hyperplane through the origin. If v is a unit vector perpendicular to this hyperplane, then

Image

Hence (A − λI)v = 0, so

Image

as desired.

Image

Corollary 8.4 The maximum (minimum) value attained by the quadratic form q(x) = xtAx on Sn−1 is the largest (smallest) real root of the equation

Image

PROOF If q(v) is the maximum (minimum) value of q on Sn−1, then v is an eigenvector of L(x) = Ax by Theorem 8.1, and q(v) = λ, the associated eigenvalue, by Lemma 8.2. Theorem 8.3 therefore implies that q(v) is a root of ImageA − λIImage = 0. On the other hand, it follows from 8.2 and 8.3 that every root of ImageA − λIImage = 0 is a value of q on Sn−1. Consequently q(v) must be the largest (smallest) root of ImageA − λIImage = 0.

Image

Example 1 Suppose a is a critical point of the function f: Image and that the quadratic form of f at a is

Image

so the matrix of q is

Image

The characteristic equation of A is then

Image

with roots λ = −1, 1, 3. It follows from Corollary 8.4 that the maximum and minimum values of q in S2 are + 3 and − 1 respectively. Since q attains both positive and negative values, it is nondefinite. Hence Theorem 7.5 implies that f has neither a maximum nor a minimum at a.

The above example indicates how Corollary 8.4 reduces the study of the quadratic form q(x) = xtAx to the problem of finding the largest and smallest real roots of the characteristic equation ImageA − λIImage = 0. However this problem itself can be difficult. It is therefore desirable to carry our analysis of quadratic forms somewhat further.

Recall that the matrix A of a quadratic form is symmetric, that is, aij = aji. We say that the linear mapping Image is symmetric if and only if L(x) · y = x · L(y) for all x and y in Imagen. The motivation for this terminology is the following result.

Lemma 8.5 If A is the matrix of the linear mapping Image, then L is symmetric if and only if A is symmetric.

PROOF If Image, then

Image

Hence L is symmetric if and only if

Image

for all i, j = 1, . . . , n. But

Image

andImage

The following theorem gives the precise character of a quadratic form q in terms of the eigenvectors and eigenvalues of its associated linear mapping L.

Theorem 8.6 If q is a quadratic form on Imagen, then there exists an orthogonal set of unit eigenvectors v1, . . . , vn of the associated linear mapping L. If y1, . . . , yn are the coordinates of Image with respect to the basis v1, . . . , vn, that is,

Image

then

Image

where λ1, . . . , λn are the eigenvalues of v1, . . . , vn.

PROOF This is trivially true if n = 1, for then q is of the form q(x) = λx2, with L(x) = λx. We therefore assume by induction that it is true in dimension n − 1.

If q attains its maximum value on the unit sphere Sn−1 at Image, then see Theorem 8.1 says that vn is an eigenvector of L. Fig. 2.40.

Image

Figure 2.40

Let Image be the subspace of Imagen perpendicular to vn. Then L maps Image into itself, because, given Image, we have

Image

using the symmetry of L (Lemma 8.5).

Let Image and Image be the “restrictions” of L and q to Image, that is,

Image

If (u1, . . . , un−1) are the coordinates of x with respect to an orthonormal basis for Imagen−1, then by Lemma 8.5,

Image

where B is a symmetric (n − 1) × (n − 1) matrix. Then

Image

so q* is a quadratic form in u1, . . . , un−1.

The inductive hypothesis therefore gives an orthogonal set of unit eigenvectors v1, . . . , vn−1 for L*. Then v1, . . . , vn is the desired orthogonal set of unit eigenvectors for L.

To prove the last statement of the theorem, let Image. Then

Image

since vi · vj = 0 unless i = j, and vi · vi = 1.

Image

Equation (2) may be written in the form

Image

The process of finding an orthogonal set of unit eigenvectors, with respect to which the matrix of q is diagonal, is sometimes called diagonalization of the quadratic form q.

Note that, independently of Corollary 8.4, it follows immediately from Eq. (2) that

(a)q is positive-definite if all the eigenvalues are positive,

(b)q is negative-definite if all the eigenvalues are negative, and

(c)q is nondefinite if some are positive and some are negative.

If one or more eigenvalues are zero, while the others are either all positive or all negative, then q is neither positive-definite nor negative-definite nor nondefinite. According to Lemma 8.7 below, this can occur only if the determinant of the matrix of q vanishes.

Example 2 If q(x, y, z) = xy + yz, the matrix of q is

Image

The characteristic equation is then

Image

with solutions Image. By substituting each of these eigenvalues in turn into the matrix equation

Image

(from the proof of Theorem 8.3), and solving for a unit vector v, we obtain the eigenvectors

Image

If u, v, w are coordinates with respect to coordinate axes determined by v1, v2, v3, that is, x = uv1 + vv2 + wv3, then Eq. (2) gives

Image

Lemma 8.7 If λ1, . . . , λn are eigenvalues of the symmetric n × n matrix A, corresponding to orthogonal unit eigenvectors v1, . . . , vn, then

Image

PROOF Let

Image

Then, since Avj = λivj, we have

Image

It follows that

Image

Since ImageVImage ≠ 0 because the vectors v1, . . . , vn are orthogonal and therefore linearly independent, this gives Eq. (3).

Image

We are finally ready to give a complete analysis of a quadratic form q(x) = xt Ax on Imagen for which ImageAImage ≠ 0. Writing A = (aij) as usual, we denote by Δk the determinant of the upper left-hand k × k submatrix of A, that is,

Image

Thus

Image

Theorem 8.8 Let q(x) = xt Ax be a quadratic form on Imagen with ImageAImage ≠ 0. Then q is

(a)positive-definite if and only if Δk >0 for each k = 1, . . . , n,

(b)negative-definite if and only if (−1)k Δk >0 for each k = 1, . . . , n,

(c)nondefinite if neither of the previous two conditions is satisfied.

PROOF OF (a) This is obvious if n = 1; assume by induction its truth in dimension n − 1.

Since 0 < Δn = ImageAImage = λ1λ2 · · · λn by Lemma 8.7, all the eigenvalues λ1, . . . , λn of A are nonzero. Either they are all positive, in which case we are finished, or an even number of them are negative. In the latter case, let λi and λj be two negative eigenvalues, corresponding to eigenvectors vi and vj.

Let Image be the 2-plane in Imagen spanned by vi and vj. Then, given Image, we have

Image

so q is negative on Image minus the origin.

Now let q* : Imagen − 1 Image be the restriction of q to Imagen−1, defined by

Image

where

Image

Since Δ1 >0, . . . , Δn − 1 >0, the inductive assumption implies that q* is positive-definite on Imagen−1. But this is a contradiction, since the (n − 1)-plane Imagen−1 and the 2-plane Image, on which q is negative, must have at least a line in common. We therefore conclude that q has no negative eigenvalues, and hence is positive-definite.

Conversely, if q is positive-definite, we want to prove that each Δk is positive. If m is the minimum value of q on Sn−1, then m >0.

Let qk : Imagek Image be the restriction of q to Imagek, that is,

Image

where Ak is the upper left-hand k × k submatrix of A, so Δk = ImageAkImage, and define Lk : Imagek Imagek by Lk(x) = Ak x. Then Lk is symmetric by Lemma 8.5.

Let ¼1, . . . , ¼k be the eigenvalues of Lk given by Theorem 8.6. Since by Lemma 8.2 each ¼i is the value of qk at some point of Sk−1, the unit sphere in Imagek, and qk(x) = q(x, 0), it follows that each Image. Finally Lemma 8.7implies that

Image

PROOF OF (b) Define the quadratic form q* : Imagen Image by q*(x) = −q(x). Then clearly q is negative-definite if and only if q* is positive-definite. By part (a), q* is positive-definite if and only if each Δk* is positive, where Δk* = ImageAkImage, Ak being the upper left-hand k × k submatrix of A. But, since each of the k rows of − Ak is −1 times the corresponding row of Ak, it follows that

Image

PROOF OF (c) Since Δn = λ1 λ2 · · · λn ≠ 0, all of the eigenvalues λ1, . . . , λn are nonzero. If all of them were positive, then part (a) would imply that Δk >0 for all k, while if all of them were negative, then part (b) would imply that (−1)k Δk >0 for all k. Since neither is the case, we conclude that q has both positive eigenvalues and negative eigenvalues, and is therefore nondefinite.

Image

Recalling that, if a is a critical point of the class Image function f : Image, the quadratic form of f at a is Image, Theorems 7.5 and 8.8 enable us to determine the behavior of f in a neighborhood of a,provided that the determinant

Image

often called the Hessian determinant of f at a, is nonzero. But if Image Di Djf(a)Image = 0, then these theorems do not apply, and ad hoc methods must be applied.

Note that the classical second derivative test for a function of two variables (Theorem 4.4) is an immediate corollary to Theorems 7.5 and 8.8.

It will clarify the picture at this point to generalize the discussion at the end of Section 6. Suppose for the sake of simplicity that the origin is a critical point of the class Image function f: Image, and that f(0) = 0. Then

Image

with

Image

Let v1, . . . , vn be an orthogonal set of unit eigenvectors (provided by Theorem 8.6), with associated eigenvalues λ1, . . . , λn. Assume that ImageDi Djf(0)Image ≠ 0, so Lemma 8.7 implies that each λi is nonzero. Let λ1, . . . , λk be the positive eigenvalues, and λk+1, . . . , λn the negative ones (writing k = 0 if all the eigenvalues are negative). Since

Image

if x = y1v1 + · · · + yn vn, it is clear that q has a minimum at 0 if k = n and a maximum if k = 0.

If Image, so that q has both positive and negative eigenvalues, denote by Image the subspace of Image spanned by v1, . . . , vk, and by Image the subspace of Imagen spanned by vk+1, . . . , vn. Then the graph of z = q(x) in Image and Image looks like Fig. 2.41; q has a minimum on Image and a maximum on Image.

Given Image < mini ImageλiImage, choose δ >0 such that

Image

or

Image

since Image. This inequality and Eq. (4) yield

Image

Using (5), we obtain

Image

Image

Figure 2.41

where ¼i = −λi >0, i = k + 1, . . . , n. Since λiImage >0 and ¼iImage >0 because Image < min ImageλiImage, inequalities (6) show that, for ImagexImage < δ, the graph of z = f(x) lies between two very close quadratic surfaces (hyperboloids) of the same type as z = q(x). In particular, if Image, then f has a local minimum on the subspace Image and a local maximum on the subspace Image. This makes it clear that the origin is a “saddle point” for f if q has both positive and negative eigenvalues. Thus the general case is just like the 2-dimensional case except that, when there is a saddle point, the subspace on which the critical point is a maximum point may have any dimension from 1 to n − 1.

It should be emphasized that, if the Hessian determinant ImageDiDjf(a)Image is zero, then the above results yield no information about the critical point a. This will be the situation if the quadratic form q of f at a is either positive-semidefinite but not positive-definite, or negative-semidefinite but not negative-definite. The quadratic form q is called positive-semidefinite if Image for all x, and negative-semidefinite if Image for all x. Notice that the terminology “q is nondefinite,” which we have been using, actually means that q is neither positive-semidefinite nor negative-semidefinite (so we might more descriptively have said “ nonsemidefinite”).

Example 3 Let f(x, y) = x2 − 2xy + y2 + x4 + y4, and g(x, y) = x2 − 2xy + y2x4y4. Then the quadratic form for both at the critical point (0, 0) is

Image

which is positive-semidefinite but not positive-definite. Since

Image

f has a relative minimum at (0, 0). However, since

Image

it is clear that g has a saddle point at (0, 0).

Thus, if the Hessian determinant vanishes, then anything can happen.

We conclude this section with discussion of a “second derivative test” for constrained (Lagrange multiplier) maximum–minimum problems. We first recall the setting for the general Lagrange multiplier problem: We have a differentiable function f: Image and a continuously differentiable mapping G : Imagem (m < n), and denote by M the set of all those points Image at which the gradient vectors ∇G1(x), . . . , ∇Gm(x), of the component functions of G, are linearly independent. By Theorem 5.7, M is an (n − m)-dimensional manifold in Imagen, and by Theorem 5.6 has an (n − m)-dimensional tangent space Tx at each point Image. Recall that, by definition, Tx is the subspace of Imagen consisting of all velocity vectors (at x) to differentiable curves on M which pass through x. This notation will be maintained throughout the present discussion.

We are interested in locating and classifying those points of M at which the function f attains its local maxima and minima on M. According to see Theorem 5.1, in order for f to have a local maximum or minimum on M at Image, it is necessary that a be a critical point for f on M in the following sense. The point Image is called a critical point for f on M if and only if the gradient vector ∇f(a) is orthogonal to the tangent space Ta of M at a. Since the linearly independent gradient vectors ∇G1(a), . . . , ∇Gm(a) generate the orthogonal complement to Ta (Fig. 2.42), it follows as in Theorem 5.8 that there exist unique numbers λ1, . . . , λm such that

Image

Image

Figure 2.42

We will obtain sufficient conditions for f to have a local extremum at a by considering the auxiliary function H : Image for f at a defined by

Image

Notice that (7) simply asserts that ∇H(a) = 0, so a is an (ordinary) critical point for H. We are, in particular, interested in the quadratic form

Image

of H at a.

Theorem 8.9 Let a be a critical point for f on M, and denote by q : Image the quadratic form at a of the auxiliary function Image, as above. If f and G are both of class Image in a neighborhood of a, then f has

(a)a local minimum on M at a if q is positive-definite on the tangent space Ta to M at a,

(b)a local maximum on M at a if q is negative-definite on Ta,

(c)neither if q is nondefinite on Ta.

The statement, that “q is positive-definite on Ta,” naturally means that q(h) >0 for all nonzero vectors Image; similarly for the other two cases.

PROOF The proof of each part of this theorem is an extension of the proof of the corresponding part of Theorem 7.5. We give the details only for part (a).

We start with the Taylor expansion

Image

for the auxiliary function H. Since Image, it clearly suffices to find a δ >0 such that

Image

ifImage

Let m be the minimum value attained by q on the unit sphere Image in the tangent plane Ta. Then m >0 because q is positive-definite on Ta. Noting that

Image

we choose δ >0 so small that

Image

and also so small that 0 < ImagehImage < δ and Image together imply that h/ImagehImage is sufficiently near to S* that

Image

Then (9) follows as desired. The latter condition on δ can be satisfied because S* is (by definition) the set of all tangent vectors at a to unit-speed curves on M which pass through a. Therefore, if Image and h is sufficiently small, it follows that h/ImagehImage is close to a vector in S*.

Image

The following two examples provide illustrative applications of Theorem 8.9.

Example 4 In Example 4 of Section 5 we wanted to find the box with volume 1000 having least total surface area, and this involved minimizing the function

Image

on the 2-manifold Image defined by the constraint equation

Image

We found the single critical point a = (10, 10, 10) for f on M, with Image. The auxiliary function is then defined by

Image

We find by routine computation that the matrix of second partial derivatives of h at a is

Image

so the quadratic form of h at a is

Image

It is clear that q is nondefinite on Image3. However it is the behavior of q on the tangent plane Ta that interests us.

Since the gradient vector ∇g(a) = (100, 100, 100) is orthogonal to M at a = (10, 10, 10), the tangent plane Ta is generated by the vectors

Image

Given Image, we may therefore write

Image

and then find by substitution into (10) that

Image

Since the quadratic form s2 + st + t2 is positive-definite (by the ac − b2 test), it follows that q is positive-definite on Ta. Therefore Theorem 8.9 assures us that f does indeed have a local minimum on M at the critical point a = (10, 10, 10).

Example 5 In Example 9 of Section 5 we sought the minimum distance between the circle x2 + y2 = 1 and the straight line x + y = 4. This involved minimizing the function

Image

on the 2-manifold Image defined by the constraint equations

Image

We found the geometrically obvious critical points (see Fig. 2.43), Image with Image It is obvious that f has a minimum at a, and neither a maximum nor a minimum at b, but we want to verify this using Theorem 8.9.

The auxiliary function H is defined by

Image

Image

Figure 2.43

By routine computation, we find that the matrix of second partial derivatives of H is

Image

Substituting Image, we find that the quadratic form of H at the critical point Image is

Image

Computing the subdeterminants Δ1, Δ2, Δ3, Δ4 of Theorem 8.8, we find that all are positive, so this quadratic form q is positive-definite on Image4, and hence on the tangent plane Ta (this could also be verified by the method of the previous example). Hence Theorem 8.9 implies that f does indeed attain a local minimum at a.

Substituting Image, we find that the quadratic form of H at the other critical point Image is

Image

Now the vectors v1 = (1, −1, 0, 0) and v2 = (0, 0, 1, −1) obviously lie in the tangent plane Tb. Since

Image

we see that q is nondefinite on Tb, so Theorem 8.9 implies that f does not have a local extremum at b.

Exercises

8.1This exercise gives an alternative proof of Theorem 8.8 in the 3-dimensional case. If

Image

where A = (aij) is a symmetric 3 × 3 matrix with Δ1 ≠ 0, Δ2 ≠ 0, show that

Image

Conclude that q is positive-definite if Δ1, Δ2, Δ3 are all positive, while q is negative-definite if Δ1 < 0, Δ2 >0, Δ3 < 0.

8.2Show that q(x, y, z) = 2x2 + 5y2 + 2z2 + 2xz is positive-definite by

(a)applying the previous exercise,

(b)solving the characteristic equation.

8.3Use the method of Example 2 to diagonalize the quadratic form of Exercise 8.2, and find its eigenvectors. Then sketch the graph of the equation 2x2 + 5y2 + 2z2 + 2xz = 1.

8.4This problem deals with the function Image defined by

Image

Note that 0 = (0, 0, 0) is a critical point of f.

(a)Show that q(x, y, z) = 2x2 +5y2 + 2z2 + 2xz is the quadratic form of f at 0 by substituting the expansion Image, collecting all second degree terms, and verifying the appropriate condition on the remainder Image(x, y, z). State the uniqueness theorem which you apply.

(b)Write down the symmetric 3 × 3 matrix A such that

Image

By calculating determinants of appropriate submatrices of A, determine the behavior of f at 0.

(c)Find the eigenvalues λ1, λ2, λ3 of q. Let v1, v2, v3 be the eigenvectors corresponding to λ1, λ2, λ3 (do not solve for them). Express q in the uvw-coordinate system determined by v1, v2, v3. That is, write q(uv1 + uv2 + wv3) in terms of u, v, w. Then give a geometric description (or sketch) of the surface

Image

8.5Suppose you want to design an ice-cube tray of minimal cost, which will make one dozen ice “cubes” (not necessarily actually cubes) from 12 in.3 of water. Assume that the tray is divided into 12 square compartments in a 2 × 6 pattern as shown in Fig. 2.44, and that

Image

Figure 2.44

the material used costs 1 cent per square inch. Use the Lagrange multiplier method to minimize the cost function f(x, y, z) = xy + 3xz + 7yz subject to the constraints x = 3y and xyz = 12. Apply Theorem 8.9 to verify that you do have a local minimum.

8.6Apply Theorem 8.8 to show that the quadratic form

Imageis nondefinite. Solve the characteristic equation for its eigenvalues, and then find an ortho-normal set of eigenvectors.

8.7Let f(x) = xt Ax be a quadratic form on Imagen (with A symmetric), and let v1, . . . , vn be the orthonormal eigenvectors given by Theorem 8.6. If P = (v1, . . . , vn) is the n × n matrix having these eigenvectors as its column vectors, show that

Image

Hint: Let x = Py = y1v1 + · · · + yn vn. Then

Image

by the definition of f, while

Image

by Theorem 8.6. Writing B = PtAP, we have

Image

for all y = (y1, . . . , yn). Using the obvious symmetry of B prove that bii = λi, while bij = 0 if i ≠ j. Finally note that P−1 = Pt by Exercise 6.11 of Chapter I.