
Two-Dimensional Calculus (2011)

Chapter 2. Differentiation

6. Directional derivatives and gradient

The partial derivatives fx and fy at a point measure the rate of change of the function f in the x and y directions, respectively. They may be considered to be special cases of the directional derivative, which measures the rate of change of f in an arbitrary direction, and which we define below.

We use the following notation throughout this section. Let

Tα = ⟨cos α, sin α⟩        (6.1)

be the unit vector making an angle α with the positive x direction.

Given any point (x0, y0), we wish to measure the rate of change of f in the direction of Tα. We simply take the change in the function as we move a distance Δs in this direction, divide by Δs, and take the limit as Δs tends to zero (Fig. 6.1). We use the notation ∇αf(x0, y0) for this quantity.


FIGURE 6.1 Directional derivative ∇α

Definition 6.1 The directional derivative of a function f(x, y) at a point (x0, y0) in the direction of the vector ⟨cos α, sin α⟩ is

∇αf(x0, y0) = lim Δs→0 [f(x0 + Δs cos α, y0 + Δs sin α) − f(x0, y0)]/Δs.

Note that the partial derivatives fx and fy are special cases:

fx(x0, y0) = ∇0f(x0, y0),   fy(x0, y0) = ∇π/2f(x0, y0).

In general, a function may have directional derivatives in some directions, but not in others. Again we may refer to the example defined by Eq. (5.1). For that function, the directional derivatives at the origin exist in the x and y directions, but in no other direction. This illustrates again the inadequacy of partial derivatives for describing the way a function varies in an arbitrary direction. The surprising fact is that if, in addition to assuming that the partial derivatives exist at each point, we assume that they are continuous, then we are able to draw two conclusions.

1. The directional derivatives must exist in all directions.

2. The value of the directional derivative at any point in a given direction depends only on the direction and on the partial derivatives fx, fy at the point.

The proof of these facts is based on the fundamental Lemma 5.1. We have the following theorem.

Theorem 6.1 Let f(x, y) be continuously differentiable in a domain D. Then at any point (x0, y0), the directional derivative ∇αf(x0, y0) exists for all α and is given by

∇αf(x0, y0) = fx(x0, y0) cos α + fy(x0, y0) sin α        (6.4)

PROOF. Setting x = x0 + Δs cos α, y = y0 + Δs sin α in Eq. (5.3), we find

Δf = fx(x0, y0) Δs cos α + fy(x0, y0) Δs sin α + h1 Δs cos α + h2 Δs sin α.

Dividing by Δs yields

Δf/Δs = fx(x0, y0) cos α + fy(x0, y0) sin α + h1 cos α + h2 sin α        (6.5)

But Δx = Δs cos α → 0 and Δy = Δs sin α → 0 as Δs → 0, and the fundamental Lemma 5.1 asserts precisely that

h1 → 0,   h2 → 0   as Δs → 0.

Thus, applying Def. 6.1 to Eq. (6.5), we obtain Eq. (6.4). ∎
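
Readers who wish to experiment may check Theorem 6.1 numerically. The sketch below (in Python; the smooth function, point, and direction are arbitrary illustrative choices) compares the difference quotient of Definition 6.1 with the value given by Eq. (6.4).

```python
import math

# Sample smooth function and its exact partial derivatives (illustrative choice).
def f(x, y):
    return math.sqrt(25.0 - x**2 - y**2)

def fx(x, y):
    return -x / math.sqrt(25.0 - x**2 - y**2)

def fy(x, y):
    return -y / math.sqrt(25.0 - x**2 - y**2)

def directional_quotient(x0, y0, alpha, ds):
    """Difference quotient of Definition 6.1 for step size ds."""
    return (f(x0 + ds * math.cos(alpha), y0 + ds * math.sin(alpha)) - f(x0, y0)) / ds

x0, y0, alpha = 3.0, 2.0, math.pi / 6        # arbitrary point and direction
formula = fx(x0, y0) * math.cos(alpha) + fy(x0, y0) * math.sin(alpha)   # Eq. (6.4)

for ds in (0.1, 0.01, 0.001):
    print(ds, directional_quotient(x0, y0, alpha, ds), formula)
# The quotient approaches the value given by Eq. (6.4) as ds shrinks.
```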

Example 6.1

Let

f(x, y) = (25 − x² − y²)^(1/2),   (x0, y0) = (3, 2),

and let Tα point radially away from the origin, so that cos α = 3/√13, sin α = 2/√13. Then fx(3, 2) = −3/(2√3), fy(3, 2) = −1/√3, and Eq. (6.4) gives

∇αf(3, 2) = −(3/(2√3))(3/√13) − (1/√3)(2/√13) = −√39/6 ≈ −1.04.

Note that the directional derivative is negative, meaning that the function is decreasing in this direction (Fig. 6.2a).


FIGURE 6.2 Directions of derivatives in Examples 6.1 and 6.2

Example 6.2

Using the same function and same point as in Example 6.1, let

cos α = −2/√13,   sin α = 3/√13,   i.e.,   Tα = (1/√13)⟨−2, 3⟩.

Then, by Eq. (6.4),

∇αf(3, 2) = −(3/(2√3))(−2/√13) − (1/√3)(3/√13) = 0.

Note that in this case Tα is a unit vector in a direction perpendicular to the radius vector ⟨3, 2⟩. The rate of change of the function in this direction is zero (Fig. 6.2b).
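
The numbers appearing in these examples are easy to confirm by machine. The following minimal Python sketch evaluates the partial derivatives of f(x, y) = (25 − x² − y²)^(1/2) at (3, 2) and forms directional derivatives from Eq. (6.4) in two directions: tangent to the level curve, where the result is zero as in Example 6.2, and radially outward, where f decreases.

```python
import math

def grad_f(x, y):
    """Gradient of f(x, y) = (25 - x^2 - y^2)^(1/2)."""
    r = math.sqrt(25.0 - x**2 - y**2)
    return (-x / r, -y / r)

def directional(x, y, ux, uy):
    """Directional derivative at (x, y) in the unit direction (ux, uy), Eq. (6.4)."""
    gx, gy = grad_f(x, y)
    return gx * ux + gy * uy

gx, gy = grad_f(3.0, 2.0)
print("grad f(3,2) =", (gx, gy))                    # proportional to -(3, 2)

s = math.sqrt(13.0)
print(directional(3.0, 2.0, -2.0 / s, 3.0 / s))     # tangent to the level curve: 0
print(directional(3.0, 2.0, 3.0 / s, 2.0 / s))      # radially outward: negative
```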

There are two physical interpretations that are most helpful in understanding the directional derivative. The first is obtained by picturing the domain D to be a thin plate and the function f(x, y) the temperature at each point of the plate. Then the directional derivative at any point tells us whether, as we move in a given direction, we are getting hotter or colder, and how fast.

The other interpretation concerns the surface z = f(x, y). We draw a line through (x0, y0) in the direction of Tα and consider the vertical plane through this line. The intersection of that plane with the surface defines a curve whose slope is precisely ∇αf(x0, y0) (Fig. 6.3). If we return to our description of the surface as a mountainous terrain, where f(x, y) is the altitude, then the directional derivative is positive or negative according to whether we are ascending or descending when walking in the given direction. The magnitude of the directional derivative simply measures how steep a path would be in that direction.


FIGURE 6.3 Geometric interpretation of the directional derivative

We return now to Eq. (6.4), and we observe that the right-hand side has the form of a dot product of two vectors, one of them being Tα. The other one would have components fx, fy at the point. This vector is of basic importance for functions of two variables. It is called the gradient vector of f and is denoted by ∇f.

Definition 6.2 Let f(x, y) be defined in a domain D. At any point (x0, y0) where the partial derivatives fx and fy exist, we define the gradient of f to be the vector whose components are these partial derivatives:

∇f(x0, y0) = ⟨fx(x0, y0), fy(x0, y0)⟩        (6.6)

Remark on Notation One often sees the notation grad f for the gradient, or grad f where the boldface type indicates a vector quantity. The upside-down delta used in Eq. (6.6) is called a nabla, but when used to operate on a function, as in Eq. (6.6), it is usually referred to as del. Thus one reads “∇f” as “del f.” The boldface is used to emphasize the fact that the gradient of a function defines a vector at each point. In the notation ∇αf, we do not use boldface, since the directional derivative is a number, rather than a vector.
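
Where a computer algebra system is available, the gradient of Definition 6.2 can be formed mechanically from the two partial derivatives. The short sketch below is one way to do this; the use of the sympy library and the particular sample function are illustrative choices only.

```python
import sympy as sp

x, y = sp.symbols("x y")
f = sp.sqrt(25 - x**2 - y**2)          # sample function; any differentiable f will do

grad_f = sp.Matrix([sp.diff(f, x), sp.diff(f, y)])   # <f_x, f_y>, as in Definition 6.2
print(grad_f)
print(grad_f.subs({x: 3, y: 2}))       # the gradient vector at the point (3, 2)
```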

We may now rewrite Eq. (6.4) in vector notation, using Eqs. (6.1) and (6.6):

∇αf(x0, y0) = ∇f(x0, y0) · Tα        (6.7)

This is at first merely a different notation for the same thing. However, it leads to important new information. Recalling that the dot product of two vectors is the product of their lengths times the cosine of the angle between them, and observing that Tα is a vector of length 1, we have

∇αf(x0, y0) = |∇f(x0, y0)| cos θ        (6.8)

where θ is the angle between ∇f(x0, y0) and Tα (Fig. 6.4).


FIGURE 6.4 Gradient vector and directional derivative

From Eq. (6.8) we deduce immediately the following basic facts.

Let f(x, y) be continuously differentiable in D. Then at each point (x0, y0) of D we have a fixed vector

∇f(x0, y0) = ⟨fx(x0, y0), fy(x0, y0)⟩.

1. If |∇f(x0, y0)| = 0 (i.e., if both partial derivatives at the point are zero), then all directional derivatives at the point are zero.

2. If |∇f(x0, y0)| ≠ 0, then

∇αf(x0, y0) = 0   precisely when   cos θ = 0;   that is, when

Tα is one of the two directions perpendicular to ∇f(x0, y0) (Fig. 6.5).

3. Since |cos θ| ≤ 1, the value of the directional derivative always varies between ±|∇f(x0, y0)|. If |∇f(x0, y0)| ≠ 0, then as α varies the directional derivative takes on its maximum value, |∇f(x0, y0)|, precisely when θ = 0, i.e., when Tα is in the same direction as ∇f(x0, y0). It takes on its minimum value, −|∇f(x0, y0)|, in the opposite direction, when θ = π (Fig. 6.5).


FIGURE 6.5 Directional derivatives in directions parallel to and perpendicular to the gradient vector
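
These three facts can be illustrated numerically: sweeping α through a full revolution, the values fx cos α + fy sin α range between ±|∇f|, with the extremes occurring along and against the gradient and the zeros perpendicular to it. The sketch below does this for an arbitrarily chosen gradient vector.

```python
import math

gx, gy = -0.87, -0.58                        # an arbitrary gradient vector <fx, fy>
norm = math.hypot(gx, gy)

best_alpha, best_val = None, -float("inf")
for k in range(3600):                        # sweep alpha over [0, 2*pi)
    a = 2 * math.pi * k / 3600
    d = gx * math.cos(a) + gy * math.sin(a)  # directional derivative, Eq. (6.4)
    if d > best_val:
        best_alpha, best_val = a, d

print(best_val, norm)                        # maximum value is approximately |grad f|
print(math.cos(best_alpha) * norm, math.sin(best_alpha) * norm)  # roughly the gradient itself
```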

Example 6.3

In Example 6.2 above, we have

∇f(3, 2) = ⟨−3/(2√3), −2/(2√3)⟩ = −(1/(2√3)) ⟨3, 2⟩.

This is a vector directed toward the origin. The vector Tα = (1/√13)⟨−2, 3⟩ is perpendicular to it, and hence the directional derivative ∇αf(3, 2) = 0 (Fig. 6.6).


FIGURE 6.6 ∇f(3, 2), where f(x, y) = (25 − x² − y²)^(1/2)

Note that in terms of the surface z = (25 − x² − y²)^(1/2), which is a hemisphere, the gradient points in the direction of steepest ascent, whereas the vector Tα points in the direction of the level curve x² + y² = 13 (Fig. 6.7).


FIGURE 6.7 Directional derivatives for hemisphere

Example 6.4

A road is to be built to the top of a mountain whose equation is

z = 1 − x²/2 − y²/3

(in miles). The shortest possible road is desired with a maximum grade of 30°. In which direction should the road be built at the point (0, −1)?

Since we want the shortest road to the top, we want the maximum increase in f(x, y) = 1 − x²/2 − y²/3 subject to the restriction on the grade. Thus we would make the grade precisely 30°, and hence choose α so that

∇αf(0, −1) = tan 30° = √3/3.

But fx = −x, fy = −2y/3, and ∇f(0, −1) = ⟨0, 2/3⟩; hence,

∇αf(0, −1) = (2/3) sin α = √3/3,   so that   sin α = √3/2   and   α = π/3 or 2π/3.

Thus we proceed in a direction that makes an angle of 60° with the positive or negative x axis (Fig. 6.8a).
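
A short numerical check of Example 6.4, under the same assumptions (the stated f and the 30° grade), is sketched below.

```python
import math

# Gradient of f(x, y) = 1 - x^2/2 - y^2/3 at the point (0, -1): <-x, -2y/3> = <0, 2/3>.
fy0 = 2.0 / 3.0
target = math.tan(math.radians(30))          # slope corresponding to a 30-degree grade

# Eq. (6.4): directional derivative = 0*cos(alpha) + (2/3)*sin(alpha) = target.
alpha = math.asin(target / fy0)
print(math.degrees(alpha), 180.0 - math.degrees(alpha))   # 60 and 120 degrees
```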

Example 6.5

Consider the same problem as in Example 6.4, but start at the point (2/3, 1).

We have ∇f(2/3, 1) = ⟨−2/3, −2/3⟩. Using Eq. (6.4), we would have to solve

−(2/3) cos α − (2/3) sin α = √3/3.


FIGURE 6.8 Finding α so that ∇αf(x0, y0) = √3/3, where f(x, y) = 1 − x²/2 − y²/3

It is easier to apply Eq. (6.8). We have |∇f(2/3, 1)| = (2/3)√2, and we must choose θ so that

(2/3)√2 cos θ = √3/3,   or   cos θ = √6/4 ≈ 0.61.

Referring to trigonometric tables, we find that θ is approximately 52°. Thus the gradient vector at (2/3, 1) points downward and to the left at a 45° angle, and we must proceed in a direction making an angle of 52° with the gradient (Fig. 6.8b).
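
The angle in Example 6.5 can also be found without tables; the sketch below applies Eq. (6.8) directly to the gradient ⟨−2/3, −2/3⟩.

```python
import math

gx, gy = -2.0 / 3.0, -2.0 / 3.0        # gradient of f at (2/3, 1)
target = math.tan(math.radians(30))    # desired directional derivative (30-degree grade)

theta = math.acos(target / math.hypot(gx, gy))   # Eq. (6.8): |grad f| cos(theta) = target
print(math.degrees(theta))                        # approximately 52 degrees
```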

Remark Equation (6.8) and its consequences listed above are of basic importance in working with functions of more than one variable. They describe how to determine all directional derivatives from a knowledge of the gradient vector. Equally important is the observation that when the above facts are viewed from the opposite direction, they allow us to describe the gradient vector in terms of the totality of directional derivatives.

Theorem 6.2 If f(x, y) is continuously differentiable in a domain D, then the gradient vector of f at a point (x0, y0) in D is the vector whose magnitude equals the maximum directional derivative of f at (x0, y0) and whose direction is the direction in which this maximum occurs.

NOTE This description of the gradient vector may be expressed more compactly as

∇f(x0, y0) = [∇α0f(x0, y0)] Tα0,

where

∇α0f(x0, y0) = max over all α of ∇αf(x0, y0).

PROOF. (See also Ex. 6.8 below.) This is essentially a restatement of the facts listed after Eq. (6.8). If all directional derivatives at (x0, y0) are zero, then the gradient of f is the zero vector. If some directional derivative is different from zero, then by Eq. (6.8)

∇αf(x0, y0) ≤ |∇f(x0, y0)|   for all α.

Furthermore, there is a unique angle α0 such that ∇α0f(x0, y0) = |∇f(x0, y0)|; namely, α0 must be chosen so that cos θ = 1 in Eq. (6.8). In other words, the angle θ between ∇f(x0, y0) and Tα0 must be zero. This means that Tα0 is the unit vector in the direction of ∇f(x0, y0), or that

Tα0 = ∇f(x0, y0)/|∇f(x0, y0)|.

This proves the theorem.

∎
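
Theorem 6.2 suggests a practical procedure: even if f can only be sampled, the gradient can be estimated from difference-quotient approximations of the directional derivatives. The sketch below illustrates this; the sample function, point, and step size are arbitrary choices.

```python
import math

def f(x, y):
    return math.exp(x) * math.sin(y)   # a "black box" sample function

def approx_dir_deriv(x0, y0, alpha, ds=1e-6):
    """Difference-quotient estimate of the directional derivative (Definition 6.1)."""
    return (f(x0 + ds * math.cos(alpha), y0 + ds * math.sin(alpha)) - f(x0, y0)) / ds

x0, y0 = 1.0, 0.5
samples = [(a, approx_dir_deriv(x0, y0, a)) for a in
           (2 * math.pi * k / 720 for k in range(720))]
alpha0, m = max(samples, key=lambda t: t[1])

# Theorem 6.2: gradient = (maximum directional derivative) * T_alpha0.
print(m * math.cos(alpha0), m * math.sin(alpha0))
# Compare with the exact gradient <e^x sin y, e^x cos y> at (1, 0.5):
print(math.exp(x0) * math.sin(y0), math.exp(x0) * math.cos(y0))
```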

We conclude this section with a brief discussion of maxima and minima for functions of two variables. We shall return to this subject in Sect. 12.

Definition 6.3 A function f(x, y) defined in a domain D is said to have a local maximum (or relative maximum) at a point (x0, y0) in D if there is some circle about (x0, y0) such that f(x, y) ≤ f(x0, y0) for (x, y) inside that circle. Similarly for a local minimum (or relative minimum).

Definition 6.4 A function f(x, y) defined in a domain D is said to have an absolute maximum (or to attain its maximum) at a point (x0, y0) in D if f(x, y) ≤ f(x0, y0) for all points (x, y) in D. Similarly, for an absolute minimum.

Note that a local maximum of f(x, y) may correspond to a point on the surface z = f(x, y) that looks like a mountain peak, a point on a ridge, or a point on a plateau (Fig. 6.9).


FIGURE 6.9 Different types of local maxima

Lemma 6.1 If f(x, y) has a local maximum or minimum at a point (x0, y0) in a domain D, and if the gradient of f exists at (x0, y0), then

∇f(x0, y0) = 0        (6.9)

PROOF. If f(x, y) has a local maximum at (x0, y0), then the function f(x, y0) of the single variable x has a local maximum at x = x0, and hence if its derivative exists, it must vanish; that is, fx(x0, y0) = 0. Similarly, f(x0, y) has a local maximum at y = y0, and hence fy(x0, y0) = 0. Thus Eq. (6.9) holds. The same argument applies if f(x, y) has a local minimum at (x0, y0).

∎

Example 6.6

At which points does the function

Image

have a local maximum or minimum?

We have in this case

Image

Hence

Image

Thus the gradient exists for all x, y, and it vanishes only at the point (−1, 2). This point is therefore the only possible local maximum or minimum.

We may note that in this case we can determine purely algebraically the behavior of f(x, y) near (−1, 2). Namely, by completing the squares:

Image

we may write

Image

and this function clearly has a local minimum (in fact an absolute minimum) at (−1, 2).

Example 6.7

At which points does the function f(x, y) = cos(xe^y) have a local maximum or minimum?

Here we have

fx = −e^y sin(xe^y),   fy = −xe^y sin(xe^y).

Since e^y never vanishes, we have

fx = 0   ⟺   sin(xe^y) = 0   ⟺   xe^y = nπ,

where n is an arbitrary integer. Thus the gradient can only vanish at points where xe^y = nπ, and it does in fact vanish at all these points, since fy vanishes there too. When n = 0, we obtain the entire y axis, x = 0. For every other integer n we obtain a curve x = nπe^(−y). Each of these curves corresponds to a “ridge” on the surface z = f(x, y). Note that along each curve xe^y = nπ,

f(x, y) = cos nπ = ±1,

the sign being positive or negative depending on whether n is even or odd. Since |f(x, y)| ≤ 1 for all x, y, it follows that for n even, all points on the curve xe^y = nπ are local maxima, while for n odd, all points are local minima. (See the sketch of this surface in Fig. 3.7.)
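
As a spot check of Example 6.7, the following sketch evaluates f = cos(xe^y) and its partial derivatives at a few points on the curves xe^y = nπ; the particular values of n and y are arbitrary.

```python
import math

def f(x, y):
    return math.cos(x * math.exp(y))

def grad_f(x, y):
    s = math.sin(x * math.exp(y))
    return (-math.exp(y) * s, -x * math.exp(y) * s)   # (f_x, f_y)

for n in (0, 1, 2):
    for y in (-1.0, 0.0, 2.0):
        x = n * math.pi * math.exp(-y)    # a point on the curve x*e^y = n*pi
        print(n, f(x, y), grad_f(x, y))   # f is (-1)^n and the gradient vanishes (up to rounding)
```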

In both of these examples we could tell whether a given point where the gradient vanished was a local maximum or minimum by using special properties of the functions considered. In general it may be difficult to decide if a point where the gradient vanishes is a local maximum, a local minimum, or neither. One method for doing this will be given in Th. 12.2.

Exercises

6.1 For each of the following functions, find the gradient vector at the points indicated.

a. f(x, y) = 3x − 7y + 2 at (1, −5)

b. f(x, y) = 5x³ − 6x²y + 7y² at (0, 0)

c. f(x, y) = 5x³ − 6x²y + 7y² + 3x − 7y + 2 at (0, 0)

d. f(x, y) = x⁵ + e^(sin x) at (0, π)

6.2 For each of the following functions f(x, y), find the directional derivative ∇αf at the point indicated and in the direction prescribed.

a. f(x, y) = sinh xy at (2, 0); α = Imageπ

b. f(x, y) = arc tan (y/x) at (1, 2); α = Imageπ

c. f(x, y) = (x2y2)/(x2 + y2) at (1, 1); α = Imageπ

d. f(x, y) = (x2 + y2)2 at (Image , −Image ); α = Imageπ

6.3 For each of the following functions, find the gradient at the point (1, 1), and then compute the directional derivatives at that point in each of the directions Image

a. f(x, y) = x² + (y − 1)²

b. f(x, y) = (1/x) log (1/y)

c. f(x, y) = log (x² + y²)

d. f(x, y) = ey/x

6.4 For each of the following functions, find the maximal directional derivative at the point indicated, and find the direction for which this maximum is attained.

a. f(x, y) = x sin y at (1, 0)

b. f(x, y) = x sin y at (1, Imageπ)

c. Image

d. f(x, y) = log (y/x) at (3, 4)

6.5 Let F(t) be a continuously differentiable function of t, F'(t) ≠ 0, and let f(x, y) = F(x² + y²). Find ∇f at an arbitrary point, and show that it is directed along the line joining that point to the origin.

6.6 Let G(t) be a continuously differentiable function of t, G'(t) ≠ 0, and let g(x, y) = G(y/x). Find ∇g at any point where x ≠ 0 and show that it is perpendicular to the line joining that point to the origin.

6.7 Let f(x, y) and g(x, y) be continuously differentiable in a domain D, and suppose that they satisfy the equations fx = gy, fy = −gx at each point of D. (These equations are called the Cauchy-Riemann equations. Examples of such pairs of functions are given in Ex. 4.4.) Show that at any point (x0, y0) of D, the equation

∇αf(x0, y0) = ∇α+π/2 g(x0, y0)

holds for all α.

6.8 Let g(α) = A cos α + B sin α, where A and B are constants, not both zero.

a. Show that the derivative g'(α) vanishes for those values of α for which

tan α = B/A.

b. Show that the maximum value of g(α) is √(A² + B²) and is attained for the unique value of α satisfying

cos α = A/√(A² + B²),   sin α = B/√(A² + B²).

c. Use parts a and b in conjunction with Th. 6.1 to give an alternative proof of Th. 6.2.

6.9 Let f(x, y) be continuously differentiable, and suppose that the maximal directional derivative of f at the origin is equal to 5 and is attained in the direction of the vector from the origin toward the point (−1, 2). Find ∇f(0, 0).

6.10 Let f(x, y) be continuously differentiable, and let z = L(x, y) be the equation of the tangent plane to the surface z = f(x, y) at (x0, y0). Show that

∇αL(x0, y0) = ∇αf(x0, y0)

for all α.

6.11 If f(x,y) is continuously differentiable at (x0, y0), show that for any α and β the following are true.

a. ∇α+βf(x0, y0) = cos β ∇αf(x0, y0) + sin β ∇α+π/2f(x0, y0)

b. Image

6.12 Using Lemma 6.1, determine the points where the following functions may have a maximum or minimum. Then try to verify directly whether the points in question are local maxima or minima.

a. f(x, y) = 1 − x² − 2y² + x + 4y

b. f(x, y) = (cos xy)²

c. f(x, y) = log log (x² + y² + 2)

d. f(x, y) = e^(1 − x² − y²)

6.13 Let f(x, y) be a general quadratic polynomial

f(x, y) = Ax² + 2Bxy + Cy² + Dx + Ey + F.

Show that if f(x, y) has a local maximum or minimum at a point, then the coordinates (x, y) of that point must satisfy the simultaneous linear equations

2Ax + 2By + D = 0,   2Bx + 2Cy + E = 0.

6.14 Applying Ex. 6.13 to each of the following quadratic polynomials, find the point or points where a maximum or minimum may occur.

a. x² + 4xy + 5y² − 2y + 1

b. 8x² + 2xy − y² − 14x + 5y

c. 4x² − 12xy + 9y² + 4x − 6y + 3

d. −5x² + 6xy − 2y² − 10x + 6y + 2

6.15 a. Show that if the coefficients of f(x, y) in Ex. 6.13 satisfy the condition AC − B² ≠ 0, then there is at most one point at which f(x, y) can have a local maximum or minimum.

b. Find the coordinates of that point in terms of the coefficients of f(x, y).

c. Test the functions in Ex. 6.14 to see which satisfy the condition of part a. (Note that the coefficient of the xy term was denoted by 2B rather than B.)

d. Show that the function f(x, y) = (ax + by + c)2 does not satisfy the condition of part a, and that this function has a local minimum at more than one point.

6.16 A situation that arises in many different contexts is the following. Two observable quantities x and y are known to be connected approximately by a linear relation of the form y = ax + b. The values of the coefficients a and b are not known, and the problem is to determine them by a number of observations. Using different values x1, …, xn, one obtains a series of values y1, …, yn. If there exist constants a, b such that yk = axk + b holds exactly for each k = 1, …, n, the problem is solved. In general, for any choice of a and b there are certain “residuals” rk = axk + b − yk. The problem is to find the most favorable choice of a and b; geometrically, the problem is to find the straight line that “most nearly” passes through the points (x1, y1), …, (xn, yn). From many points of view the optimal choice is that provided by the method of least squares. This method consists of choosing a and b by minimizing the sum of the squares of the residuals. (A computational sketch of this method is given after part c below.)

a. Let

f(a, b) = r1² + ⋯ + rn² = (ax1 + b − y1)² + ⋯ + (axn + b − yn)².

Show that f(a, b) is a quadratic polynomial in the unknowns a and b.

b. Use Ex. 6.13 to show that the values of a and b obtained by the method of least squares must satisfy a pair of simultaneous linear equations whose coefficients depend on x1, …, xn and y1, …, yn.

c. Show that if a and b are chosen by the method of least squares, then the positive and negative residuals cancel out in the sense that the sum of the residuals equals zero. (Hint: apply Lemma 6.1 to the function f(a, b).)
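
For those who want to see the recipe in action, a minimal Python sketch is given below: it solves the two linear equations obtained by setting the partial derivatives of f(a, b) to zero (as in part b) and checks that the residuals sum to zero (part c). The data are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Fit y = a*x + b by minimizing the sum of squared residuals (Ex. 6.16)."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Setting the partial derivatives f_a and f_b to zero (Lemma 6.1) gives:
    #   a*sxx + b*sx = sxy
    #   a*sx  + b*n  = sy
    det = sxx * n - sx * sx
    a = (sxy * n - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b

xs = [0.0, 1.0, 2.0, 3.0]          # made-up observations; any data could be used
ys = [1.1, 2.9, 5.2, 6.8]
a, b = least_squares_line(xs, ys)
residuals = [a * x + b - y for x, y in zip(xs, ys)]
print(a, b, sum(residuals))        # the residuals sum to (essentially) zero, as in part c
```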

6.17 Apply the method of least squares to find the best linear relation y = ax + b corresponding to the pairs of observed values (x1, y1) = (1, 5), (x2, y2) = (2, 6), (x3, y3) = (3, 8). After finding a and b, compute the residuals and show that they satisfy the condition in Ex. 6.16c.

6.18 Let (x1, y1),…, (xn, yn) be any n fixed points in the plane. For any point (x, y), let

f(x, y) = (x − x1)² + (y − y1)² + ⋯ + (x − xn)² + (y − yn)²

be the sum of the squares of the distances from (x, y) to the given points. Show that if f(x, y) has a minimum at (x0, y0), then

x0 = (x1 + ⋯ + xn)/n,   y0 = (y1 + ⋯ + yn)/n.

6.19 In most problems encountered in applications it is impossible to find the maximum or minimum of a function f(x, y) by purely analytic methods. One of the numerical procedures that may be used is the gradient method or method of steepest descent. This is based on the fact that the function increases most rapidly in the direction of the gradient vector. Suppose, for example, that we are trying to minimize the function f(x, y). Starting at any point (x1, y1), choose a positive constant h and let

x2 = x1 − h fx(x1, y1),   y2 = y1 − h fy(x1, y1).

Since the displacement vector

⟨x2 − x1, y2 − y1⟩ = −h ∇f(x1, y1)

is in the opposite direction from the gradient vector, the function f(x, y) will have a smaller value at (x2, y2) than at (x1, y1), provided that h is sufficiently small. Repeating the process, let

x3 = x2 − h fx(x2, y2),   y3 = y2 − h fy(x2, y2).

By continuing in this manner, we obtain a sequence of points (xk, yk), and it can be shown in many cases that when h is sufficiently small, this sequence will converge to a point where f(x, y) has a minimum. (See Sects. 4.41 and 6.45 of [34] for further discussion, and for variations on this method.)
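
A direct transcription of this procedure might look as follows; the quadratic objective, the starting point, the step constant h, and the number of steps are all illustrative choices.

```python
def steepest_descent(grad, start, h, steps):
    """Iterate (x, y) -> (x - h*f_x, y - h*f_y), as described above."""
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - h * gx, y - h * gy
        path.append((x, y))
    return path

# Illustrative quadratic f(x, y) = x^2 + 2y^2 - 2x - 4y with gradient <2x - 2, 4y - 4>;
# its minimum is at (1, 1).
grad = lambda x, y: (2 * x - 2, 4 * y - 4)
for p in steepest_descent(grad, start=(0.0, 0.0), h=0.1, steps=10):
    print(p)
# The iterates approach (1, 1) as the number of steps grows.
```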

Carry out several steps of this procedure for the function

Image

starting at the origin (x1, y1) = (0, 0), and using the value h = Image. Note how the points obtained gradually approach the actual minimum, which can be obtained analytically in this case by the method of Ex. 6.13.