Two-Dimensional Calculus (2011)
Chapter 2. Differentiation
6. Directional derivatives and gradient
The partial derivatives fx and fy at a point measure the rate of change of the function f in the x and y directions, respectively. They may be considered to be special cases of the directional derivative, which measures the rate of change of f in an arbitrary direction, and which we define below.
We use the following notation throughout this section. Let

Tα = ⟨cos α, sin α⟩    (6.1)

be the unit vector making an angle α with the positive x direction.
Given any point (x0, y0), we wish to measure the rate of change of f in the direction of Tα. We simply take the change in the function as we move a distance Δs in this direction, divide by Δs, and take the limit as Δs tends to zero (Fig. 6.1). We use the notation ∇αf(x0, y0) for this quantity.
FIGURE 6.1 Directional derivative ∇αf
Definition 6.1 The directional derivative of a function f(x, y) at a point (x0, y0) in the direction of the vector Tα = ⟨cos α, sin α⟩ is

∇αf(x0, y0) = lim(Δs→0) [f(x0 + Δs cos α, y0 + Δs sin α) − f(x0, y0)]/Δs.    (6.2)
Note that the partial derivatives fx and fy are special cases:

fx(x0, y0) = ∇0f(x0, y0),  fy(x0, y0) = ∇π/2f(x0, y0).    (6.3)
In general, a function may have directional derivatives in some directions, but not in others. Again we may refer to the example defined by Eq. (5.1). For that function, the directional derivatives at the origin exist in the x and y directions, but in no other direction. This illustrates again the inadequacy of partial derivatives for describing the way a function varies in an arbitrary direction. The surprising fact is that if we add to the assumption that the partial derivatives exist at each point the further assumption that they are continuous, then we are able to draw two conclusions.
1. The directional derivatives must exist in all directions.
2. The value of the directional derivative at any point in a given direction depends only on the direction and on the partial derivatives fx, fy at the point.
The proof of these facts is based on the fundamental Lemma 5.1. We have the following theorem.
Theorem 6.1 Let f(x, y) be continuously differentiable in a domain D. Then at any point (x0, y0), the directional derivative ∇αf(x0, y0) exists for all α and is given by

∇αf(x0, y0) = fx(x0, y0) cos α + fy(x0, y0) sin α.    (6.4)
PROOF. Setting x = x0 + Δs cos α, y = y0 + Δs sin α in Eq. (5.3), we find

Δf = f(x0 + Δs cos α, y0 + Δs sin α) − f(x0, y0) = fx(x0, y0) Δs cos α + fy(x0, y0) Δs sin α + ε1 Δs cos α + ε2 Δs sin α.

Dividing by Δs yields

Δf/Δs = fx(x0, y0) cos α + fy(x0, y0) sin α + ε1 cos α + ε2 sin α.    (6.5)

But Δx → 0 and Δy → 0 as Δs → 0, and the fundamental Lemma 5.1 asserts precisely that ε1 → 0 and ε2 → 0. Thus, applying Def. 6.1 to Eq. (6.5), we obtain Eq. (6.4).
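The content of Theorem 6.1 is easy to check numerically: the difference quotient of Def. 6.1 should approach fx cos α + fy sin α as Δs → 0. A minimal sketch in Python (the test function, point, and direction here are arbitrary choices, not from the text):

```python
import math

def f(x, y):
    # An arbitrary continuously differentiable test function
    return x**2 * y + math.sin(y)

def directional_quotient(f, x0, y0, alpha, ds):
    # The difference quotient of Definition 6.1
    return (f(x0 + ds * math.cos(alpha), y0 + ds * math.sin(alpha)) - f(x0, y0)) / ds

x0, y0, alpha = 1.0, 2.0, math.pi / 3
# Partial derivatives of the test function, computed by hand: fx = 2xy, fy = x^2 + cos y
fx, fy = 2 * x0 * y0, x0**2 + math.cos(y0)
exact = fx * math.cos(alpha) + fy * math.sin(alpha)   # Eq. (6.4)

for ds in (1e-2, 1e-4, 1e-6):
    approx = directional_quotient(f, x0, y0, alpha, ds)
    print(ds, approx, abs(approx - exact))
```

As Δs shrinks, the quotient agrees with Eq. (6.4) to ever more digits, which is exactly what the proof above predicts.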
Example 6.1
Let

f(x, y) = √(25 − x² − y²),  (x0, y0) = (3, 2),

and

Tα = (1/√13)⟨3, 2⟩,  so that cos α = 3/√13, sin α = 2/√13;

then, since fx(3, 2) = −3/√12 and fy(3, 2) = −2/√12, Eq. (6.4) gives

∇αf(3, 2) = (−3/√12)(3/√13) + (−2/√12)(2/√13) = −13/(√12 √13) = −√13/√12 ≈ −1.04.

Note that the fact that the directional derivative is negative means that the function is decreasing in this direction (Fig. 6.2a)
FIGURE 6.2 Directions of derivatives in Examples 6.1 and 6.2
Example 6.2
Using the same function and same point as in Example 6.1, let

Tα = (1/√13)⟨−2, 3⟩;

then

∇αf(3, 2) = (−3/√12)(−2/√13) + (−2/√12)(3/√13) = (6 − 6)/(√12 √13) = 0.

Note that in this case Tα is a unit vector in a direction perpendicular to the radius vector ⟨3, 2⟩. The rate of change of the function in this direction is zero (Fig. 6.2b).
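For the hemisphere f(x, y) = √(25 − x² − y²) of these examples, both computations can be confirmed numerically; a sketch:

```python
import math

def f(x, y):
    return math.sqrt(25 - x**2 - y**2)

x0, y0 = 3.0, 2.0
# Partial derivatives: fx = -x / sqrt(25 - x^2 - y^2), and similarly fy
r = math.sqrt(25 - x0**2 - y0**2)
fx, fy = -x0 / r, -y0 / r

# Unit direction perpendicular to the radius vector <3, 2>: (1/sqrt(13))<-2, 3>
tx, ty = -y0 / math.hypot(x0, y0), x0 / math.hypot(x0, y0)
print(fx * tx + fy * ty)      # directional derivative along the level curve: 0

# Unit direction along the radius vector, pointing away from the origin
ux, uy = x0 / math.hypot(x0, y0), y0 / math.hypot(x0, y0)
print(fx * ux + fy * uy)      # negative: the hemisphere slopes downward outward
```

The tangential direction gives exactly zero, while the radial direction gives a negative value, matching the two examples.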
There are two physical interpretations that are most helpful in understanding the directional derivative. The first is obtained by picturing the domain D to be a thin plate and the function f(x, y) the temperature at each point of the plate. Then the directional derivative at any point tells us whether, as we move in a given direction, we are getting hotter or colder, and how fast.
The other interpretation concerns the surface z = f(x, y). We draw a line through (x0, y0) in the direction of Tα and consider the vertical plane through this line. The intersection of that plane with the surface defines a curve whose slope is precisely ∇αf(x0, y0) (Fig. 6.3). If we return to our description of the surface as a mountainous terrain, where f(x, y) is the altitude, then the directional derivative is positive or negative according to whether we are ascending or descending when walking in the given direction. The magnitude of the directional derivative simply measures how steep a path would be in that direction.
FIGURE 6.3 Geometric interpretation of the directional derivative
We return now to Eq. (6.4), and we observe that the right-hand side has the form of a dot product of two vectors, one of them being Tα. The other one would have components fx, fy at the point. This vector is of basic importance for functions of two variables. It is called the gradient vector of f and is denoted by ∇f.
Definition 6.2 Let f(x, y) be defined in a domain D. At any point (x0, y0) where the partial derivatives fx and fy exist, we define the gradient of f to be the vector whose components are these partial derivatives:

∇f(x0, y0) = ⟨fx(x0, y0), fy(x0, y0)⟩.    (6.6)
Remark on Notation One often sees the notation grad f for the gradient, written in boldface to indicate a vector quantity. The upside-down delta used in Eq. (6.6) is called a nabla, but when used to operate on a function, as in Eq. (6.6), it is usually referred to as del. Thus one reads “∇f” as “del f.” The boldface ∇ is used to emphasize the fact that the gradient of a function defines a vector at each point. In the notation ∇αf, we do not use boldface, since the directional derivative is a number, rather than a vector.
We may now rewrite Eq. (6.4) in vector notation, using Eqs. (6.1) and (6.6):

∇αf(x0, y0) = ∇f(x0, y0) · Tα.    (6.7)
This is at first merely a different notation for the same thing. However, it leads to important new information. Recalling that the dot product of two vectors is the product of their lengths times the cosine of the angle between them, and observing that Tα is a vector of length 1, we have

∇αf(x0, y0) = |∇f(x0, y0)| cos θ,    (6.8)
where θ is the angle between ∇f(x0, y0) and Tα (Fig. 6.4).
FIGURE 6.4 Gradient vector and directional derivative
From Eq. (6.8) we deduce immediately the following basic facts.
Let f(x, y) be continuously differentiable in D. Then at each point (x0, y0) of D we have a fixed vector

∇f(x0, y0) = ⟨fx(x0, y0), fy(x0, y0)⟩.
1. If |∇f(x0, y0)| = 0 (i.e., if both partial derivatives at the point are zero), then all directional derivatives at the point are zero.
2. If |∇f(x0, y0)| ≠ 0, then

∇αf(x0, y0) = 0 ⇔ cos θ = 0 ⇔ Tα is one of the two directions perpendicular to ∇f(x0, y0) (Fig. 6.5).
3. Since |cos θ| ≤ 1, the value of the directional derivative always lies between −|∇f(x0, y0)| and |∇f(x0, y0)|. If |∇f(x0, y0)| ≠ 0, then as α varies the directional derivative takes on its maximum value, |∇f(x0, y0)|, precisely when θ = 0, i.e., when Tα is in the same direction as ∇f(x0, y0). It takes on its minimum value, −|∇f(x0, y0)|, in the opposite direction, when θ = π (Fig. 6.5).
FIGURE 6.5 Directional derivatives in directions parallel to and perpendicular to the gradient vector
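These three facts are easy to confirm numerically by tabulating Eq. (6.4) over many directions at a fixed point; a minimal sketch in Python (the quadratic test function and the point are arbitrary choices):

```python
import math

# Gradient of the arbitrary test function f(x, y) = x^2 + 3y^2 at the point (1, 1):
# fx = 2x and fy = 6y, so the gradient there is <2, 6>
fx, fy = 2.0, 6.0
grad_len = math.hypot(fx, fy)

# Sample the directional derivative, Eq. (6.4), over a fine grid of directions
alphas = [2 * math.pi * k / 10000 for k in range(10000)]
dd = [fx * math.cos(a) + fy * math.sin(a) for a in alphas]

best = max(range(len(dd)), key=lambda k: dd[k])
print(max(dd), grad_len)                  # the maximum equals |grad f|
print(alphas[best], math.atan2(fy, fx))   # and occurs in the gradient direction
```

The sampled maximum agrees with |∇f(1, 1)| = √40, and the maximizing α coincides with the direction of the gradient vector, as fact 3 asserts.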
Example 6.3
In Example 6.2 above, we have

∇f(3, 2) = −(1/√12)⟨3, 2⟩.
This is a vector directed toward the origin. The vector Tα = (1/√13)⟨−2, 3⟩ is perpendicular to it, and hence the directional derivative ∇αf(3, 2) = 0 (Fig. 6.6).
FIGURE 6.6 ∇f(3, 2), where f(x, y) = √(25 − x² − y²)
Note that in terms of the surface z = √(25 − x² − y²), which is a hemisphere, the gradient points in the direction of steepest ascent, whereas the vector Tα points in the direction of the level curve x² + y² = 13 (Fig. 6.7).
FIGURE 6.7 Directional derivatives for hemisphere
Example 6.4
A road is to be built to the top of a mountain whose equation is

z = 1 − x²/2 − y²/3
(in miles). The shortest possible road is desired with a maximum grade of 30°. In which direction should the road be built at the point (0, −1)?
Since we want the shortest road to the top, we want the maximum increase in f(x, y) = 1 − x²/2 − y²/3, subject to the restriction that the grade not exceed 30°. Thus, we would make the grade precisely 30°, and hence choose α so that

∇αf(0, −1) = tan 30° = 1/√3.
But fx = −x, fy = −2y/3, and so fx(0, −1) = 0, fy(0, −1) = 2/3; hence,

(2/3) sin α = 1/√3,  sin α = √3/2,  α = π/3 or 2π/3.
Thus we proceed in a direction that makes an angle of 60° with the positive or negative x axis (Fig. 6.8a).
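The solution of the grade condition can be verified numerically; a sketch, using f(x, y) = 1 − x²/2 − y²/3 as in the example:

```python
import math

# Mountain f(x, y) = 1 - x**2/2 - y**2/3, with partials fx = -x, fy = -2y/3
x0, y0 = 0.0, -1.0
fx, fy = -x0, -2 * y0 / 3           # gradient at (0, -1) is <0, 2/3>

grade = math.tan(math.radians(30))  # required slope: tan 30 deg = 1/sqrt(3)

# Solve fx cos(a) + fy sin(a) = grade, i.e. (2/3) sin(a) = 1/sqrt(3)
a = math.asin(grade / fy)
print(math.degrees(a))              # 60 degrees; by symmetry 120 degrees also works
```

The computed angle is 60°, confirming that the road should make an angle of 60° with the positive or negative x axis.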
Example 6.5
Consider the same problem as Example 6.4, but start at the point (2/3, 1).
We have ∇f(2/3, 1) = ⟨−2/3, −2/3⟩. Using Eq. (6.4), we would have to solve

−(2/3) cos α − (2/3) sin α = 1/√3.
FIGURE 6.8 Finding α so that ∇αf(x0, y0) = 1/√3, where f(x, y) = 1 − x²/2 − y²/3
It is easier to apply Eq. (6.8). We have |∇f(2/3, 1)| = 2√2/3, and we must choose θ so that

(2√2/3) cos θ = 1/√3,  or  cos θ = √6/4 ≈ 0.61.
Referring to trigonometric tables, we find that θ is approximately 52°. Thus the gradient vector at (2/3, 1) points downward and to the left at a 45° angle, and we must proceed in a direction making an angle of 52° with the gradient (Fig. 6.8b).
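The step of inverting Eq. (6.8) for θ can be done numerically rather than with tables; a sketch (the gradient value ⟨−2/3, −2/3⟩ at the starting point follows the reconstruction of this example and should be treated as an assumption):

```python
import math

# Assumed gradient at the starting point (see lead-in) and the required grade
gx, gy = -2/3, -2/3
grade = math.tan(math.radians(30))   # tan 30 deg = 1/sqrt(3)

glen = math.hypot(gx, gy)            # |grad f| = 2*sqrt(2)/3
theta = math.acos(grade / glen)      # invert Eq. (6.8): |grad f| cos(theta) = grade
print(math.degrees(theta))          # approximately 52 degrees
```

The computed angle is about 52.2°, in agreement with the value read from the tables.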
Remark Equation (6.8) and its consequences listed above are of basic importance in working with functions of more than one variable. They describe how to determine all directional derivatives from a knowledge of the gradient vector. Equally important is the observation that when the above facts are viewed from the opposite direction, they allow us to describe the gradient vector in terms of the totality of directional derivatives.
Theorem 6.2 If f(x, y) is continuously differentiable in a domain D, then the gradient vector of f at a point (x0, y0) in D is the vector whose magnitude equals the maximum directional derivative of f at (x0, y0) and whose direction is the direction in which this maximum occurs.
NOTE This description of the gradient vector may be expressed more compactly as

∇f(x0, y0) = ∇α0f(x0, y0) Tα0,

where

∇α0f(x0, y0) = max over α of ∇αf(x0, y0).
PROOF. (See also Ex. 6.8 below.) This is essentially a restatement of the facts listed after Eq. (6.8). If all directional derivatives at (x0, y0) are zero, then the gradient of f is the zero vector. If some directional derivative is different from zero, then by Eq. (6.8)

max over α of ∇αf(x0, y0) = |∇f(x0, y0)| > 0.
Furthermore, there is a unique angle α0 such that ∇α0f(x0, y0) = |∇f(x0, y0)|; namely, α0 must be chosen so that cos θ = 1 in Eq. (6.8). In other words, the angle θ between ∇f(x0, y0) and Tα0 must be zero. This means that Tα0 is the unit vector in the direction of ∇f(x0, y0), or that

∇f(x0, y0) = |∇f(x0, y0)| Tα0.
This proves the theorem.
We conclude this section with a brief discussion of maxima and minima for functions of two variables. We shall return to this subject in Sect. 12.
Definition 6.3 A function f(x, y) defined in a domain D is said to have a local maximum (or relative maximum) at a point (x0, y0) in D if there is some circle about (x0, y0) such that f(x, y) ≤ f(x0, y0) for (x, y) inside that circle. Similarly for a local minimum (or relative minimum).
Definition 6.4 A function f(x, y) defined in a domain D is said to have an absolute maximum (or to attain its maximum) at a point (x0, y0) in D if f(x, y) ≤ f(x0, y0) for all points (x, y) in D. Similarly, for an absolute minimum.
Note that a local maximum of f(x, y) may correspond to a point on the surface z = f(x, y) that looks like a mountain peak, a point on a ridge, or a point on a plateau (Fig. 6.9).
FIGURE 6.9 Different types of local maxima
Lemma 6.1 If f(x, y) has a local maximum or minimum at a point (x0, y0) in a domain D, and if the gradient of f exists at (x0, y0), then

∇f(x0, y0) = 0.    (6.9)
PROOF. If f(x, y) has a local maximum at (x0, y0), then the function f(x, y0) of the single variable x has a local maximum at x = x0, and hence if its derivative exists, it must vanish; that is, fx(x0, y0) = 0. Similarly, f(x0, y) has a local maximum at y = y0, and hence fy(x0, y0) = 0. Thus Eq. (6.9) holds. The same argument applies if f(x, y) has a local minimum at (x0, y0).
Example 6.6
At which points does the function
have a local maximum or minimum?
We have in this case
Hence
Thus the gradient exists for all x, y, and it vanishes only at the point (−1, 2). This point is therefore the only possible local maximum or minimum.
We may note that in this case we can determine purely algebraically the behavior of f(x, y) near (−1, 2). Namely, by completing the squares:
we may write
and this function clearly has a local minimum (in fact an absolute minimum) at (−1, 2).
Example 6.7
At which points does the function f(x, y) = cos (xey) have a local maximum or minimum ?
Here we have

fx = −e^y sin(xe^y),  fy = −xe^y sin(xe^y).

Since e^y never vanishes, we have

fx = 0 ⇔ sin(xe^y) = 0 ⇔ xe^y = nπ,

where n is an arbitrary integer. Thus the gradient can only vanish at points where xe^y = nπ, and it does in fact vanish at all these points, since fy vanishes there too. When n = 0, we obtain the entire y axis, x = 0. For every other integer n we obtain a curve x = nπe^(−y). Each of these curves corresponds to a “ridge” on the surface z = f(x, y). Note that along each curve xe^y = nπ,

f(x, y) = cos nπ = ±1,
the sign being positive or negative depending on whether n is even or odd. Since |f(x, y)| ≤ 1 for all x, y, it follows that for n even, all points on the curve x = nπe−y are local maxima; while for n odd, all points are local minima. (See the sketch of this surface in Fig. 3.7.)
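The vanishing of the gradient along the ridge curves xe^y = nπ is easy to check numerically; a sketch:

```python
import math

def grad_f(x, y):
    # f(x, y) = cos(x e^y); fx = -e^y sin(x e^y), fy = -x e^y sin(x e^y)
    s = math.sin(x * math.exp(y))
    return (-math.exp(y) * s, -x * math.exp(y) * s)

# Sample several points on the curves x e^y = n*pi for various n and y
ridge_grads = []
ridge_values = []
for n in (1, 2, -3):
    for y in (-1.0, 0.0, 2.0):
        x = n * math.pi * math.exp(-y)        # chosen so that x e^y = n*pi
        ridge_grads.append(grad_f(x, y))
        ridge_values.append(math.cos(x * math.exp(y)))   # cos(n*pi) = +-1
print(ridge_grads)
print(ridge_values)
```

Both components of the gradient vanish (to rounding error) at every sampled ridge point, and the function values there are ±1, alternating with the parity of n.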
In both of these examples we could tell whether a given point where the gradient vanished was a local maximum or minimum by using special properties of the functions considered. In general it may be difficult to decide if a point where the gradient vanishes is a local maximum, a local minimum, or neither. One method for doing this will be given in Th. 12.2.
Exercises
6.1 For each of the following functions, find the gradient vector at the points indicated.
a. f(x, y) = 3x − 7y + 2 at (1, −5)
b. f(x, y) = 5x3 − 6x2y + 7y2 at (0, 0)
c. f(x, y) = 5x3 − 6x2y + 7y2 + 3x − 7y + 2 at (0, 0)
d. f(x, y) = x5 + esin x at (0, π)
6.2 For each of the following functions f(x, y) find the directional derivative ∇αf at the point indicated and in the direction prescribed.
a. f(x, y) = sinh xy at (2, 0); α = π
b. f(x, y) = arc tan (y/x) at (1, 2); α = π
c. f(x, y) = (x2 − y2)/(x2 + y2) at (1, 1); α = π
d. f(x, y) = (x2 + y2)2 at ( , − ); α = π
6.3 For each of the following functions, find the gradient at the point (1, 1), and then compute the directional derivatives at that point in each of the directions
a. f(x, y) = x2 + (y − 1)2
b. f(x, y) = (1/x) log (1/y)
c. f(x, y) = log (x2 + y2)
d. f(x, y) = ey/x
6.4 For each of the following functions, find the maximal directional derivative at the point indicated, and find the direction for which this maximum is attained.
a. f(x, y) = x sin y at (1, 0)
b. f(x, y) = x sin y at (1, π)
c.
d. f(x, y) = log (y/x) at (3, 4)
6.5 Let F(t) be a continuously differentiable function of t, F′(t) ≠ 0, and let f(x, y) = F(x² + y²). Find ∇f at an arbitrary point, and show that it is directed along the line joining that point to the origin.
6.6 Let G(t) be a continuously differentiable function of t, G'(t) ≠ 0, and let g(x, y) = G(y/x). Find ∇g at any point where x ≠ 0 and show that it is perpendicular to the line joining that point to the origin.
6.7 Let f(x, y) and g(x, y) be continuously differentiable in a domain D, and suppose that they satisfy the equations fx = gy, fy = −gx at each point of D. (These equations are called the Cauchy-Riemann equations. Examples of such pairs of functions are given in Ex. 4.4.) Show that at any point (x0, y0) of D, the equation

∇αf(x0, y0) = ∇α+π/2g(x0, y0)
holds for all α.
6.8 Let g(α) = A cos α + B sin α, where A and B are constants, not both zero.
a. Show that the derivative g′(α) vanishes for those values of α for which

tan α = B/A.
b. Show that the maximum value of g(α) is √(A² + B²) and is attained for the unique value of α, 0 ≤ α < 2π, satisfying

cos α = A/√(A² + B²),  sin α = B/√(A² + B²).
c. Use parts a and b in conjunction with Th. 6.1 to give an alternative proof of Th. 6.2.
6.9 Let f(x, y) be continuously differentiable, and suppose that the maximal directional derivative of f at the origin is equal to 5 and is attained in the direction of the vector from the origin toward the point (−1, 2). Find ∇f(0, 0).
6.10 Let f(x, y) be continuously differentiable, and let z = L(x, y) be the equation of the tangent plane to the surface z = f(x, y) at (x0, y0). Show that

∇αL(x0, y0) = ∇αf(x0, y0)
for all α.
6.11 If f(x,y) is continuously differentiable at (x0, y0), show that for any α and β the following are true.
a. ∇α +β f(x0, y0) = cos β ∇α f(x0, y0) + sin β ∇α + π/2 f(x0, y0)
b.
6.12 Using Lemma 6.1, determine the points where the following functions may have a maximum or minimum. Then try to verify directly whether the points in question are local maxima or minima.
a. f(x, y) = 1 − x2 − 2y2 + x + 4y
b. f(x, y) = (cos x − y)2
c. f(x, y) = log log (x2 + y2 + 2)
d. f(x, y) = e1−x2 − y2
6.13 Let f(x, y) be a general quadratic polynomial

f(x, y) = Ax² + 2Bxy + Cy² + Dx + Ey + F.

Show that if f(x, y) has a local maximum or minimum at a point, then the coordinates (x, y) of that point must satisfy the simultaneous linear equations

2Ax + 2By + D = 0,  2Bx + 2Cy + E = 0.
6.14 Applying Ex. 6.13 to each of the following quadratic polynomials, find the point or points where a maximum or minimum may occur.
a. x2 + 4xy + 5y2 − 2y + 1
b. 8x2 + 2xy − y2 − 14x + 5y
c. 4x2 − 12xy + 9y2 + 4x − 6y + 3
d. − 5x2 + 6xy − 2y2 − 10x + 6y + 2
6.15 a. Show that if the coefficients of f(x, y) in Ex. 6.13 satisfy the condition AC − B2 ≠ 0, then there is at most one point at which f(x, y) can have a local maximum or minimum.
b. Find the coordinates of that point in terms of the coefficients of f(x, y).
c. Test the functions in Ex. 6.14 to see which satisfy the condition of part a. (Note that the coefficient of the xy term was denoted by 2B rather than B.)
d. Show that the function f(x, y) = (ax + by + c)2 does not satisfy the condition of part a, and that this function has a local minimum at more than one point.
6.16 A situation that arises in many different contexts is the following. Two observable quantities x and y are known to be connected approximately by a linear relation of the form y = ax + b. The values of the coefficients a and b are not known, and the problem is to determine them by a number of observations. Using different values x1, …, xn, one obtains a series of values y1, …, yn. If there exist constants a, b such that yk = axk + b holds exactly for each k = 1, …, n, the problem is solved. In general, for any choice of a and b there are certain “residuals” rk = axk + b − yk. The problem is to find the most favorable choice of a and b; geometrically, the problem is to find the straight line that “most nearly” passes through the points (x1, y1), …, (xn, yn). From many points of view the optimal choice is that provided by the method of least squares. This method consists of choosing a and b by minimizing the sum of the squares of the residuals.
a. Let

f(a, b) = (ax1 + b − y1)² + ⋯ + (axn + b − yn)².
Show that f(a, b) is a quadratic polynomial in the unknowns a and b.
b. Use Ex. 6.13 to show that the values of a and b obtained by the method of least squares must satisfy a pair of simultaneous linear equations whose coefficients depend on x1, …, xn and y1, …, yn.
c. Show that if a and b are chosen by the method of least squares, then the positive and negative residuals cancel out in the sense that the sum of the residuals equals zero. (Hint: apply Lemma 6.1 to the function f(a, b).)
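The simultaneous linear equations of part b (the “normal equations”) can be written down and solved directly; a minimal sketch, using made-up observation pairs rather than data from the text:

```python
def least_squares_line(xs, ys):
    """Minimize f(a, b) = sum((a*x_k + b - y_k)**2) by solving the
    normal equations obtained from Lemma 6.1 applied to f(a, b):
        a*sum(x^2) + b*sum(x) = sum(x*y)
        a*sum(x)   + b*n      = sum(y)
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx              # nonzero unless all x_k coincide
    a = (n * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b

# Illustrative data (not from the text)
xs, ys = [0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8]
a, b = least_squares_line(xs, ys)
residuals = [a * x + b - y for x, y in zip(xs, ys)]
print(a, b, sum(residuals))   # the residuals sum to zero, as in part c
```

The second normal equation says precisely that a·Σxk + nb − Σyk = 0, which is the statement of part c that the residuals cancel out.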
6.17 Apply the method of least squares to find the best linear relation y = ax + b corresponding to the pairs of observed values (x1, y1) = (1, 5), (x2, y2) = (2, 6), (x3, y3) = (3, 8). After finding a and b, compute the residuals and show that they satisfy the condition in Ex. 6.16c.
6.18 Let (x1, y1), …, (xn, yn) be any n fixed points in the plane. For any point (x, y), let

f(x, y) = (x − x1)² + (y − y1)² + ⋯ + (x − xn)² + (y − yn)²

be the sum of the squares of the distances from (x, y) to the given points. Show that if f(x, y) has a minimum at (x0, y0), then

x0 = (x1 + ⋯ + xn)/n,  y0 = (y1 + ⋯ + yn)/n.
6.19 In most problems encountered in applications it is impossible to find the maximum or minimum of a function f(x, y) by purely analytic methods. One of the numerical procedures that may be used is the gradient method or method of steepest descent. This is based on the fact that the function increases most rapidly in the direction of the gradient vector. Suppose, for example, that we are trying to minimize the function f(x, y). Starting at any point (x1, y1), choose a positive constant h and let

(x2, y2) = (x1, y1) − h ∇f(x1, y1).
Since the displacement vector

⟨x2 − x1, y2 − y1⟩ = −h ∇f(x1, y1)
is in the opposite direction from the gradient vector, the function f(x, y) will have a smaller value at (x2, y2) than at (x1, y1), provided that h is sufficiently small. Repeating the process, let

(x3, y3) = (x2, y2) − h ∇f(x2, y2),  and in general  (xk+1, yk+1) = (xk, yk) − h ∇f(xk, yk).
By continuing in this manner, we obtain a sequence of points (xk, yk), and it can be shown in many cases that when h is sufficiently small, this sequence will converge to a point where f(x, y) has a minimum. (See Sects. 4.41 and 6.45 of [34] for further discussion, and for variations on this method.)
Carry out several steps of this procedure for the function
starting at the origin (x1, y1) = (0, 0), and using the value h = . Note how the points obtained gradually approach the actual minimum, which can be obtained analytically in this case by the method of Ex. 6.13.
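The iteration described in this exercise is easy to program; a sketch using a hypothetical quadratic (the actual function and step size h of the exercise are not reproduced here, so both are illustrative assumptions):

```python
def grad(x, y):
    # Gradient of the hypothetical quadratic f(x, y) = (x - 1)**2 + 2*(y + 2)**2,
    # whose minimum is at (1, -2)
    return (2 * (x - 1), 4 * (y + 2))

h = 0.1              # step size; must be small enough for convergence
x, y = 0.0, 0.0      # starting point (x1, y1)
for k in range(200):
    gx, gy = grad(x, y)
    x, y = x - h * gx, y - h * gy   # (x_{k+1}, y_{k+1}) = (x_k, y_k) - h grad f
print(x, y)          # the iterates approach the minimum (1, -2)
```

For this quadratic each iteration shrinks the error by a fixed factor, so the sequence converges to the minimum; taking h too large would instead make the iterates diverge, illustrating why h must be chosen sufficiently small.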