DIRECTIONAL DERIVATIVES AND THE DIFFERENTIAL - Multivariable Differential Calculus

Advanced Calculus of Several Variables (1973)

Part II. Multivariable Differential Calculus

Chapter 2. DIRECTIONAL DERIVATIVES AND THE DIFFERENTIAL

We have seen that the definition of the derivative of a function of a single variable is motivated by the problem of defining tangent lines to curves. In a similar way the concept of differentiability for functions of several variables is motivated by the problem of defining tangent planes to surfaces.

It is customary to describe the graph in ℝ³ of a function f : ℝ² → ℝ as a “surface” lying “over” the xy-plane ℝ². This graph may be regarded as the image of the mapping F : ℝ² → ℝ³ defined by F(x, y) = (x, y, f(x, y)). Generalizing this geometric interpretation, we will (at least in the case m > n) think of the image of a mapping F : ℝⁿ → ℝ^m as an n-dimensional surface in ℝ^m. So here we are using the word “surface” only in an intuitive way; we defer its precise definition to Section 5.

Figure 2.5

One would naturally expect an n-dimensional surface in ℝ^m (m > n) to have at each point an n-dimensional tangent plane. By an n-dimensional plane in ℝ^m will be meant a parallel translate of an n-dimensional subspace (through the origin) of ℝ^m. If V is a subspace of ℝ^m, by the parallel translate of V through the point a ∈ ℝ^m (or the parallel translate of V to a) is meant the set of all points x ∈ ℝ^m such that x − a ∈ V (Fig. 2.5). If V is the solution set of the linear equation

$$Ax = 0,$$

where A is a matrix and x a column vector, then this parallel translate of V is the solution set of the equation

$$A(x - a) = 0.$$

Given a mapping F : ℝⁿ → ℝ^m and a point a ∈ ℝⁿ, let us try to define the plane (if any) in ℝ^m that is tangent to the image surface S of F at F(a). The basic idea is that this tangent plane should consist of all straight lines through F(a) which are tangent to curves in the surface S (see Fig. 2.6).

Figure 2.6

Given v ∈ ℝⁿ, we consider, as a fairly typical such curve, the image under F of the straight line in ℝⁿ which passes through the point a and is parallel to the vector v. So we define γ_v : ℝ → ℝ^m by

$$\gamma_v(t) = F(a + tv)$$

for each t ∈ ℝ.

We then define the directional derivative with respect to v of F at a to be the velocity vector γ_v′(0), that is,

$$D_vF(a) = \gamma_v'(0) = \lim_{t \to 0} \frac{F(a + tv) - F(a)}{t},$$

provided that the limit exists. The vector D_v F(a), translated to F(a), is then a tangent vector to S at F(a) (see Fig. 2.7).
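Numerically, the limit defining D_vF(a) can be approximated by a difference quotient with a small t. The sketch below is only an illustration: it uses the symmetric quotient [F(a + tv) − F(a − tv)]/(2t), a common numerical substitute that converges to the same value whenever D_vF(a) exists. The helper name `directional_derivative`, the step t = 1e-6, and the sample graph map F(x, y) = (x, y, xy) are assumptions chosen for this example.

```python
def directional_derivative(F, a, v, t=1e-6):
    """Approximate D_v F(a) by the symmetric quotient [F(a + t v) - F(a - t v)] / (2 t).

    F maps a list of n floats to a list of m floats; a and v are length-n lists.
    """
    forward = F([ai + t * vi for ai, vi in zip(a, v)])
    backward = F([ai - t * vi for ai, vi in zip(a, v)])
    return [(fp - fm) / (2 * t) for fp, fm in zip(forward, backward)]


def F(p):
    # Graph map F(x, y) = (x, y, f(x, y)) for the illustrative choice f(x, y) = x*y.
    x, y = p
    return [x, y, x * y]


a = [1.0, 2.0]
v = [3.0, -1.0]
print(directional_derivative(F, a, v))  # approximately [3, -1, 5]
```

Here the exact value is D_vF(a) = (v₁, v₂, v₁a₂ + v₂a₁) = (3, −1, 5); for this quadratic F the symmetric quotient is exact apart from rounding.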

Figure 2.7

For an intuitive interpretation of the directional derivative, consider the following physical example. Suppose f(p) denotes the temperature at the point p ∈ ℝ³. If a particle travels along a straight line through p with constant velocity vector v, then D_vf(p) is the rate of change of temperature which the particle is experiencing as it passes through the point p (why?).

For another interpretation, consider the special case f : ℝ² → ℝ, and let v ∈ ℝ² be a unit vector. Then D_vf(p) is the slope at (p, f(p)) of the curve in which the surface z = f(x, y) intersects the vertical plane which contains the point (p, f(p)) and is parallel to the vector v (why?).

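Of special interest are the directional derivatives of F with respect to the standard unit basis vectors e₁, . . . , e_n. These are called the partial derivatives of F. The ith partial derivative of F at a, denoted by

$$D_iF(a) \quad \text{or} \quad \frac{\partial F}{\partial x_i}(a),$$

is defined by

$$D_iF(a) = D_{e_i}F(a) = \lim_{t \to 0} \frac{F(a + te_i) - F(a)}{t}.$$

If a = (a₁, . . . , a_n), we see that

$$D_iF(a) = \lim_{t \to 0} \frac{F(a_1, \ldots, a_{i-1}, a_i + t, a_{i+1}, \ldots, a_n) - F(a_1, \ldots, a_n)}{t},$$
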
so D_iF(a) is simply the result of differentiating F as a function of the single variable x_i, holding the remaining variables fixed.

Example 1 If f(x, y) = xy, then D₁ f(x, y) = y and D₂ f(x, y) = x. If g(x, y) = e^x sin y, then D₁ g(x, y) = e^x sin y and D₂ g(x, y) = e^x cos y.
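Example 1 can be checked numerically by perturbing one coordinate at a time, which is exactly what distinguishes D_i from a general D_v. A minimal sketch, assuming a symmetric difference quotient with step t = 1e-6 and a hypothetical helper `partial`; indices are 0-based, so `partial(g, a, 0)` approximates D₁g(a).

```python
import math


def partial(f, a, i, t=1e-6):
    """Approximate D_{i+1} f(a) by moving only coordinate i (0-based) by +/- t."""
    up = list(a)
    up[i] += t
    down = list(a)
    down[i] -= t
    return (f(*up) - f(*down)) / (2 * t)


def g(x, y):
    return math.exp(x) * math.sin(y)


x0, y0 = 0.5, 1.2
d1 = partial(g, (x0, y0), 0)  # compare with D_1 g = e^x sin y
d2 = partial(g, (x0, y0), 1)  # compare with D_2 g = e^x cos y
print(d1, math.exp(x0) * math.sin(y0))
print(d2, math.exp(x0) * math.cos(y0))
```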

To return to the matter of the tangent plane to S at F(a), note that

$$D_{tv}F(a) = t\,D_vF(a) \quad \text{for all } t \in \mathbb{R},$$

so D_v F(a) and D_w F(a) are collinear vectors in ℝ^m if v and w are collinear in ℝⁿ. Thus every straight line through the origin in ℝⁿ determines a straight line through the origin in ℝ^m. Obviously we would like the union

$$\mathcal{T}_a = \{D_vF(a) : v \in \mathbb{R}^n\}$$

of all straight lines in ℝ^m obtained in this manner to be a subspace of ℝ^m. If this were the case, then the parallel translate of 𝒯_a to F(a) would be a likely candidate for the tangent plane to S at F(a).

The set 𝒯_a is simply the image of ℝⁿ under the mapping L : ℝⁿ → ℝ^m defined by

$$L(v) = D_vF(a)$$

for all v ∈ ℝⁿ. Since the image of a linear mapping is always a subspace, we can therefore ensure that 𝒯_a is a subspace of ℝ^m by requiring that L be a linear mapping.

We would also like our tangent plane to “closely fit” the surface S near F(a). This means that we want L(v) to be a good approximation to F(a + v) − F(a) when v is small. But we have seen this sort of condition before, namely in Theorem 1.2. The necessary and sufficient condition for differentiability in the case n = 1 now becomes our definition for the general case.

The mapping F, from an open subset D of ℝⁿ to ℝ^m, is differentiable at the point a ∈ D if and only if there exists a linear mapping L : ℝⁿ → ℝ^m such that

$$\lim_{h \to 0} \frac{F(a + h) - F(a) - L(h)}{|h|} = 0. \tag{4}$$

The linear mapping L is then denoted by dF_a, and is called the differential of F at a. Its matrix F′(a) is called the derivative of F at a. Thus F′(a) is the (unique) m × n matrix, provided by Theorem 4.1 of Chapter I, such that

$$dF_a(x) = F'(a)\,x$$

for all x ∈ ℝⁿ. In Theorem 2.4 below we shall prove that the differential of F at a is well defined by proving that

$$F'(a) = \begin{pmatrix} D_1F_1(a) & \cdots & D_nF_1(a) \\ \vdots & & \vdots \\ D_1F_m(a) & \cdots & D_nF_m(a) \end{pmatrix} \tag{6}$$

in terms of the partial derivatives of the coordinate functions F₁, . . . , F_m of F. For then if L₁ and L₂ were two linear mappings both satisfying (4) above, then each would have the same matrix given by (6), so they would in fact be the same linear mapping.

To reiterate, the relationship between the differential dF_a and the derivative F′(a) is the same as in Section 1: the differential dF_a : ℝⁿ → ℝ^m is a linear mapping represented by the m × n matrix F′(a).

Note that, if we write ΔF_a(h) = F(a + h) − F(a), then (4) takes the form

$$\lim_{h \to 0} \frac{\Delta F_a(h) - dF_a(h)}{|h|} = 0,$$

which says (just as in the case n = 1 of Section 1) that the difference between the actual change in the value of F from a to a + h and the approximate change dF_a(h) goes to zero faster than h as h → 0. We indicate this by writing ΔF_a(h) ≈ dF_a(h), or F(a + h) ≈ F(a) + dF_a(h). We will see presently that dF_a(h) is quite easy to compute (if we know the partial derivatives of F at a), so this gives an approximation to the actual value F(a + h) if h is small. However, we will not be able to say how small h need be, or to estimate the “error” ΔF_a(h) − dF_a(h) made in replacing the actual value F(a + h) by the approximation F(a) + dF_a(h), until the multivariable Taylor's formula is available (Section 7). The picture of the graph of F when n = 2 and m = 1 is instructive (Fig. 2.8).
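The statement that ΔF_a(h) − dF_a(h) goes to zero faster than h can be watched numerically. A small sketch, assuming the illustrative function f(x, y) = xy, whose differential at a = (a₁, a₂) is df_a(h₁, h₂) = a₂h₁ + a₁h₂ (compare Exercise 2.2), and steps of the hypothetical form h = (s, s). For this f the error term is exactly h₁h₂, so each ratio below equals |h|/2 up to rounding.

```python
import math


def f(x, y):
    return x * y


def df(a, h):
    # Differential of f(x, y) = x*y at a = (a1, a2): df_a(h1, h2) = a2*h1 + a1*h2.
    return a[1] * h[0] + a[0] * h[1]


a = (1.0, 2.0)
steps = (1e-1, 1e-2, 1e-3, 1e-4)
ratios = []
for s in steps:
    h = (s, s)
    delta = f(a[0] + h[0], a[1] + h[1]) - f(*a)    # actual change Δf_a(h)
    error = abs(delta - df(a, h))                   # |Δf_a(h) - df_a(h)|, exactly s*s here
    ratios.append(error / math.hypot(*h))           # divide by |h|
print(ratios)  # each ratio is about |h|/2, so it tends to 0 with h
```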

Figure 2.8

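Example 2 If F : ℝⁿ → ℝ^m is constant, that is, there exists b ∈ ℝ^m such that F(x) = b for all x ∈ ℝⁿ, then F is differentiable everywhere, with dF_a = 0 (so the derivative of a constant is zero as expected), because

$$\lim_{h \to 0} \frac{F(a + h) - F(a) - 0}{|h|} = \lim_{h \to 0} \frac{b - b}{|h|} = 0.$$

Example 3 If F : ℝⁿ → ℝ^m is linear, then F is differentiable everywhere, and

$$dF_a = F \quad \text{for all } a \in \mathbb{R}^n.$$

In short, a linear mapping is its own differential, because

$$\lim_{h \to 0} \frac{F(a + h) - F(a) - F(h)}{|h|} = \lim_{h \to 0} \frac{0}{|h|} = 0$$

by linearity of F.

For instance, if s : ℝ² → ℝ is defined by s(x, y) = x + y, then ds_a = s for all a ∈ ℝ².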

The following theorem relates the differential to the directional derivatives which motivated its definition.

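Theorem 2.1 If F : ℝⁿ → ℝ^m is differentiable at a, then the directional derivative D_v F(a) exists for all v ∈ ℝⁿ, and

$$D_vF(a) = dF_a(v).$$

PROOF We substitute h = tv into (4) and let t → 0, assuming v ≠ 0 (for v = 0 both sides are 0). Since dF_a(tv) = t dF_a(v) by linearity, this gives

$$\lim_{t \to 0} \frac{F(a + tv) - F(a) - t\,dF_a(v)}{|t|\,|v|} = 0,$$

so it is clear that

$$\lim_{t \to 0} \frac{F(a + tv) - F(a)}{t}$$

exists and equals dF_a(v).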

However the converse of Theorem 2.1 is false. That is, a function may possess directional derivatives in all directions, yet still fail to be differentiable.

Example 4 Let f : ℝ² → ℝ be defined by

$$f(x, y) = \frac{x^2 y}{x^4 + y^2}$$

unless x = y = 0, and f(0, 0) = 0. In Exercise 7.4 of Chapter I it was shown that f is not continuous at (0, 0). By Exercise 2.1 below it follows that f is not differentiable at (0, 0). However, if v = (a, b) with b ≠ 0, then

$$D_vf(0, 0) = \lim_{t \to 0} \frac{f(ta, tb) - f(0, 0)}{t} = \lim_{t \to 0} \frac{a^2 b}{t^2 a^4 + b^2} = \frac{a^2}{b}$$

exists, while clearly D_vf(0, 0) = 0 if b = 0. Other examples of nondifferentiable functions that nevertheless possess directional derivatives are given in Exercises 2.3 and 2.4.
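The two halves of Example 4 can be seen side by side in a quick numerical experiment. The sketch assumes f(x, y) = x²y/(x⁴ + y²) with f(0, 0) = 0, and picks the arbitrary direction v = (2, 0.5): along the parabola y = x² the values stay at 1/2, so f cannot be continuous at the origin, while the quotients f(tv)/t settle down to a²/b = 8.

```python
def f(x, y):
    # f(x, y) = x^2 y / (x^4 + y^2) away from the origin, and f(0, 0) = 0.
    if x == 0 and y == 0:
        return 0.0
    return x * x * y / (x ** 4 + y * y)


# On the parabola y = x^2 the value is 1/2 no matter how close (x, y) is to (0, 0).
on_parabola = [f(t, t * t) for t in (1e-1, 1e-2, 1e-3)]

# Difference quotients [f(t v) - f(0, 0)] / t for v = (a, b) with b != 0 approach a^2 / b.
a, b = 2.0, 0.5
quotients = [f(t * a, t * b) / t for t in (1e-2, 1e-4, 1e-6)]

print(on_parabola)
print(quotients, a * a / b)
```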

The next theorem proceeds a step further, expressing directional derivatives in terms of partial derivatives (which presumably are relatively easy to compute).

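Theorem 2.2 If F : ℝⁿ → ℝ^m is differentiable at a, and v = (v₁, . . . , v_n), then

$$D_vF(a) = \sum_{i=1}^{n} v_i\,D_iF(a). \tag{8}$$

PROOF

$$\begin{aligned} D_vF(a) &= dF_a(v) && \text{(by Theorem 2.1)} \\ &= dF_a\Big(\sum_{i=1}^{n} v_i e_i\Big) \\ &= \sum_{i=1}^{n} v_i\,dF_a(e_i), \end{aligned}$$

so, applying Theorem 2.1 again,

$$D_vF(a) = \sum_{i=1}^{n} v_i\,D_{e_i}F(a) = \sum_{i=1}^{n} v_i\,D_iF(a).$$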

In the case m = 1 of a differentiable real-valued function f : ℝⁿ → ℝ, the vector

$$\nabla f(a) = (D_1f(a), \ldots, D_nf(a)),$$

whose components are the partial derivatives of f, is called the gradient vector of f at a. In terms of ∇f(a), Eq. (8) becomes

$$D_vf(a) = \nabla f(a) \cdot v, \tag{10}$$

which is a strikingly simple expression for the directional derivative in terms of partial derivatives.
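Formula (10) is easy to test: compute ∇f(a) from hand-derived partial derivatives and compare ∇f(a)·v with a difference quotient for D_vf(a). The function f(x, y, z) = xy² + sin z, the point a and the direction v below are arbitrary choices made for this sketch, and the symmetric quotient with t = 1e-6 is a numerical stand-in for the limit.

```python
import math


def f(x, y, z):
    return x * y * y + math.sin(z)


def grad_f(x, y, z):
    # Partial derivatives D_1 f, D_2 f, D_3 f, computed by hand.
    return (y * y, 2 * x * y, math.cos(z))


def dot(u, w):
    return sum(ui * wi for ui, wi in zip(u, w))


a = (1.0, 2.0, 0.5)
v = (0.3, -1.0, 2.0)
t = 1e-6
quotient = (f(*[ai + t * vi for ai, vi in zip(a, v)]) -
            f(*[ai - t * vi for ai, vi in zip(a, v)])) / (2 * t)
print(quotient, dot(grad_f(*a), v))  # the two numbers agree closely
```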

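Example 5 We use Eq. (10) and the approximation Δf_a(h) ≈ df_a(h) to estimate [(13.1)² − (4.9)²]^{1/2}. Let f(x, y) = (x² − y²)^{1/2}, a = (13, 5), h = (0.1, −0.1). Then f(a) = 12 and ∇f(a) = (13/12, −5/12), so

$$[(13.1)^2 - (4.9)^2]^{1/2} = f(a + h) \approx f(a) + \nabla f(a) \cdot h = 12 + \frac{13}{12}(0.1) + \frac{-5}{12}(-0.1) = 12 + \frac{1.8}{12} = 12.15.$$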
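The arithmetic of Example 5 can be reproduced in a few lines and compared with the value computed directly. This is only a check of the example's numbers; the gradient (x/f, −y/f) is the hand-computed gradient of f(x, y) = (x² − y²)^{1/2}.

```python
import math


def f(x, y):
    return math.sqrt(x * x - y * y)


a = (13.0, 5.0)
h = (0.1, -0.1)
fa = f(*a)                                  # f(a) = 12
grad = (a[0] / fa, -a[1] / fa)              # ∇f(a) = (13/12, -5/12)
estimate = fa + grad[0] * h[0] + grad[1] * h[1]
actual = f(a[0] + h[0], a[1] + h[1])
print(estimate, actual)
```

The linear estimate 12.15 differs from the directly computed value (about 12.1491) by less than 0.001, consistent with the error being small compared with |h|.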

To investigate the significance of the gradient vector, let us consider a differentiable function f : ℝⁿ → ℝ and a point a ∈ ℝⁿ where ∇f(a) ≠ 0. Suppose that we want to determine the direction in which f increases most rapidly at a. By a “direction” here we mean a unit vector u. Let θ_u denote the angle between u and ∇f(a). Then (10) gives

$$D_uf(a) = \nabla f(a) \cdot u = |\nabla f(a)| \cos\theta_u.$$

But cos θ_u attains its maximum value of +1 when θ_u = 0, that is, when u and ∇f(a) are collinear and point in the same direction. We conclude that |∇f(a)| is the maximum value of D_uf(a) for u a unit vector, and that this maximum value is attained with u = ∇f(a)/|∇f(a)|.
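The conclusion that |∇f(a)| is the largest value of D_uf(a) can be confirmed by brute force: evaluate ∇f(a)·u over many unit vectors u in the plane and keep the largest. This sketch assumes the illustrative function f(x, y) = x² + 2y² at a = (1.5, 1), where ∇f(a) = (3, 4), so the predicted maximum is 5, attained at u = (0.6, 0.8); the grid of 3600 directions is an arbitrary choice.

```python
import math


def grad_f(x, y):
    # Gradient of the illustrative function f(x, y) = x^2 + 2 y^2.
    return (2 * x, 4 * y)


g = grad_f(1.5, 1.0)                     # (3, 4), so |∇f(a)| = 5
best_value, best_u = -math.inf, None
for k in range(3600):                    # unit vectors u = (cos φ, sin φ), step 0.1 degree
    phi = 2 * math.pi * k / 3600
    u = (math.cos(phi), math.sin(phi))
    value = g[0] * u[0] + g[1] * u[1]    # D_u f(a) = ∇f(a) · u
    if value > best_value:
        best_value, best_u = value, u
print(best_value, best_u)                # close to 5 and (0.6, 0.8)
```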

For example, suppose that f(a) denotes the temperature at the point a. It is a common physical assumption that heat flows in a direction opposite to that of greatest increase of temperature (heat seeks cold). This principle and the above remarks imply that the direction of heat flow at a is given by the vector −∇f(a).

If ∇f(a) = 0, then a is called a critical point of f. If f is a differentiable real-valued function defined on an open set D in ℝⁿ, and f attains a local maximum (or local minimum) at the point a ∈ D, then it follows that a must be a critical point of f. For the function g_i(x) = f(a₁, . . . , a_{i−1}, x, a_{i+1}, . . . , a_n) is defined on an open interval of ℝ containing a_i, and has a local maximum (or local minimum) at a_i, so D_if(a) = g_i′(a_i) = 0 by the familiar result from elementary calculus. Later in this chapter we will discuss multivariable maximum-minimum problems in considerable detail.

Equation (10) can be rewritten as a multivariable version of the equation df = (df/dx) dx of Section 1. Let x¹, . . . , xⁿ be the coordinate functions of the identity mapping of ℝⁿ, that is, xⁱ : ℝⁿ → ℝ is defined by xⁱ(p₁, . . . , p_n) = p_i, i = 1, . . . , n. Then xⁱ is a linear function, so

$$dx^i_a(h) = x^i(h) = h_i$$

for all h ∈ ℝⁿ, by Example 3. If f : ℝⁿ → ℝ is differentiable at a, then Theorem 2.1 and Eq. (10) therefore give

$$df_a(h) = D_hf(a) = \nabla f(a) \cdot h = \sum_{i=1}^{n} D_if(a)\,h_i = \sum_{i=1}^{n} D_if(a)\,dx^i_a(h),$$

so the linear functions df_a and Σᵢ D_if(a) dxⁱ_a are equal. If we delete the subscript a, and write ∂f/∂xⁱ for D_i f(a), we obtain the classical formula

$$df = \sum_{i=1}^{n} \frac{\partial f}{\partial x^i}\,dx^i. \tag{11}$$

The mapping a → df_a, which associates with each point a ∈ ℝⁿ the linear function df_a : ℝⁿ → ℝ, is called a differential form, and Eq. (11) is the historical reason for this terminology. In Chapter V we shall discuss differential forms in detail.

We now apply Theorem 2.2 to finish the computation of the derivative matrix F′(a). First we need the following lemma on “componentwise differentiation.”

Lemma 2.3 The mapping F : ℝⁿ → ℝ^m is differentiable at a if and only if each of its component functions F¹, . . . , F^m is, and

$$dF_a = (dF^1_a, \ldots, dF^m_a).$$

(Here we have labeled the component functions with superscripts, rather than subscripts as usual, merely to avoid double subscripts.)

This lemma follows immediately from a componentwise reading of the vector equation (4).

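Theorem 2.4 If F : ℝⁿ → ℝ^m is differentiable at a, then the matrix F′(a) of dF_a is

$$F'(a) = \begin{pmatrix} D_1F^1(a) & \cdots & D_nF^1(a) \\ \vdots & & \vdots \\ D_1F^m(a) & \cdots & D_nF^m(a) \end{pmatrix}.$$

[That is, D_j Fⁱ(a) is the element in the ith row and jth column of F′(a).]

PROOF For v = (v₁, . . . , v_n) ∈ ℝⁿ,

$$\begin{aligned} dF_a(v) &= \begin{pmatrix} dF^1_a(v) \\ \vdots \\ dF^m_a(v) \end{pmatrix} && \text{(by Lemma 2.3)} \\ &= \begin{pmatrix} \sum_{j=1}^{n} v_j\,D_jF^1(a) \\ \vdots \\ \sum_{j=1}^{n} v_j\,D_jF^m(a) \end{pmatrix} && \text{(by Theorem 2.2)} \\ &= \begin{pmatrix} D_1F^1(a) & \cdots & D_nF^1(a) \\ \vdots & & \vdots \\ D_1F^m(a) & \cdots & D_nF^m(a) \end{pmatrix} \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} \end{aligned}$$

by the definition of matrix multiplication.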
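Theorem 2.4 says the derivative matrix can be assembled column by column from partial derivatives. The sketch below does this numerically (symmetric quotients, step t = 1e-6) for an illustrative map F(x, y) = (xy, eˣ sin y) built from Example 1's functions, compares the result with the hand-computed matrix, and then forms F′(a)v, which by Theorems 2.1 and 2.2 should equal D_vF(a). The helpers `jacobian` and `mat_vec` are hypothetical names.

```python
import math


def F(p):
    # Illustrative map F: R^2 -> R^2 whose components are the two functions of Example 1.
    x, y = p
    return [x * y, math.exp(x) * math.sin(y)]


def jacobian(F, a, t=1e-6):
    """Approximate the m x n matrix [D_j F^i(a)]: row i, column j."""
    n = len(a)
    columns = []
    for j in range(n):
        up = list(a)
        up[j] += t
        down = list(a)
        down[j] -= t
        # Column j: partial derivatives of every component with respect to x_j.
        columns.append([(fu - fd) / (2 * t) for fu, fd in zip(F(up), F(down))])
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def mat_vec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


a = [0.5, 1.2]
J = jacobian(F, a)
exact = [[a[1], a[0]],
         [math.exp(a[0]) * math.sin(a[1]), math.exp(a[0]) * math.cos(a[1])]]
print(J)
print(exact)

# Theorem 2.2 in matrix form: F'(a) v should match the directional derivative D_v F(a).
v = [2.0, -1.0]
print(mat_vec(J, v))
```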

Finally we formulate a sufficient condition for differentiability. The mapping F : ℝⁿ → ℝ^m is said to be continuously differentiable at a if the partial derivatives D₁F, . . . , D_nF all exist at each point of some open set containing a, and are continuous at a.

Theorem 2.5 If F is continuously differentiable at a, then F is differentiable at a.

PROOF By Lemma 2.3, it suffices to consider a continuously differentiable real-valued function f : ℝⁿ → ℝ. Given h = (h₁, . . . , h_n), let h⁽⁰⁾ = 0 and h⁽ⁱ⁾ = (h₁, . . . , h_i, 0, . . . , 0), i = 1, . . . , n (Fig. 2.9). Then

$$f(a + h) - f(a) = \sum_{i=1}^{n} \big[f(a + h^{(i)}) - f(a + h^{(i-1)})\big].$$

Figure 2.9

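For h small enough that all the points involved lie in the open set on which the partial derivatives exist, the single-variable mean value theorem gives

$$f(a + h^{(i)}) - f(a + h^{(i-1)}) = h_i\,D_if(a_1 + h_1, \ldots, a_{i-1} + h_{i-1}, \xi_i, a_{i+1}, \ldots, a_n)$$

for some ξ_i between a_i and a_i + h_i, since D_if is the derivative of the function

$$g(x) = f(a_1 + h_1, \ldots, a_{i-1} + h_{i-1}, x, a_{i+1}, \ldots, a_n).$$

Thus f(a + h⁽ⁱ⁾) − f(a + h⁽ⁱ⁻¹⁾) = h_i D_if(b_i) for some point b_i which approaches a as h → 0. Consequently, since |h_i| ≤ |h|,

$$\left|\frac{f(a + h) - f(a) - \sum_{i=1}^{n} D_if(a)\,h_i}{|h|}\right| = \left|\sum_{i=1}^{n} \big[D_if(b_i) - D_if(a)\big]\frac{h_i}{|h|}\right| \le \sum_{i=1}^{n} \big|D_if(b_i) - D_if(a)\big| \to 0,$$

so (4) holds with L(h) = Σᵢ D_if(a) h_i, as desired, since each b_i → a as h → 0, and each D_if is continuous at a.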

Let us now summarize what has thus far been said about differentiability for functions of several variables, and in particular point out that the rather complicated concept of differentiability, as defined by Eq. (4), has now been justified.

For the importance of directional derivatives (rates of change) is obvious enough and, if a mapping is differentiable, then Theorem 2.2 gives a pleasant expression for its directional derivatives in terms of its partial derivatives, which are comparatively easy to compute; Theorem 2.4 similarly describes the derivative matrix. Finally Theorem 2.5 provides an effective test for the differentiability of a function in terms of its partial derivatives, thereby eliminating (in most cases) the necessity of verifying that it satisfies the definition of differentiability. In short, every continuously differentiable function is differentiable, and every differentiable function has directional derivatives; in general, neither of these implications may be reversed (see Example 4 and Exercise 2.5).

We began this section with a general discussion of tangent planes, which served to motivate the definition of differentiability. It is appropriate to conclude with an example in which our results are applied to actually compute a tangent plane.

Example 6 Let F : ℝ² → ℝ⁴ be defined by

$$F(x, y) = (y,\; x,\; xy,\; y^2 - x^2).$$

Then F is obviously continuously differentiable, and therefore differentiable (Theorem 2.5). Let a = (1, 2), and suppose we want to determine the tangent plane to the image S of F at the point F(a) = (2, 1, 2, 3). By Theorem 2.4, the matrix of the linear mapping dF_a : ℝ² → ℝ⁴ is the 4 × 2 matrix

$$F'(a) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \\ 2 & 1 \\ -2 & 4 \end{pmatrix}.$$

The image 𝒯_a of dF_a is that subspace of ℝ⁴ which is generated by the column vectors b₁ = (0, 1, 2, −2) and b₂ = (1, 0, 1, 4) of F′(a) (see Theorem I.5.2). Since b₁ and b₂ are linearly independent, 𝒯_a is 2-dimensional, and so is its orthogonal complement (Theorem I.3.4). In order to write 𝒯_a in the form Ax = 0, we therefore need to find two linearly independent vectors a₁ and a₂ which are orthogonal to both b₁ and b₂; they will then be the row vectors of the matrix A. Two such vectors a₁ and a₂ are easily found by solving the equations

$$\begin{aligned} x_2 + 2x_3 - 2x_4 &= 0, \\ x_1 + x_3 + 4x_4 &= 0; \end{aligned}$$

for example, a₁ = (5, 0, −1, −1) and a₂ = (0, 10, −4, 1).

The desired tangent plane T to S at the point F(a) = (2, 1, 2, 3) is now the parallel translate of 𝒯_a to F(a). That is, T is the set of all points x ∈ ℝ⁴ such that A(x − F(a)) = 0,

$$\begin{pmatrix} 5 & 0 & -1 & -1 \\ 0 & 10 & -4 & 1 \end{pmatrix} \begin{pmatrix} x_1 - 2 \\ x_2 - 1 \\ x_3 - 2 \\ x_4 - 3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

Upon simplification, we obtain the two equations

$$\begin{aligned} 5x_1 - x_3 - x_4 &= 5, \\ 10x_2 - 4x_3 + x_4 &= 5. \end{aligned}$$

The solution set of each of these equations is a 3-dimensional hyperplane in ⁴; the intersection of these two hyperplanes is the desired (2-dimensional) tangent plane T.
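The linear algebra of Example 6 can be double-checked in exact integer arithmetic, using only data stated in the example: F(a), the columns b₁, b₂ of F′(a), and the rows a₁, a₂ of A. This sketch verifies that each aᵢ is orthogonal to each bⱼ, that each pair is linearly independent, and that the points F(a) + s b₁ + t b₂ for a small grid of integers s, t satisfy both simplified equations of T. The helper names `dot` and `on_T` are hypothetical.

```python
import itertools

Fa = (2, 1, 2, 3)                          # the point F(a)
b1, b2 = (0, 1, 2, -2), (1, 0, 1, 4)       # columns of F'(a); they span the image of dF_a
a1, a2 = (5, 0, -1, -1), (0, 10, -4, 1)    # rows of the matrix A


def dot(u, w):
    return sum(ui * wi for ui, wi in zip(u, w))


def on_T(x):
    # The two simplified equations of the tangent plane T.
    return 5 * x[0] - x[2] - x[3] == 5 and 10 * x[1] - 4 * x[2] + x[3] == 5


# Each row of A is orthogonal to each spanning vector of the image of dF_a.
assert all(dot(r, b) == 0 for r in (a1, a2) for b in (b1, b2))

# A nonzero 2 x 2 minor shows that each pair is linearly independent.
assert b1[0] * b2[1] - b1[1] * b2[0] != 0
assert a1[0] * a2[1] - a1[1] * a2[0] != 0

# Points F(a) + s*b1 + t*b2 on a small integer grid all satisfy the equations of T.
for s, t in itertools.product(range(-3, 4), repeat=2):
    x = tuple(p + s * u + t * w for p, u, w in zip(Fa, b1, b2))
    assert on_T(x)
print("all checks passed")
```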

Exercises

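2.1 If F : ℝⁿ → ℝ^m is differentiable at a, show that F is continuous at a. Hint: Let

$$R(h) = \frac{F(a + h) - F(a) - dF_a(h)}{|h|} \quad (h \neq 0), \qquad R(0) = 0.$$

Then

$$F(a + h) = F(a) + dF_a(h) + |h|\,R(h).$$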

2.2 If p : ℝ² → ℝ is defined by p(x, y) = xy, show that p is differentiable everywhere with dp_{(a, b)}(x, y) = bx + ay. Hint: Let L(x, y) = bx + ay, a = (a, b), h = (h, k). Then show that p(a + h) − p(a) − L(h) = hk. But |hk| ≤ ½(h² + k²) because 0 ≤ (|h| − |k|)², so |hk|/(h² + k²)^{1/2} ≤ ½(h² + k²)^{1/2} → 0 as h → 0.

2.3 If f : ℝ² → ℝ is defined by f(x, y) = xy²/(x² + y²) unless x = y = 0, and f(0, 0) = 0, show that D_vf(0, 0) exists for all v, but f is not differentiable at (0, 0). Hint: Note first that f(tv) = t f(v) for all t ∈ ℝ and v ∈ ℝ². Then show that D_vf(0, 0) = f(v) for all v. Hence D₁f(0, 0) = D₂f(0, 0) = 0, but D_vf(0, 0) ≠ ∇f(0, 0) · v if v = (v₁, v₂) with v₁v₂ ≠ 0.

2.4 Do the same as in the previous problem with the function f : ℝ² → ℝ defined by f(x, y) = (x^{1/3} + y^{1/3})³.

2.5 Let f : ℝ² → ℝ be defined by f(x, y) = x² sin(1/x) + y² for x ≠ 0, and f(0, y) = y².

(a) Show that f is continuous at (0, 0).

(b) Find the partial derivatives of f at (0, 0).

(c) Show that f is differentiable at (0, 0).

(d) Show that D₁ f is not continuous at (0, 0).

2.6 Use the approximation Δf_a ≈ df_a to estimate the value of

(a) [(3.02)² + (1.97)² + (5.98)²]^{1/2},

(b).

2.7 As in Exercise 1.3, a potential function for the vector field F : ℝⁿ → ℝⁿ is a differentiable function V : ℝⁿ → ℝ such that F = −∇V. Find a potential function for the vector field F defined for all x ≠ 0 by the formula

(a) F(x) = r⁻ⁿx, where r = |x|. Treat separately the cases n = 2 and n ≠ 2.

(b) F(x) = [g′(r)/r]x, where g is a differentiable function of one variable.

2.8 Let f : ℝⁿ → ℝ be differentiable. If f(0) = 0 and f(tx) = tf(x) for all t ∈ ℝ and x ∈ ℝⁿ, prove that f(x) = ∇f(0) · x for all x ∈ ℝⁿ. In particular f is linear. Consequently any homogeneous function g : ℝⁿ → ℝ [meaning that g(tx) = tg(x)], which is not linear, must fail to be differentiable at the origin, although it has directional derivatives there (why?).

2.9 If f : ℝⁿ → ℝ^m and g : ℝⁿ → ℝ^k are both differentiable at a ∈ ℝⁿ, prove directly from the definition that the mapping h : ℝⁿ → ℝ^{m+k}, defined by h(x) = (f(x), g(x)), is differentiable at a.

2.10 Let the mapping F : ℝ² → ℝ² be defined by F(x₁, x₂) = (sin(x₁ − x₂), cos(x₁ + x₂)). Find the linear equations of the tangent plane in ℝ⁴ to the graph of F at the point (π/4, π/4, 0, 0).

2.11 Let f : ℝ²_uv → ℝ³_xyz be the differentiable mapping defined by

Let p ∈ ℝ²_uv and q = f(p) ∈ ℝ³_xyz. Given a unit vector u = (u, v), let φ_u : ℝ → ℝ² be the straight line through p, and ψ_u : ℝ → ℝ³ the curve through q, defined by φ_u(t) = p + tu, ψ_u(t) = f(φ_u(t)), respectively. Then

$$\psi_u'(0) = D_uf(p)$$

by the definition of the directional derivative.

(a) For what unit vector(s) u is the speed |ψ_u′(0)| maximal?

(b) Suppose that g : ℝ³ → ℝ is a differentiable function such that ∇g(q) = (1, 1, −1), and define

$$h_u(t) = g(\psi_u(t)).$$

Assuming the chain rule result that

$$h_u'(0) = \nabla g(q) \cdot \psi_u'(0),$$

find the unit vector u that maximizes h_u′(0).

(c) Write the equation of the tangent plane to the image surface of f at the point f(1, 1) = (1, 0, 2).