Advanced Calculus of Several Variables (1973)

Part II. Multivariable Differential Calculus

Chapter 3. THE CHAIN RULE

Consider the composition H = G ∘ F of two differentiable mappings F : ℝ^n → ℝ^m and G : ℝ^m → ℝ^k. For example, F(x) ∈ ℝ^m might be the price vector of m intermediate products that are manufactured at a factory from n raw materials whose cost vector is x (that is, the components of x ∈ ℝ^n are the prices of the n raw materials), and H(x) = G(F(x)) the resulting price vector of k final products that are manufactured at a second factory from the m intermediate products. We might wish to estimate the change ΔH_a(h) = H(a + h) − H(a) in the prices of the final products, resulting from a change from a to a + h in the costs of the raw materials. Using the approximations ΔF ≈ dF and ΔG ≈ dG, without initially worrying about the accuracy of our estimates, we obtain

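$$\Delta H_a(h) = G(F(a+h)) - G(F(a)) = \Delta G_{F(a)}(\Delta F_a(h)) \approx dG_{F(a)}(\Delta F_a(h)) \approx dG_{F(a)}(dF_a(h)).$$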

This heuristic “argument” suggests the possibility that the multivariable chain rule takes the form dH_a = dG_{F(a)} ∘ dF_a, analogous to the restatement in Section 1 of the familiar single-variable chain rule.

Theorem 3.1 (The Chain Rule) Let U and V be open subsets of ℝ^n and ℝ^m respectively. If the mappings F : U → ℝ^m and G : V → ℝ^k are differentiable at a ∈ U and F(a) ∈ V respectively, then their composition H = G ∘ F is differentiable at a, and

$$dH_a = dG_{F(a)} \circ dF_a. \tag{1}$$

In terms of derivatives, we therefore have

$$H'(a) = G'(F(a))\,F'(a). \tag{2}$$

In brief, the differential of the composition is the composition of the differentials; the derivative of the composition is the product of the derivatives.

PROOF We must show that

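$$\lim_{h \to 0} \frac{H(a+h) - H(a) - dG_{F(a)}(dF_a(h))}{|h|} = 0.$$

If we define

$$\varphi(h) = \frac{F(a+h) - F(a) - dF_a(h)}{|h|} \tag{3}$$

and

$$\psi(k) = \frac{G(F(a)+k) - G(F(a)) - dG_{F(a)}(k)}{|k|}, \tag{4}$$

then the fact that F and G are differentiable at a and F(a), respectively, implies that

$$\lim_{h \to 0} \varphi(h) = 0 \quad\text{and}\quad \lim_{k \to 0} \psi(k) = 0.$$

Then

$$H(a+h) - H(a) = G\bigl(F(a) + (F(a+h) - F(a))\bigr) - G(F(a)) = dG_{F(a)}\bigl(F(a+h) - F(a)\bigr) + |F(a+h) - F(a)|\,\psi\bigl(F(a+h) - F(a)\bigr)$$

by Eq. (4) with k = F(a + h) − F(a). Using (3) we then obtain

$$H(a+h) - H(a) = dG_{F(a)}\bigl(dF_a(h) + |h|\,\varphi(h)\bigr) + \bigl|\,dF_a(h) + |h|\,\varphi(h)\,\bigr|\;\psi\bigl(F(a+h) - F(a)\bigr).$$

Therefore

$$\frac{H(a+h) - H(a) - dG_{F(a)}(dF_a(h))}{|h|} = dG_{F(a)}(\varphi(h)) + \left|\,dF_a\!\left(\frac{h}{|h|}\right) + \varphi(h)\,\right| \psi\bigl(F(a+h) - F(a)\bigr). \tag{5}$$

But lim_{h→0} dG_{F(a)}(φ(h)) = 0 because lim_{h→0} φ(h) = 0 and the linear mapping dG_{F(a)} is continuous. Also lim_{h→0} ψ(F(a + h) − F(a)) = 0 because F is continuous at a and lim_{k→0} ψ(k) = 0. Finally the number |dF_a(h/|h|) + φ(h)| remains bounded, because lim_{h→0} φ(h) = 0 and the component functions of the linear mapping dF_a are continuous and therefore bounded on the unit sphere S^{n−1} ⊂ ℝ^n (Theorem I.8.8).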

Consequently the limit of (5) is zero as desired. Of course Eq. (2) follows immediately from (1), since the matrix of the composition of two linear mappings is the product of their matrices.

∎
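The matrix form of the chain rule, H′(a) = G′(F(a)) F′(a), lends itself to a numerical sanity check. The following is only a sketch and not part of the text: the mappings F : ℝ^2 → ℝ^3 and G : ℝ^3 → ℝ^2, the point a, and the helper names `jacobian` and `matmul` are hypothetical choices, and the derivative matrices are approximated by central differences rather than computed exactly.

```python
import math

def jacobian(func, point, eps=1e-6):
    """Central-difference approximation of the derivative matrix of func at point."""
    n_out = len(func(point))
    cols = []
    for j in range(len(point)):
        plus, minus = list(point), list(point)
        plus[j] += eps
        minus[j] -= eps
        fp, fm = func(plus), func(minus)
        cols.append([(p - m) / (2 * eps) for p, m in zip(fp, fm)])
    # cols[j][i] approximates D_j func_i; transpose so that rows index components.
    return [[cols[j][i] for j in range(len(point))] for i in range(n_out)]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Hypothetical example mappings (assumptions, not from the text).
def F(p):
    s, t = p
    return [s * t, s + t, math.sin(s)]

def G(q):
    x, y, z = q
    return [x * y + z, math.exp(x) * z]

def H(p):
    return G(F(p))

a = [0.7, -0.3]
lhs = jacobian(H, a)                              # H'(a), 2 x 2
rhs = matmul(jacobian(G, F(a)), jacobian(F, a))   # G'(F(a)) F'(a), (2 x 3)(3 x 2)
max_err = max(abs(l - r)
              for row_l, row_r in zip(lhs, rhs)
              for l, r in zip(row_l, row_r))
print(max_err)
```

The two matrices should agree up to the finite-difference error, which gives a quick way to catch a wrongly ordered product such as F′(a) G′(F(a)).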

We list in the following examples some typical chain rule formulas obtained by equating components of the matrix equation (2) for various values of n, m, and k. It is the formulation of the chain rule in terms of differential linear mappings which enables us to give a single proof for all of these formulas, despite their wide variety.

Example 1 If n = k = 1, so we have differentiable mappings f : ℝ → ℝ^m and g : ℝ^m → ℝ, then h = g ∘ f : ℝ → ℝ is differentiable with

$$h'(t) = g'(f(t))\,f'(t).$$

Here g′(f(t)) is a 1 × m row matrix, and f′(t) is an m × 1 column matrix. In terms of the gradient of g, we have

$$h'(t) = \nabla g(f(t)) \cdot f'(t). \tag{6}$$

This is a generalization of the fact that D_v g(a) = ∇g(a) · v [see Eq. (10) of Section 2]. If we think of f(t) as the position vector of a particle moving in ℝ^m, with g a temperature function on ℝ^m, then (6) gives the rate of change of the temperature of the particle. In particular, we see that at a given position this rate of change depends on the motion only through the velocity vector of the particle.

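In terms of the component functions f_1, . . . , f_m of f and the partial derivatives of g, (6) becomes

$$h'(t) = \sum_{i=1}^{m} D_i g(f(t))\,f_i'(t).$$

If we write x_i = f_i(t) and u = g(x), following the common practice of using the symbol for a typical value of a function to denote the function itself, then the above equation takes the easily remembered form

$$\frac{du}{dt} = \frac{\partial u}{\partial x_1}\frac{dx_1}{dt} + \cdots + \frac{\partial u}{\partial x_m}\frac{dx_m}{dt}.$$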
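The temperature interpretation of Example 1 can be tried on a concrete case. In this sketch the path f(t) = (cos t, sin t, t) and the temperature function g(x, y, z) = x² + yz are hypothetical choices, not taken from the text; the rate ∇g(f(t)) · f′(t) is compared with a central-difference estimate of the derivative of g ∘ f.

```python
import math

def f(t):
    """Hypothetical path of the particle in R^3."""
    return (math.cos(t), math.sin(t), t)

def f_prime(t):
    """Velocity vector of the path above."""
    return (-math.sin(t), math.cos(t), 1.0)

def g(x, y, z):
    """Hypothetical temperature function."""
    return x * x + y * z

def grad_g(x, y, z):
    """Gradient of g, computed by hand."""
    return (2 * x, z, y)

def chain_rule_rate(t):
    """Rate of change of temperature via the gradient formula."""
    return sum(gi * vi for gi, vi in zip(grad_g(*f(t)), f_prime(t)))

def finite_difference_rate(t, eps=1e-6):
    """Central-difference estimate of the derivative of g(f(t))."""
    return (g(*f(t + eps)) - g(*f(t - eps))) / (2 * eps)

t0 = 1.2
print(chain_rule_rate(t0), finite_difference_rate(t0))
```

The two printed numbers should agree to within the finite-difference error.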

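Example 2 Given differentiable mappings F : ℝ^2 → ℝ^3 and G : ℝ^3 → ℝ^2 with composition H : ℝ^2 → ℝ^2, the chain rule gives

$$\begin{pmatrix} D_1H_1(a) & D_2H_1(a) \\ D_1H_2(a) & D_2H_2(a) \end{pmatrix} = \begin{pmatrix} D_1G_1(F(a)) & D_2G_1(F(a)) & D_3G_1(F(a)) \\ D_1G_2(F(a)) & D_2G_2(F(a)) & D_3G_2(F(a)) \end{pmatrix} \begin{pmatrix} D_1F_1(a) & D_2F_1(a) \\ D_1F_2(a) & D_2F_2(a) \\ D_1F_3(a) & D_2F_3(a) \end{pmatrix}.$$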

If we write F(s, t) = (x, y, z) and G(x, y, z) = (u, v), this equation can be rewritten

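$$\begin{pmatrix} \dfrac{\partial H_1}{\partial s} & \dfrac{\partial H_1}{\partial t} \\[2ex] \dfrac{\partial H_2}{\partial s} & \dfrac{\partial H_2}{\partial t} \end{pmatrix} = \begin{pmatrix} \dfrac{\partial G_1}{\partial x} & \dfrac{\partial G_1}{\partial y} & \dfrac{\partial G_1}{\partial z} \\[2ex] \dfrac{\partial G_2}{\partial x} & \dfrac{\partial G_2}{\partial y} & \dfrac{\partial G_2}{\partial z} \end{pmatrix} \begin{pmatrix} \dfrac{\partial F_1}{\partial s} & \dfrac{\partial F_1}{\partial t} \\[2ex] \dfrac{\partial F_2}{\partial s} & \dfrac{\partial F_2}{\partial t} \\[2ex] \dfrac{\partial F_3}{\partial s} & \dfrac{\partial F_3}{\partial t} \end{pmatrix}.$$

For example, we have

$$\frac{\partial H_1}{\partial s} = \frac{\partial G_1}{\partial x}\frac{\partial F_1}{\partial s} + \frac{\partial G_1}{\partial y}\frac{\partial F_2}{\partial s} + \frac{\partial G_1}{\partial z}\frac{\partial F_3}{\partial s}.$$

Writing

$$\frac{\partial u}{\partial s} = \frac{\partial H_1}{\partial s}, \qquad \frac{\partial u}{\partial x} = \frac{\partial G_1}{\partial x}, \qquad \frac{\partial x}{\partial s} = \frac{\partial F_1}{\partial s}, \qquad \text{etc.,}$$

to go all the way with variables representing functions, we obtain formulas such as

$$\frac{\partial u}{\partial s} = \frac{\partial u}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial u}{\partial y}\frac{\partial y}{\partial s} + \frac{\partial u}{\partial z}\frac{\partial z}{\partial s}$$

and

$$\frac{\partial v}{\partial t} = \frac{\partial v}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial v}{\partial y}\frac{\partial y}{\partial t} + \frac{\partial v}{\partial z}\frac{\partial z}{\partial t}.$$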

The obvious nature of the formal pattern of chain rule formulas expressed in terms of variables, as above, often compensates for their disadvantage of not containing explicit reference to the points at which the various derivatives are evaluated.

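Example 3 Let T : ℝ^2 → ℝ^2 be the familiar “polar coordinate mapping” defined by T(r, θ) = (r cos θ, r sin θ) (Fig. 2.10). Given a differentiable function f : ℝ^2 → ℝ, define g = f ∘ T, so g(r, θ) = f(r cos θ, r sin θ). Then the chain rule gives

$$\begin{pmatrix} \dfrac{\partial g}{\partial r} & \dfrac{\partial g}{\partial \theta} \end{pmatrix} = \begin{pmatrix} \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} \end{pmatrix} \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix},$$

so

$$\frac{\partial g}{\partial r} = \frac{\partial f}{\partial x}\cos\theta + \frac{\partial f}{\partial y}\sin\theta, \qquad \frac{\partial g}{\partial \theta} = -\frac{\partial f}{\partial x}\,r\sin\theta + \frac{\partial f}{\partial y}\,r\cos\theta.$$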

Thus we have expressed the partial derivatives of g in terms of those of f, that is, in terms of ∂f/∂x = D1f(r cos θ, r sin θ) and ∂f/∂y = D2f(r cos θ, r sin θ).
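The polar-coordinate formulas of Example 3 can be checked on a sample function. This is a sketch under stated assumptions: the test function f(x, y) = x²y + y and the evaluation point are hypothetical, and the formulas ∂g/∂r = (∂f/∂x) cos θ + (∂f/∂y) sin θ and ∂g/∂θ = −(∂f/∂x) r sin θ + (∂f/∂y) r cos θ are compared against central differences of g.

```python
import math

def f(x, y):
    """Hypothetical test function."""
    return x * x * y + y

def f_x(x, y):
    return 2 * x * y

def f_y(x, y):
    return x * x + 1

def g(r, theta):
    """g = f composed with the polar coordinate mapping T."""
    return f(r * math.cos(theta), r * math.sin(theta))

def g_partials(r, theta):
    """Partial derivatives of g from the chain rule formulas."""
    x, y = r * math.cos(theta), r * math.sin(theta)
    g_r = f_x(x, y) * math.cos(theta) + f_y(x, y) * math.sin(theta)
    g_theta = -f_x(x, y) * r * math.sin(theta) + f_y(x, y) * r * math.cos(theta)
    return g_r, g_theta

def g_partials_numeric(r, theta, eps=1e-6):
    """Central-difference estimates of the same partial derivatives."""
    g_r = (g(r + eps, theta) - g(r - eps, theta)) / (2 * eps)
    g_theta = (g(r, theta + eps) - g(r, theta - eps)) / (2 * eps)
    return g_r, g_theta

r0, theta0 = 1.5, 0.4
exact = g_partials(r0, theta0)
approx = g_partials_numeric(r0, theta0)
print(exact, approx)
```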

[Figure 2.10]

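The same can be done for the second order partial derivatives. Given a differentiable mapping F : ℝ^n → ℝ^m, the partial derivative D_i F is again a mapping from ℝ^n to ℝ^m. If it is differentiable at a, we can consider the second partial derivative

$$D_j D_i F(a) = D_j(D_i F)(a).$$

The classical notation is

$$\frac{\partial^2 F}{\partial x_j\,\partial x_i} = D_j D_i F.$$

For example, a function f : ℝ^2 → ℝ has second-order partial derivatives

$$\frac{\partial^2 f}{\partial x^2} = D_1 D_1 f, \qquad \frac{\partial^2 f}{\partial y\,\partial x} = D_2 D_1 f, \qquad \frac{\partial^2 f}{\partial x\,\partial y} = D_1 D_2 f, \qquad \frac{\partial^2 f}{\partial y^2} = D_2 D_2 f.$$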

Continuing Example 3 we have

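$$\begin{aligned} \frac{\partial^2 g}{\partial r^2} &= \frac{\partial}{\partial r}\left(\frac{\partial f}{\partial x}\cos\theta + \frac{\partial f}{\partial y}\sin\theta\right) \\ &= \left(\frac{\partial^2 f}{\partial x^2}\cos\theta + \frac{\partial^2 f}{\partial y\,\partial x}\sin\theta\right)\cos\theta + \left(\frac{\partial^2 f}{\partial x\,\partial y}\cos\theta + \frac{\partial^2 f}{\partial y^2}\sin\theta\right)\sin\theta \\ &= \frac{\partial^2 f}{\partial x^2}\cos^2\theta + 2\,\frac{\partial^2 f}{\partial x\,\partial y}\sin\theta\cos\theta + \frac{\partial^2 f}{\partial y^2}\sin^2\theta. \end{aligned}$$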

In the last step we have used the fact that the “mixed partial derivatives” ∂²f/∂x ∂y and ∂²f/∂y ∂x are equal, which will be established at the end of this section under the hypothesis that they are continuous.

In Exercise 3.9, the student will continue in this manner to show that Laplace's equation

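$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = 0$$

transforms to

$$\frac{\partial^2 g}{\partial r^2} + \frac{1}{r}\frac{\partial g}{\partial r} + \frac{1}{r^2}\frac{\partial^2 g}{\partial \theta^2} = 0 \tag{8}$$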

in polar coordinates.

As a standard application of this fact, consider a uniform circular disk of radius 1, whose boundary is heated in such a way that its temperature on the boundary is given by the function g : [0, 2π] → ℝ, that is,

$$u(1, \theta) = g(\theta)$$

for each θ ∈ [0, 2π]; see Fig. 2.11. Then certain physical considerations suggest that the temperature function u(r, θ) on the disk satisfies Laplace's equation (8) in polar coordinates. Now it is easily verified directly (do this) that, for each positive integer n, the functions r^n cos nθ and r^n sin nθ satisfy Eq. (8). Therefore, if a Fourier series expansion

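$$g(\theta) = \tfrac{1}{2}a_0 + \sum_{n=1}^{\infty} (a_n \cos n\theta + b_n \sin n\theta)$$

[Figure 2.11]

for the function g can be found, then the series

$$u(r, \theta) = \tfrac{1}{2}a_0 + \sum_{n=1}^{\infty} r^n (a_n \cos n\theta + b_n \sin n\theta)$$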

is a plausible candidate for the temperature function u(r, θ): it reduces to g(θ) when r = 1, and it satisfies Eq. (8), provided that it converges for all r ∈ [0, 1] and that its first and second order derivatives can be computed by termwise differentiation.
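The claim that r^n cos nθ and r^n sin nθ satisfy Laplace's equation in polar coordinates can be spot-checked numerically before it is proved. The sketch below assumes the polar form ∂²u/∂r² + (1/r) ∂u/∂r + (1/r²) ∂²u/∂θ² = 0 and approximates the derivatives by central differences; the helper names and the sample point (r, θ) = (0.8, 0.5) are hypothetical.

```python
import math

def polar_laplacian(u, r, theta, eps=1e-4):
    """Approximate u_rr + u_r / r + u_theta_theta / r^2 by central differences."""
    u_rr = (u(r + eps, theta) - 2 * u(r, theta) + u(r - eps, theta)) / eps**2
    u_r = (u(r + eps, theta) - u(r - eps, theta)) / (2 * eps)
    u_tt = (u(r, theta + eps) - 2 * u(r, theta) + u(r, theta - eps)) / eps**2
    return u_rr + u_r / r + u_tt / r**2

def harmonic_cos(n):
    """The function (r, theta) -> r^n cos(n theta)."""
    return lambda r, theta: r**n * math.cos(n * theta)

def harmonic_sin(n):
    """The function (r, theta) -> r^n sin(n theta)."""
    return lambda r, theta: r**n * math.sin(n * theta)

residuals = [abs(polar_laplacian(make(n), 0.8, 0.5))
             for n in (1, 2, 3, 4)
             for make in (harmonic_cos, harmonic_sin)]
print(max(residuals))
```

Each residual should be zero up to discretization and rounding error. A numerical check of this kind is no substitute for the direct verification requested in the text.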

Example 4 Consider an infinitely long vibrating string whose equilibrium position lies along the x-axis, and denote by f(x, t) the displacement of the point x at time t (Fig. 2.12). Then physical considerations suggest that f satisfies the one-dimensional wave equation

$$\frac{\partial^2 f}{\partial x^2} = \frac{1}{a^2}\frac{\partial^2 f}{\partial t^2} \tag{9}$$

[Figure 2.12]

where a is a certain constant. In order to solve this partial differential equation, we make the substitution

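$$x = Au + Bv, \qquad t = Cu + Dv,$$

where A, B, C, D are constants to be determined. Writing g(u, v) = f(Au + Bv, Cu + Dv), we find that

$$\frac{\partial^2 g}{\partial u\,\partial v} = AB\,\frac{\partial^2 f}{\partial x^2} + (AD + BC)\,\frac{\partial^2 f}{\partial x\,\partial t} + CD\,\frac{\partial^2 f}{\partial t^2}$$

(see Exercise 3.7). If we choose A = B = 1/2, C = 1/(2a), D = −1/(2a), then it follows from this equation and (9) that

$$\frac{\partial^2 g}{\partial u\,\partial v} = 0.$$

This implies that there exist functions φ, ψ : ℝ → ℝ such that

$$g(u, v) = \varphi(u) + \psi(v).$$

In terms of x and t, this means that

$$f(x, t) = \varphi(x + at) + \psi(x - at). \tag{10}$$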

Suppose now that we are given the initial position

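$$f(x, 0) = F(x)$$

and the initial velocity D_2 f(x, 0) = G(x) of the string. Then from (10) we obtain

$$\varphi(x) + \psi(x) = F(x) \tag{11}$$

and

$$a\,\varphi'(x) - a\,\psi'(x) = G(x),$$

so

$$\varphi(x) - \psi(x) = \frac{1}{a}\int_0^x G(s)\,ds + K \tag{12}$$

by the fundamental theorem of calculus, where K is a constant. We then solve (11) and (12) for φ(x) and ψ(x):

$$\varphi(x) = \frac{1}{2}F(x) + \frac{1}{2a}\int_0^x G(s)\,ds + \frac{K}{2}, \tag{13}$$

$$\psi(x) = \frac{1}{2}F(x) - \frac{1}{2a}\int_0^x G(s)\,ds - \frac{K}{2}. \tag{14}$$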

Upon substituting x + at for x in (13), and x − at for x in (14), and adding, we obtain

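$$f(x, t) = \frac{1}{2}\bigl[F(x + at) + F(x - at)\bigr] + \frac{1}{2a}\int_{x - at}^{x + at} G(s)\,ds.$$

This is “d'Alembert's solution” of the wave equation. If G(x) ≡ 0, the picture looks like Fig. 2.13: the solution is the sum of two “waves”, each of shape F/2, moving in opposite directions with speed a.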
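D'Alembert's solution in its standard form, f(x, t) = ½[F(x + at) + F(x − at)] + (1/2a) ∫ G(s) ds over [x − at, x + at], can be tested numerically. In this sketch the initial position F(x) = exp(−x²), the initial velocity G(x) = cos x, the value a = 2, and all helper names are hypothetical; G is chosen so that its integral is known in closed form. The code checks the wave equation in the form ∂²f/∂t² = a² ∂²f/∂x² together with both initial conditions.

```python
import math

A_SPEED = 2.0  # the constant a of the wave equation (hypothetical value)

def F(x):
    """Hypothetical initial position of the string."""
    return math.exp(-x * x)

def G(x):
    """Hypothetical initial velocity of the string."""
    return math.cos(x)

def G_integral(lo, hi):
    """Exact integral of G = cos over [lo, hi]."""
    return math.sin(hi) - math.sin(lo)

def f(x, t, a=A_SPEED):
    """d'Alembert's formula for the data F and G above."""
    return (0.5 * (F(x + a * t) + F(x - a * t))
            + G_integral(x - a * t, x + a * t) / (2 * a))

def second_diff(func, z, eps=1e-4):
    """Central-difference approximation of the second derivative of func at z."""
    return (func(z + eps) - 2 * func(z) + func(z - eps)) / eps**2

x0, t0 = 0.3, 0.7
f_xx = second_diff(lambda x: f(x, t0), x0)
f_tt = second_diff(lambda t: f(x0, t), t0)
wave_residual = abs(f_tt - A_SPEED**2 * f_xx)
position_error = abs(f(x0, 0.0) - F(x0))
velocity_error = abs((f(x0, 1e-6) - f(x0, -1e-6)) / 2e-6 - G(x0))
print(wave_residual, position_error, velocity_error)
```

All three printed quantities should be close to zero, the first and third only up to finite-difference error.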

The last two examples illustrate the use of chain rule formulas to “transform” partial differential equations so as to render them more amenable to solution.

[Figure 2.13]

We shall now apply the chain rule to generalize some of the basic results of single-variable calculus. First consider the fact that a function defined on an open interval is constant if and only if its derivative is zero there. Since the function f(x, y) defined for x ≠ 0 by

$$f(x, y) = \begin{cases} 1 & \text{if } x > 0, \\ -1 & \text{if } x < 0 \end{cases}$$

has zero derivative (or gradient) where it is defined, it is clear that some restriction must be placed on the domain of definition of a mapping if we are to generalize this result correctly.

The open set U ⊂ ℝ^n is said to be connected if and only if given any two points a and b of U, there is a differentiable mapping φ : ℝ → U such that φ(0) = a and φ(1) = b (Fig. 2.14). Of course the mapping F : U → ℝ^m is said to be constant on U if F(a) = F(b) for any two points a, b ∈ U, so that there exists c ∈ ℝ^m such that F(x) = c for all x ∈ U.

[Figure 2.14]

Theorem 3.2 Let U be a connected open subset of ℝ^n. Then the differentiable mapping F : U → ℝ^m is constant on U if and only if F′(x) = 0 (that is, the zero matrix) for all x ∈ U.

PROOF Since F is constant if and only if each of its component functions is, and the matrix F′(x) is zero if and only if each of its rows is, we may assume that F is real valued, F = f : U → ℝ. Since we already know that f′(x) = 0 if f is constant, suppose that f′(x) = ∇f(x) = 0 for all x ∈ U.

Given a and b ∈ U, let φ : ℝ → U be a differentiable mapping with φ(0) = a, φ(1) = b.

If g = f ∘ φ : ℝ → ℝ, then

$$g'(t) = \nabla f(\varphi(t)) \cdot \varphi'(t) = 0$$

for all t ∈ ℝ, by Eq. (6) above. Therefore g is constant on [0, 1], so

$$f(a) = f(\varphi(0)) = g(0) = g(1) = f(\varphi(1)) = f(b).$$

∎

Corollary 3.3 Let F and G be two differentiable mappings of the connected open set U ⊂ ℝ^n into ℝ^m. If F′(x) = G′(x) for all x ∈ U, then there exists c ∈ ℝ^m such that

$$F(x) = G(x) + c$$

for all x ∈ U. That is, F and G differ only by a constant.

PROOF Apply Theorem 3.2 to the mapping F − G.

∎

Now consider a differentiable function f : U → ℝ, where U is a connected open set in ℝ^2. We say that f is independent of y if there exists a function g : ℝ → ℝ such that f(x, y) = g(x) if (x, y) ∈ U. At first glance it might seem that f is independent of y if D_2 f = 0 on U. To see that this is not so, however, consider the function f defined on

$$U = \mathbb{R}^2 - \{(x, 0) : x \le 0\}$$

by f(x, y) = x² if x > 0 or y > 0, and f(x, y) = −x² if x ≤ 0 and y < 0. Then D_2 f(x, y) = 0 on U. But f(−1, 1) = 1, while f(−1, −1) = −1. We leave it as an exercise for the student to formulate a condition on U under which D_2 f = 0 does imply that f is independent of y.
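The counterexample above is easy to experiment with. The sketch below assumes that the domain U is the plane with the ray {(x, 0) : x ≤ 0} removed, which is where the two-case formula defines a differentiable function; the sample points and the helper name `d2_f` are hypothetical.

```python
def f(x, y):
    """The counterexample: x^2 if x > 0 or y > 0, and -x^2 if x <= 0 and y < 0."""
    if y == 0 and x <= 0:
        raise ValueError("point lies outside the domain U")
    if x > 0 or y > 0:
        return x * x
    return -x * x  # here x <= 0 and y < 0

def d2_f(x, y, eps=1e-6):
    """Central-difference estimate of D_2 f; needs eps < |y| whenever x <= 0."""
    return (f(x, y + eps) - f(x, y - eps)) / (2 * eps)

samples = [(-1.0, 1.0), (-1.0, -1.0), (2.0, 0.0), (0.5, -3.0), (-0.25, 0.1)]
d2_values = [d2_f(x, y) for x, y in samples]
print(d2_values)
print(f(-1.0, 1.0), f(-1.0, -1.0))
```

The partial derivative with respect to y vanishes at every sample point, yet the values at (−1, 1) and (−1, −1) differ, because no vertical segment inside U joins these two points.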

Let us recall here the statement of the mean value theorem of elementary single-variable calculus. If f : [a, b] → ℝ is a differentiable function, then there exists a point ξ ∈ (a, b) such that

$$f(b) - f(a) = f'(\xi)(b - a).$$

The mean value theorem generalizes to real-valued functions on ℝ^n (however, it is in general false for vector-valued functions; see Exercise 1.12). In the following statement of the mean value theorem in ℝ^n, by the line segment from a to b is meant the set of all points in ℝ^n of the form (1 − t)a + tb for t ∈ [0, 1].

Theorem 3.4 (Mean Value Theorem) Suppose that U is an open set in ℝ^n, and that a and b are two points of U such that U contains the line segment L from a to b. If f is a differentiable real-valued function on U, then

$$f(b) - f(a) = \nabla f(c) \cdot (b - a)$$

for some point c ∈ L.

PROOF Let φ be the mapping of [0, 1] onto L defined by

$$\varphi(t) = (1 - t)a + tb, \qquad t \in [0, 1].$$

Then φ is differentiable with φ′(t) = b − a. Hence the composition g = f ∘ φ is differentiable by the chain rule. Since g : [0, 1] → ℝ, the single-variable mean value theorem gives a point ξ ∈ (0, 1) such that g(1) − g(0) = g′(ξ). If c = φ(ξ), we then have

$$f(b) - f(a) = g(1) - g(0) = g'(\xi) = \nabla f(\varphi(\xi)) \cdot \varphi'(\xi) = \nabla f(c) \cdot (b - a).$$

∎

Note that here we have employed the chain rule to deduce the mean value theorem for functions of several variables from the single-variable mean value theorem.
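The proof of the mean value theorem is constructive enough to follow numerically: restrict f to the segment, then look for a parameter ξ with g′(ξ) = g(1) − g(0). In this sketch the function f(x, y) = x³ + xy and the endpoints a = (0, 0), b = (1, 2) are hypothetical choices, and bisection is used only because g′ happens to be increasing on [0, 1] for this example.

```python
def f(x, y):
    """Hypothetical test function."""
    return x**3 + x * y

def grad_f(x, y):
    """Gradient of f, computed by hand."""
    return (3 * x * x + y, x)

a, b = (0.0, 0.0), (1.0, 2.0)
direction = (b[0] - a[0], b[1] - a[1])

def phi(t):
    """Parametrization of the segment from a to b."""
    return (a[0] + t * direction[0], a[1] + t * direction[1])

def g_prime(t):
    """Derivative of g = f o phi via the chain rule: grad f(phi(t)) . (b - a)."""
    gx, gy = grad_f(*phi(t))
    return gx * direction[0] + gy * direction[1]

target = f(*b) - f(*a)  # equals g(1) - g(0)

# Bisection for g'(xi) = target; valid here because g' is increasing on [0, 1]
# with g'(0) < target < g'(1).
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if g_prime(mid) < target:
        lo = mid
    else:
        hi = mid
xi = (lo + hi) / 2
c = phi(xi)
print(xi, c, g_prime(xi), target)
```

The point c = φ(ξ) lies on the segment and satisfies ∇f(c) · (b − a) = f(b) − f(a) up to rounding.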

Next we are going to use the mean value theorem to prove that the second partial derivatives D_i D_j f and D_j D_i f are equal under appropriate conditions. First note that, if we write b = a + h in the mean value theorem, then its conclusion becomes

$$f(a + h) - f(a) = \nabla f(a + \theta h) \cdot h = D_h f(a + \theta h)$$

for some θ ∈ (0, 1).

Recall the notation Δf_a(h) = f(a + h) − f(a). The mapping Δf_a : ℝ^n → ℝ is sometimes called the “first difference” of f at a. The “second difference” of f at a is a function of two points h, k defined by (see Fig. 2.15)

$$\Delta^2 f_a(h, k) = \Delta f_{a+k}(h) - \Delta f_a(h) = f(a + h + k) - f(a + h) - f(a + k) + f(a).$$

[Figure 2.15]

The desired equality of second order partial derivatives will follow easily from the following lemma, which expresses Δ²f_a(h, k) in terms of the second order directional derivative D_k D_h f(x), which is by definition the derivative with respect to k of the function D_h f at x, that is,

$$D_k D_h f(x) = D_k(D_h f)(x).$$

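Lemma 3.5 Let U be an open set in ℝ^n which contains the parallelogram determined by the points a, a + h, a + k, a + h + k. If the real-valued function f and its directional derivative D_h f are both differentiable on U, then there exist numbers α, β ∈ (0, 1) such that

$$\Delta^2 f_a(h, k) = D_k D_h f(a + \alpha h + \beta k).$$

PROOF Define g(x) in a neighborhood of the line segment from a to a + h by

$$g(x) = f(x + k) - f(x).$$

Then g is differentiable, with

$$D_h g(x) = D_h f(x + k) - D_h f(x)$$

and

$$\Delta^2 f_a(h, k) = g(a + h) - g(a) = D_h g(a + \alpha h) = D_h f(a + \alpha h + k) - D_h f(a + \alpha h) = D_k D_h f(a + \alpha h + \beta k),$$

where the mean value theorem has been applied first to g and then to D_h f. ∎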

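Theorem 3.6 Let f be a real-valued function defined on the open set U in ℝ^n. If the first and second partial derivatives of f exist and are continuous on U, then D_i D_j f = D_j D_i f on U.

PROOF Theorem 2.5 implies that both f and its partial derivatives D_i f and D_j f are differentiable on U. We can therefore apply Lemma 3.5 with the vectors he_i and ke_j in place of h and k (where h and k now denote nonzero real numbers), provided that h and k are sufficiently small that U contains the rectangle with vertices a, a + he_i, a + ke_j, a + he_i + ke_j. We obtain α_1, β_1 ∈ (0, 1) such that

$$\Delta^2 f_a(he_i, ke_j) = D_{ke_j} D_{he_i} f(a + \alpha_1 he_i + \beta_1 ke_j).$$

If we apply Lemma 3.5 again with the two vectors interchanged, we obtain α_2, β_2 ∈ (0, 1) such that

$$\Delta^2 f_a(ke_j, he_i) = D_{he_i} D_{ke_j} f(a + \alpha_2 ke_j + \beta_2 he_i).$$

But it is clear from the definition of Δ²f_a that

$$\Delta^2 f_a(he_i, ke_j) = \Delta^2 f_a(ke_j, he_i),$$

so we conclude that

$$hk\,D_j D_i f(a + \alpha_1 he_i + \beta_1 ke_j) = hk\,D_i D_j f(a + \beta_2 he_i + \alpha_2 ke_j),$$

using the facts that D_{he_i} = h D_i and D_{ke_j} = k D_j.

If we now divide the previous equation by hk, and take the limit as h → 0, k → 0, we obtain

$$D_j D_i f(a) = D_i D_j f(a)$$

because both are continuous at a.

∎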

REMARK In this proof we actually used only the facts that f, D_i f, D_j f are differentiable on U, and that D_j D_i f and D_i D_j f are continuous at the point a. Exercise 3.16 shows that the continuity hypothesis on D_j D_i f and D_i D_j f at a cannot simply be dropped: without it the two mixed partial derivatives may both exist at a and still differ.
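The phenomenon mentioned in the remark can be observed numerically with the function of Exercise 3.16, f(x, y) = xy(x² − y²)/(x² + y²) with f(0, 0) = 0. This is a sketch only: the nested central differences and the step sizes are hypothetical choices, with the inner step taken much smaller than the outer one so that the inner quotients approximate D_1 f(0, y) and D_2 f(x, 0) well.

```python
def f(x, y):
    """The function of Exercise 3.16, with f(0, 0) = 0."""
    if x == 0 and y == 0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def d1(x, y, eps=1e-6):
    """Central-difference estimate of D_1 f."""
    return (f(x + eps, y) - f(x - eps, y)) / (2 * eps)

def d2(x, y, eps=1e-6):
    """Central-difference estimate of D_2 f."""
    return (f(x, y + eps) - f(x, y - eps)) / (2 * eps)

step = 1e-3
d2_d1_at_origin = (d1(0.0, step) - d1(0.0, -step)) / (2 * step)  # D_2 D_1 f(0, 0)
d1_d2_at_origin = (d2(step, 0.0) - d2(-step, 0.0)) / (2 * step)  # D_1 D_2 f(0, 0)
print(d2_d1_at_origin, d1_d2_at_origin)
```

The two estimates come out near −1 and +1 respectively, consistent with D_1 f(0, y) = −y and D_2 f(x, 0) = x.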

Exercises

3.1 Let f : ℝ^2 → ℝ be differentiable at each point of the unit circle x² + y² = 1. Show that, if u is the unit tangent vector to the circle which points in the counterclockwise direction, then

$$D_u f(x, y) = -y\,D_1 f(x, y) + x\,D_2 f(x, y).$$

3.2 If f and g are differentiable real-valued functions on ℝ^n, show that

(a) ∇(f + g) = ∇f + ∇g,

(b) ∇(fg) = f ∇g + g ∇f,

(c) ∇(f^n) = n f^{n−1} ∇f.

3.3 Let F, G : ℝ^n → ℝ^m be differentiable mappings, and h : ℝ^n → ℝ a differentiable function. Given u, v ∈ ℝ^n, show that

(a) D_u(F + G) = D_u F + D_u G,

(b) D_{u+v} F = D_u F + D_v F,

(c) D_u(hF) = (D_u h)F + h(D_u F).

3.4Show that each of the following two functions is a solution of the heat equation Image (k a constant).

(a)Image

(b) Image

3.5 Suppose that f : ℝ^2 → ℝ has continuous second order partial derivatives. Set x = s + t, y = s − t to obtain g : ℝ^2 → ℝ defined by g(s, t) = f(s + t, s − t). Show that

Image

that is, that

Image

3.6Show that

Image

if we set x = 2s + t, y = s − t. First state what this actually means, in terms of functions.

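3.7 If g(u, v) = f(Au + Bv, Cu + Dv), where A, B, C, D are constants, show that

$$\frac{\partial^2 g}{\partial u\,\partial v} = AB\,\frac{\partial^2 f}{\partial x^2} + (AD + BC)\,\frac{\partial^2 f}{\partial x\,\partial y} + CD\,\frac{\partial^2 f}{\partial y^2}.$$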

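3.8 Let f : ℝ^2 → ℝ be a function with continuous second partial derivatives, so that ∂²f/∂x ∂y = ∂²f/∂y ∂x. If g : ℝ^2 → ℝ is defined by g(r, θ) = f(r cos θ, r sin θ), show that

$$\left(\frac{\partial g}{\partial r}\right)^2 + \frac{1}{r^2}\left(\frac{\partial g}{\partial \theta}\right)^2 = \left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2.$$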

This gives the length of the gradient vector in polar coordinates.

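3.9 If f and g are as in the previous problem, show that

$$\frac{\partial^2 g}{\partial r^2} + \frac{1}{r}\frac{\partial g}{\partial r} + \frac{1}{r^2}\frac{\partial^2 g}{\partial \theta^2} = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}.$$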

This gives the 2-dimensional Laplacian in polar coordinates.

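3.10 Given a function f : ℝ^3 → ℝ with continuous second partial derivatives, define

$$F(\rho, \theta, \phi) = f(\rho \sin\phi \cos\theta,\ \rho \sin\phi \sin\theta,\ \rho \cos\phi),$$

where ρ, θ, ϕ are the usual spherical coordinates. We want to express the 3-dimensional Laplacian

$$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}$$

in spherical coordinates, that is, in terms of partial derivatives of F.

(a) First define g(r, θ, z) = f(r cos θ, r sin θ, z) and conclude from Exercise 3.9 that

$$\nabla^2 f = \frac{\partial^2 g}{\partial r^2} + \frac{1}{r}\frac{\partial g}{\partial r} + \frac{1}{r^2}\frac{\partial^2 g}{\partial \theta^2} + \frac{\partial^2 g}{\partial z^2}.$$

(b) Now define F(ρ, θ, ϕ) = g(ρ sin ϕ, θ, ρ cos ϕ). Noting that, except for a change in notation, this is the same transformation as before, deduce that

$$\nabla^2 f = \frac{\partial^2 F}{\partial \rho^2} + \frac{2}{\rho}\frac{\partial F}{\partial \rho} + \frac{1}{\rho^2}\frac{\partial^2 F}{\partial \phi^2} + \frac{\cot\phi}{\rho^2}\frac{\partial F}{\partial \phi} + \frac{1}{\rho^2 \sin^2\phi}\frac{\partial^2 F}{\partial \theta^2}.$$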

3.11(a)If if Image show that

Image

for x ≠ 0.

(b) Deduce from (a) that, if ∇²f = 0, then

Image

where a and b are constants.

3.12 Verify that the functions r^n cos nθ and r^n sin nθ satisfy the 2-dimensional Laplace equation in polar coordinates.

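3.13 If f(x, y, z) = (1/r) g(t − r/c), where c is constant and r = (x² + y² + z²)^{1/2}, show that f satisfies the 3-dimensional wave equation

$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} = \frac{1}{c^2}\frac{\partial^2 f}{\partial t^2}.$$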

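3.14 The following example illustrates the hazards of denoting functions by real variables. Let w = f(x, y, z) and z = g(x, y). Then

$$\frac{\partial w}{\partial x} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial x} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial x} + \frac{\partial w}{\partial z}\frac{\partial z}{\partial x} = \frac{\partial w}{\partial x} + \frac{\partial w}{\partial z}\frac{\partial z}{\partial x},$$

since ∂x/∂x = 1 and ∂y/∂x = 0. Hence (∂w/∂z)(∂z/∂x) = 0. But if w = x + y + z and z = x + y, then ∂w/∂z = ∂z/∂x = 1, so we have 1 = 0. Where is the mistake?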

3.15 Use the mean value theorem to show that 5.18 < [(4.1)² + (3.2)²]^{1/2} < 5.21. Hint: Note first that (5)² < x² + y² < (5.5)² if 4 < x < 4.1 and 3 < y < 3.2.

3.16 Define f : ℝ^2 → ℝ by f(x, y) = xy(x² − y²)/(x² + y²) unless x = y = 0, and f(0, 0) = 0.

(a) Show that D_1 f(0, y) = −y and D_2 f(x, 0) = x for all x and y.

(b) Conclude that D_1 D_2 f(0, 0) and D_2 D_1 f(0, 0) exist but are not equal.

3.17The object of this problem is to show that, by an appropriate transformation of variables, the general homogeneous second order partial differential equation

Image

with constant coefficients can be reduced to either Laplace's equation, the wave equation, or the heat equation.

(a)If ac − b2 > 0, show that the substitution s = (bx − ay)/(ac − b2)1/2, t = y changes (*) to

Image

(b)If ac − b2 = 0, show that the substitution s = bx − ay, t = y changes (*) to

Image

(c)If ac − b2 < 0, show that the substitution

Image

changes (*) to ∂²u/∂s ∂t = 0.

3.18 Let F : ℝ^n → ℝ^m be differentiable at a ∈ ℝ^n. Given a differentiable curve φ : ℝ → ℝ^n with φ(0) = a, φ′(0) = v, define ψ = F ∘ φ, and show that ψ′(0) = dF_a(v). Hence, if φ̄ : ℝ → ℝ^n is a second curve with φ̄(0) = a, φ̄′(0) = v, and ψ̄ = F ∘ φ̄, then ψ̄′(0) = ψ′(0), because both are equal to dF_a(v). Consequently F maps curves through a, with the same velocity vector, to curves through F(a) with the same velocity vector.

3.19 Let φ : ℝ → ℝ^n, f : ℝ^n → ℝ^m, and g : ℝ^m → ℝ be differentiable mappings. If h = g ∘ f ∘ φ, show that h′(t) = ∇g(f(φ(t))) · D_{φ′(t)} f(φ(t)).