Advanced Calculus of Several Variables (1973)
Part II. Multivariable Differential Calculus
Chapter 3. THE CHAIN RULE
Consider the composition $H = G \circ F$ of two differentiable mappings $F : \mathbb{R}^n \to \mathbb{R}^m$ and $G : \mathbb{R}^m \to \mathbb{R}^k$. For example, $F(x) \in \mathbb{R}^m$ might be the price vector of $m$ intermediate products that are manufactured at a factory from $n$ raw materials whose cost vector is $x$ (that is, the components of $x \in \mathbb{R}^n$ are the prices of the $n$ raw materials), and $H(x) = G(F(x))$ the resulting price vector of $k$ final products that are manufactured at a second factory from the $m$ intermediate products. We might wish to estimate the change $\Delta H_a(h) = H(a + h) - H(a)$ in the prices of the final products, resulting from a change from $a$ to $a + h$ in the costs of the raw materials. Using the approximations $\Delta F \approx dF$ and $\Delta G \approx dG$, without initially worrying about the accuracy of our estimates, we obtain
\[ \Delta H_a(h) = G(F(a+h)) - G(F(a)) = \Delta G_{F(a)}\bigl(\Delta F_a(h)\bigr) \approx dG_{F(a)}\bigl(dF_a(h)\bigr). \]
This heuristic “argument” suggests the possibility that the multivariable chain rule takes the form $dH_a = dG_{F(a)} \circ dF_a$, analogous to the restatement in Section 1 of the familiar single-variable chain rule.
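Theorem 3.1 (The Chain Rule) Let $U$ and $V$ be open subsets of $\mathbb{R}^n$ and $\mathbb{R}^m$ respectively. If the mappings $F : U \to \mathbb{R}^m$ and $G : V \to \mathbb{R}^k$ are differentiable at $a \in U$ and $F(a) \in V$ respectively, then their composition $H = G \circ F$ is differentiable at $a$, and
\[ dH_a = dG_{F(a)} \circ dF_a \quad \text{(composition of linear mappings)}. \tag{1} \]
In terms of derivatives, we therefore have
\[ H'(a) = G'(F(a))\,F'(a) \quad \text{(matrix multiplication)}. \tag{2} \]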
In brief, the differential of the composition is the composition of the differentials; the derivative of the composition is the product of the derivatives.
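As a numerical sanity check of the statement that the derivative of the composition is the product of the derivatives, the following Python sketch compares a finite-difference estimate of $H'(a)$ with the matrix product $G'(F(a))\,F'(a)$. The mappings `F` and `G`, the point `a`, and the helpers `jacobian` and `matmul` are hypothetical choices made only for illustration; central differences stand in for the exact derivative matrices.

```python
import math

def F(x):
    # Hypothetical map R^2 -> R^3, chosen only for illustration.
    s, t = x
    return [s * t, math.sin(s), s + t * t]

def G(y):
    # Hypothetical map R^3 -> R^2, chosen only for illustration.
    x1, x2, x3 = y
    return [x1 * x2 + x3, math.exp(x1) * x3]

def H(x):
    # The composition H = G o F.
    return G(F(x))

def jacobian(f, a, eps=1e-6):
    """Central-difference approximation of the derivative matrix f'(a)."""
    m = len(f(a))
    n = len(a)
    cols = []
    for j in range(n):
        up = list(a)
        dn = list(a)
        up[j] += eps
        dn[j] -= eps
        fu, fd = f(up), f(dn)
        # Column j holds the approximate partial derivatives D_j f_i.
        cols.append([(fu[i] - fd[i]) / (2 * eps) for i in range(m)])
    # Transpose so that row i corresponds to the component function f_i.
    return [[cols[j][i] for j in range(n)] for i in range(m)]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

a = [0.7, -0.3]
lhs = jacobian(H, a)                            # estimate of H'(a), 2 x 2
rhs = matmul(jacobian(G, F(a)), jacobian(F, a))  # (2 x 3) times (3 x 2)
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print("largest entrywise difference:", max_gap)
```

The two matrices should agree up to the finite-difference error, which is what the chain rule predicts for any differentiable choice of `F` and `G`.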
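PROOF We must show that
\[ \lim_{h \to 0} \frac{H(a+h) - H(a) - dG_{F(a)}(dF_a(h))}{|h|} = 0. \]
If we define
\[ \varphi(h) = \frac{F(a+h) - F(a) - dF_a(h)}{|h|} \tag{3} \]
and
\[ \psi(k) = \frac{G(F(a)+k) - G(F(a)) - dG_{F(a)}(k)}{|k|}, \tag{4} \]
then the fact that $F$ and $G$ are differentiable at $a$ and $F(a)$, respectively, implies that
\[ \lim_{h \to 0} \varphi(h) = 0 = \lim_{k \to 0} \psi(k). \]
Then
\[ H(a+h) - H(a) = G(F(a+h)) - G(F(a)) = dG_{F(a)}\bigl(F(a+h) - F(a)\bigr) + |F(a+h) - F(a)|\,\psi\bigl(F(a+h) - F(a)\bigr) \]
by Eq. (4) with $k = F(a + h) - F(a)$. Using (3) in the form $F(a+h) - F(a) = dF_a(h) + |h|\,\varphi(h)$, we then obtain
\[ H(a+h) - H(a) = dG_{F(a)}\bigl(dF_a(h) + |h|\,\varphi(h)\bigr) + \bigl|dF_a(h) + |h|\,\varphi(h)\bigr|\,\psi\bigl(F(a+h) - F(a)\bigr). \]
Therefore, by the linearity of $dG_{F(a)}$ and $dF_a$,
\[ \frac{H(a+h) - H(a) - dG_{F(a)}(dF_a(h))}{|h|} = dG_{F(a)}(\varphi(h)) + \left| dF_a\!\left(\frac{h}{|h|}\right) + \varphi(h) \right| \psi\bigl(F(a+h) - F(a)\bigr). \tag{5} \]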
But $\lim_{h \to 0} dG_{F(a)}(\varphi(h)) = 0$ because $\lim_{h \to 0} \varphi(h) = 0$ and the linear mapping $dG_{F(a)}$ is continuous. Also $\lim_{h \to 0} \psi(F(a + h) - F(a)) = 0$ because $F$ is continuous at $a$ and $\lim_{k \to 0} \psi(k) = 0$. Finally the number $|dF_a(h/|h|) + \varphi(h)|$ remains bounded, because $h/|h|$ is a unit vector and the component functions of the linear mapping $dF_a$ are continuous and therefore bounded on the unit sphere (Theorem I.8.8).
Consequently the limit of (5) is zero as desired. Of course Eq. (2) follows immediately from (1), since the matrix of the composition of two linear mappings is the product of their matrices.
We list in the following examples some typical chain rule formulas obtained by equating components of the matrix equation (2) for various values of n, m, and k. It is the formulation of the chain rule in terms of differential linear mappings which enables us to give a single proof for all of these formulas, despite their wide variety.
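Example 1 If $n = k = 1$, so we have differentiable mappings $\mathbb{R} \xrightarrow{f} \mathbb{R}^m \xrightarrow{g} \mathbb{R}$, then $h = g \circ f : \mathbb{R} \to \mathbb{R}$ is differentiable with
\[ h'(t) = g'(f(t))\,f'(t). \]
Here $g'(f(t))$ is a $1 \times m$ row matrix, and $f'(t)$ is an $m \times 1$ column matrix. In terms of the gradient of $g$, we have
\[ h'(t) = \nabla g(f(t)) \cdot f'(t). \tag{6} \]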
This is a generalization of the fact that $D_v g(a) = \nabla g(a) \cdot v$ [see Eq. (10) of Section 2]. If we think of $f(t)$ as the position vector of a particle moving in $\mathbb{R}^m$, with $g$ a temperature function on $\mathbb{R}^m$, then (6) gives the rate of change of the temperature of the particle. In particular, we see that at a given point this rate of change depends only upon the velocity vector of the particle.
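In terms of the component functions $f_1, \dots, f_m$ of $f$ and the partial derivatives of $g$, (6) becomes
\[ h'(t) = \sum_{i=1}^{m} D_i g(f(t))\, f_i'(t). \]
If we write $x_i = f_i(t)$ and $u = g(x)$, following the common practice of using the symbol for a typical value of a function to denote the function itself, then the above equation takes the easily remembered form
\[ \frac{du}{dt} = \frac{\partial u}{\partial x_1}\frac{dx_1}{dt} + \cdots + \frac{\partial u}{\partial x_m}\frac{dx_m}{dt}. \]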
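The temperature interpretation of Example 1 can be tried numerically. The sketch below assumes a hypothetical helical path `f` in $\mathbb{R}^3$ and a hypothetical temperature function `g`; neither comes from the text. It evaluates $\nabla g(f(t)) \cdot f'(t)$ and compares it with a central-difference estimate of the derivative of $h = g \circ f$.

```python
import math

def f(t):
    # Hypothetical path of the particle: a helix in R^3.
    return (math.cos(t), math.sin(t), t)

def f_prime(t):
    # Velocity vector of the helix.
    return (-math.sin(t), math.cos(t), 1.0)

def g(x, y, z):
    # Hypothetical temperature function on R^3.
    return x * x + y * z

def grad_g(x, y, z):
    # Gradient of g, computed by hand: (2x, z, y).
    return (2 * x, z, y)

def h(t):
    # Temperature of the particle at time t.
    return g(*f(t))

def h_prime_chain(t):
    # Chain rule: dot product of the gradient at f(t) with the velocity.
    return sum(gi * vi for gi, vi in zip(grad_g(*f(t)), f_prime(t)))

def h_prime_numeric(t, eps=1e-6):
    # Central-difference estimate of h'(t), for comparison only.
    return (h(t + eps) - h(t - eps)) / (2 * eps)

t0 = 1.2
print(h_prime_chain(t0), h_prime_numeric(t0))
```

For this particular choice $h(t) = \cos^2 t + t \sin t$, so the result can also be checked by differentiating directly.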
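Example 2 Given differentiable mappings $\mathbb{R}^2 \xrightarrow{F} \mathbb{R}^3 \xrightarrow{G} \mathbb{R}^2$ with composition $H : \mathbb{R}^2 \to \mathbb{R}^2$, the chain rule gives
\[ \begin{pmatrix} D_1H_1(a) & D_2H_1(a) \\ D_1H_2(a) & D_2H_2(a) \end{pmatrix} = \begin{pmatrix} D_1G_1(F(a)) & D_2G_1(F(a)) & D_3G_1(F(a)) \\ D_1G_2(F(a)) & D_2G_2(F(a)) & D_3G_2(F(a)) \end{pmatrix} \begin{pmatrix} D_1F_1(a) & D_2F_1(a) \\ D_1F_2(a) & D_2F_2(a) \\ D_1F_3(a) & D_2F_3(a) \end{pmatrix}. \]
If we write $F(s, t) = (x, y, z)$ and $G(x, y, z) = (u, v)$, this equation can be rewritten
\[ \begin{pmatrix} D_1H_1 & D_2H_1 \\ D_1H_2 & D_2H_2 \end{pmatrix} = \begin{pmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} & \dfrac{\partial u}{\partial z} \\[1ex] \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} & \dfrac{\partial v}{\partial z} \end{pmatrix} \begin{pmatrix} \dfrac{\partial x}{\partial s} & \dfrac{\partial x}{\partial t} \\[1ex] \dfrac{\partial y}{\partial s} & \dfrac{\partial y}{\partial t} \\[1ex] \dfrac{\partial z}{\partial s} & \dfrac{\partial z}{\partial t} \end{pmatrix}. \]
For example, we have
\[ D_1H_1 = \frac{\partial u}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial u}{\partial y}\frac{\partial y}{\partial s} + \frac{\partial u}{\partial z}\frac{\partial z}{\partial s}. \]
Writing
\[ \frac{\partial u}{\partial s} = D_1H_1, \quad \frac{\partial u}{\partial t} = D_2H_1, \quad \frac{\partial v}{\partial s} = D_1H_2, \quad \frac{\partial v}{\partial t} = D_2H_2 \]
to go all the way with variables representing functions, we obtain formulas such as
\[ \frac{\partial u}{\partial s} = \frac{\partial u}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial u}{\partial y}\frac{\partial y}{\partial s} + \frac{\partial u}{\partial z}\frac{\partial z}{\partial s} \]
and
\[ \frac{\partial v}{\partial t} = \frac{\partial v}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial v}{\partial y}\frac{\partial y}{\partial t} + \frac{\partial v}{\partial z}\frac{\partial z}{\partial t}. \]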
The obvious nature of the formal pattern of chain rule formulas expressed in terms of variables, as above, often compensates for their disadvantage of not containing explicit reference to the points at which the various derivatives are evaluated.
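Example 3 Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be the familiar “polar coordinate mapping” defined by $T(r, \theta) = (r \cos\theta, r \sin\theta)$ (Fig. 2.10). Given a differentiable function $f : \mathbb{R}^2 \to \mathbb{R}$, define $g = f \circ T$, so $g(r, \theta) = f(r \cos\theta, r \sin\theta)$. Then the chain rule gives
\[ \begin{pmatrix} \dfrac{\partial g}{\partial r} & \dfrac{\partial g}{\partial \theta} \end{pmatrix} = \begin{pmatrix} \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} \end{pmatrix} \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}, \]
so
\[ \frac{\partial g}{\partial r} = \frac{\partial f}{\partial x}\cos\theta + \frac{\partial f}{\partial y}\sin\theta, \qquad \frac{\partial g}{\partial \theta} = -r\,\frac{\partial f}{\partial x}\sin\theta + r\,\frac{\partial f}{\partial y}\cos\theta. \]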
Thus we have expressed the partial derivatives of g in terms of those of f, that is, in terms of ∂f/∂x = D1f(r cos θ, r sin θ) and ∂f/∂y = D2f(r cos θ, r sin θ).
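The polar-coordinate formulas of Example 3 can be checked on a concrete function. The sketch below assumes the hypothetical test function $f(x, y) = x^2 y$ (not from the text), evaluates $\partial g/\partial r$ and $\partial g/\partial\theta$ from the partial derivatives of $f$ as the chain rule prescribes, and compares them with central-difference estimates taken directly from $g$.

```python
import math

def f(x, y):
    # Hypothetical test function on R^2.
    return x * x * y

def grad_f(x, y):
    # Partial derivatives of f, computed by hand.
    return (2 * x * y, x * x)

def g(r, theta):
    # g = f o T, where T is the polar coordinate mapping.
    return f(r * math.cos(theta), r * math.sin(theta))

def polar_partials(r, theta):
    """Partials of g obtained from those of f via the chain rule."""
    fx, fy = grad_f(r * math.cos(theta), r * math.sin(theta))
    g_r = fx * math.cos(theta) + fy * math.sin(theta)
    g_theta = -fx * r * math.sin(theta) + fy * r * math.cos(theta)
    return g_r, g_theta

def numeric_partials(r, theta, eps=1e-6):
    """Central-difference estimates of the same partials, for comparison."""
    g_r = (g(r + eps, theta) - g(r - eps, theta)) / (2 * eps)
    g_theta = (g(r, theta + eps) - g(r, theta - eps)) / (2 * eps)
    return g_r, g_theta

r0, theta0 = 1.5, 0.4
print(polar_partials(r0, theta0))
print(numeric_partials(r0, theta0))
```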
Figure 2.10
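The same can be done for the second order partial derivatives. Given a differentiable mapping $F : \mathbb{R}^n \to \mathbb{R}^m$, the partial derivative $D_iF$ is again a mapping from $\mathbb{R}^n$ to $\mathbb{R}^m$. If it is differentiable at $a$, we can consider the second partial derivative
\[ D_jD_iF(a) = D_j(D_iF)(a). \]
The classical notation is
\[ D_jD_iF = \frac{\partial^2 F}{\partial x_j\,\partial x_i}. \]
For example, the function $f : \mathbb{R}^2 \to \mathbb{R}$ has second-order partial derivatives
\[ D_1D_1f = \frac{\partial^2 f}{\partial x^2}, \quad D_2D_1f = \frac{\partial^2 f}{\partial y\,\partial x}, \quad D_1D_2f = \frac{\partial^2 f}{\partial x\,\partial y}, \quad D_2D_2f = \frac{\partial^2 f}{\partial y^2}. \]
Continuing Example 3 we have
\[ \frac{\partial^2 g}{\partial r^2} = \frac{\partial}{\partial r}\left( \frac{\partial f}{\partial x}\cos\theta + \frac{\partial f}{\partial y}\sin\theta \right) = \cos\theta\left( \frac{\partial^2 f}{\partial x^2}\cos\theta + \frac{\partial^2 f}{\partial y\,\partial x}\sin\theta \right) + \sin\theta\left( \frac{\partial^2 f}{\partial x\,\partial y}\cos\theta + \frac{\partial^2 f}{\partial y^2}\sin\theta \right) \]
\[ = \frac{\partial^2 f}{\partial x^2}\cos^2\theta + 2\,\frac{\partial^2 f}{\partial x\,\partial y}\sin\theta\cos\theta + \frac{\partial^2 f}{\partial y^2}\sin^2\theta. \]
In the last step we have used the fact that the “mixed partial derivatives” $\partial^2 f/\partial x\,\partial y$ and $\partial^2 f/\partial y\,\partial x$ are equal, which will be established at the end of this section under the hypothesis that they are continuous.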
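In Exercise 3.9, the student will continue in this manner to show that Laplace's equation
\[ \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = 0 \]
transforms to
\[ \frac{\partial^2 g}{\partial r^2} + \frac{1}{r}\frac{\partial g}{\partial r} + \frac{1}{r^2}\frac{\partial^2 g}{\partial \theta^2} = 0 \tag{8} \]
in polar coordinates.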
As a standard application of this fact, consider a uniform circular disk of radius 1, whose boundary is heated in such a way that its temperature on the boundary is given by the function $g : [0, 2\pi] \to \mathbb{R}$, that is,
\[ u(1, \theta) = g(\theta) \]
for each $\theta \in [0, 2\pi]$; see Fig. 2.11. Then certain physical considerations suggest that the temperature function $u(r, \theta)$ on the disk satisfies Laplace's equation (8) in polar coordinates. Now it is easily verified directly (do this) that, for each positive integer $n$, the functions $r^n \cos n\theta$ and $r^n \sin n\theta$ satisfy Eq. (8). Therefore, if a Fourier series expansion
\[ g(\theta) = \frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n \cos n\theta + b_n \sin n\theta) \]
for the function $g$ can be found, then the series
\[ u(r, \theta) = \frac{a_0}{2} + \sum_{n=1}^{\infty} r^n (a_n \cos n\theta + b_n \sin n\theta) \]
is a plausible candidate for the temperature function $u(r, \theta)$: it reduces to $g(\theta)$ when $r = 1$, and it satisfies Eq. (8) if it converges for all $r \in [0, 1]$ and if its first and second order derivatives can be computed by termwise differentiation.
Figure 2.11
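The direct verification requested above can also be approximated numerically. The following sketch assumes that Laplace's equation in polar coordinates has the standard form $u_{rr} + u_r/r + u_{\theta\theta}/r^2 = 0$, and it uses second-order finite differences in place of exact derivatives; the helper names and the sample point are illustrative choices, not part of the text.

```python
import math

def laplacian_polar(u, r, theta, eps=1e-4):
    """Finite-difference estimate of u_rr + u_r / r + u_theta_theta / r^2."""
    u_r = (u(r + eps, theta) - u(r - eps, theta)) / (2 * eps)
    u_rr = (u(r + eps, theta) - 2 * u(r, theta) + u(r - eps, theta)) / eps**2
    u_tt = (u(r, theta + eps) - 2 * u(r, theta) + u(r, theta - eps)) / eps**2
    return u_rr + u_r / r + u_tt / r**2

def harmonic_cos(n):
    # The function r^n cos(n theta).
    return lambda r, theta: r**n * math.cos(n * theta)

def harmonic_sin(n):
    # The function r^n sin(n theta).
    return lambda r, theta: r**n * math.sin(n * theta)

# Evaluate the residual at one sample point inside the unit disk.
residuals = [abs(laplacian_polar(make(n), 0.8, 0.5))
             for n in range(1, 6)
             for make in (harmonic_cos, harmonic_sin)]
worst = max(residuals)
print("largest residual:", worst)
```

A small residual is only evidence, not a proof; the exact computation is the one asked for in the text and in Exercise 3.12.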
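Example 4 Consider an infinitely long vibrating string whose equilibrium position lies along the $x$-axis, and denote by $f(x, t)$ the displacement of the point $x$ at time $t$ (Fig. 2.12). Then physical considerations suggest that $f$ satisfies the one-dimensional wave equation
\[ \frac{\partial^2 f}{\partial x^2} = \frac{1}{a^2}\frac{\partial^2 f}{\partial t^2}, \tag{9} \]
where $a$ is a certain constant. In order to solve this partial differential equation, we make the substitution
\[ x = Au + Bv, \qquad t = Cu + Dv, \]
where $A$, $B$, $C$, $D$ are constants to be determined. Writing $g(u, v) = f(Au + Bv, Cu + Dv)$, we find that
\[ \frac{\partial^2 g}{\partial u\,\partial v} = AB\,\frac{\partial^2 f}{\partial x^2} + (AD + BC)\,\frac{\partial^2 f}{\partial x\,\partial t} + CD\,\frac{\partial^2 f}{\partial t^2} \]
(see Exercise 3.7). If we choose $A = B = 1/2$, $C = 1/(2a)$, $D = -1/(2a)$, then $AD + BC = 0$ and $CD = -AB/a^2$, so it follows from this equation and (9) that
\[ \frac{\partial^2 g}{\partial u\,\partial v} = 0. \]
Figure 2.12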
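This implies that there exist functions $\varphi, \psi : \mathbb{R} \to \mathbb{R}$ such that
\[ g(u, v) = \varphi(u) + \psi(v). \]
In terms of $x$ and $t$, this means that
\[ f(x, t) = \varphi(x + at) + \psi(x - at). \tag{10} \]
Suppose now that we are given the initial position
\[ f(x, 0) = F(x) \]
and the initial velocity $D_2f(x, 0) = G(x)$ of the string. Then from (10) we obtain
\[ \varphi(x) + \psi(x) = F(x) \tag{11} \]
and
\[ a\varphi'(x) - a\psi'(x) = G(x), \]
so
\[ a\varphi(x) - a\psi(x) = \int_0^x G(s)\,ds + K \tag{12} \]
by the fundamental theorem of calculus, where $K$ is a constant. We then solve (11) and (12) for $\varphi(x)$ and $\psi(x)$:
\[ \varphi(x) = \frac{1}{2}F(x) + \frac{1}{2a}\int_0^x G(s)\,ds + \frac{K}{2a}, \tag{13} \]
\[ \psi(x) = \frac{1}{2}F(x) - \frac{1}{2a}\int_0^x G(s)\,ds - \frac{K}{2a}. \tag{14} \]
Upon substituting $x + at$ for $x$ in (13), and $x - at$ for $x$ in (14), and adding, we obtain
\[ f(x, t) = \frac{1}{2}\bigl[F(x + at) + F(x - at)\bigr] + \frac{1}{2a}\int_{x - at}^{x + at} G(s)\,ds. \]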
This is “d'Alembert's solution” of the wave equation. If $G(x) \equiv 0$, the picture looks like Fig. 2.13. Thus we have two “waves” moving in opposite directions.
The last two examples illustrate the use of chain rule formulas to “transform” partial differential equations so as to render them more amenable to solution.
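As an illustration of Example 4, the sketch below evaluates d'Alembert's solution in the standard form $f(x,t) = \tfrac12[F(x+at) + F(x-at)] + \tfrac{1}{2a}\int_{x-at}^{x+at} G(s)\,ds$ and tests it numerically. The initial position `F`, the initial velocity `G`, and the value of the constant `a` are hypothetical choices; Simpson's rule stands in for the exact integral, and finite differences stand in for the exact partial derivatives.

```python
import math

A_SPEED = 2.0  # hypothetical value of the constant a in the wave equation

def F(x):
    # Hypothetical initial position of the string.
    return math.exp(-x * x)

def G(x):
    # Hypothetical initial velocity of the string.
    return math.cos(x)

def simpson(func, lo, hi, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    step = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * step)
    return total * step / 3

def dalembert(x, t, a=A_SPEED):
    """d'Alembert's solution built from F and G."""
    wave_part = 0.5 * (F(x + a * t) + F(x - a * t))
    velocity_part = simpson(G, x - a * t, x + a * t) / (2 * a)
    return wave_part + velocity_part

def wave_residual(x, t, a=A_SPEED, eps=1e-3):
    """Finite-difference estimate of f_xx - f_tt / a^2, which should be near 0."""
    f_xx = (dalembert(x + eps, t) - 2 * dalembert(x, t) + dalembert(x - eps, t)) / eps**2
    f_tt = (dalembert(x, t + eps) - 2 * dalembert(x, t) + dalembert(x, t - eps)) / eps**2
    return f_xx - f_tt / a**2

def initial_velocity(x, dt=1e-5):
    """Central-difference estimate of D_2 f(x, 0)."""
    return (dalembert(x, dt) - dalembert(x, -dt)) / (2 * dt)

print(wave_residual(0.3, 0.5))
print(dalembert(0.3, 0.0) - F(0.3))
print(initial_velocity(0.3) - G(0.3))
```

All three printed quantities should be close to zero: the first tests the wave equation at one sample point, the other two test the initial conditions.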
Figure 2.13
We shall now apply the chain rule to generalize some of the basic results of single-variable calculus. First consider the fact that a function defined on an open interval is constant if and only if its derivative is zero there. Since the function $f(x, y)$ defined for $x \ne 0$ by
\[ f(x, y) = \begin{cases} 1 & \text{if } x > 0, \\ -1 & \text{if } x < 0 \end{cases} \]
has zero derivative (or gradient) where it is defined, it is clear that some restriction must be placed on the domain of definition of a mapping if we are to generalize this result correctly.
The open set $U \subset \mathbb{R}^n$ is said to be connected if and only if given any two points $a$ and $b$ of $U$, there is a differentiable mapping $\varphi : \mathbb{R} \to U$ such that $\varphi(0) = a$ and $\varphi(1) = b$ (Fig. 2.14). Of course the mapping $F : U \to \mathbb{R}^m$ is said to be constant on $U$ if $F(a) = F(b)$ for any two points $a, b \in U$, so that there exists $c \in \mathbb{R}^m$ such that $F(x) = c$ for all $x \in U$.
Figure 2.14
Theorem 3.2 Let $U$ be a connected open subset of $\mathbb{R}^n$. Then the differentiable mapping $F : U \to \mathbb{R}^m$ is constant on $U$ if and only if $F'(x) = 0$ (that is, the zero matrix) for all $x \in U$.
PROOF Since $F$ is constant if and only if each of its component functions is, and the matrix $F'(x)$ is zero if and only if each of its rows is, we may assume that $F$ is real valued, $F = f : U \to \mathbb{R}$. Since we already know that $f'(x) = 0$ if $f$ is constant, suppose that $f'(x) = \nabla f(x) = 0$ for all $x \in U$.
Given $a$ and $b \in U$, let $\varphi : \mathbb{R} \to U$ be a differentiable mapping with $\varphi(0) = a$, $\varphi(1) = b$.
If $g = f \circ \varphi : \mathbb{R} \to \mathbb{R}$, then
\[ g'(t) = \nabla f(\varphi(t)) \cdot \varphi'(t) = 0 \]
for all $t \in \mathbb{R}$, by Eq. (6) above. Therefore $g$ is constant on $[0, 1]$, so
\[ f(a) = g(0) = g(1) = f(b). \]
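Corollary 3.3 Let $F$ and $G$ be two differentiable mappings of the connected open set $U \subset \mathbb{R}^n$ into $\mathbb{R}^m$. If $F'(x) = G'(x)$ for all $x \in U$, then there exists $c \in \mathbb{R}^m$ such that
\[ F(x) = G(x) + c \]
for all $x \in U$. That is, $F$ and $G$ differ only by a constant.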
PROOF Apply Theorem 3.2 to the mapping F − G.
Now consider a differentiable function $f : U \to \mathbb{R}$, where $U$ is a connected open set in $\mathbb{R}^2$. We say that $f$ is independent of $y$ if there exists a function $g : \mathbb{R} \to \mathbb{R}$ such that $f(x, y) = g(x)$ if $(x, y) \in U$. At first glance it might seem that $f$ is independent of $y$ if $D_2f = 0$ on $U$. To see that this is not so, however, consider the function $f$ defined on
\[ U = \{(x, y) \in \mathbb{R}^2 : x > 0 \text{ or } y \ne 0\} \]
by $f(x, y) = x^2$ if $x > 0$ or $y > 0$, and $f(x, y) = -x^2$ if $x \le 0$ and $y < 0$. Then $D_2f(x, y) = 0$ on $U$. But $f(-1, 1) = 1$, while $f(-1, -1) = -1$. We leave it as an exercise for the student to formulate a condition on $U$ under which $D_2f = 0$ does imply that $f$ is independent of $y$.
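The counterexample can be explored with a few lines of Python. The sketch assumes that $U$ is the plane with the nonpositive $x$-axis removed, which is the set covered by the two cases in the definition of $f$, and it estimates $D_2f$ by a central difference; the sample points are arbitrary illustrative choices.

```python
def f(x, y):
    # The counterexample function; the domain U is assumed to be the plane
    # with the nonpositive x-axis removed.
    if x <= 0 and y == 0:
        raise ValueError("point is not in U")
    if x > 0 or y > 0:
        return x * x
    return -x * x

def d2_f(x, y, eps=1e-6):
    # Central-difference estimate of the partial derivative D_2 f(x, y).
    return (f(x, y + eps) - f(x, y - eps)) / (2 * eps)

# A few arbitrary points of U, none within eps of the removed half-line.
sample_points = [(-1.0, 1.0), (-1.0, -1.0), (2.0, 0.0), (0.5, -0.5), (-0.25, 3.0)]
d2_values = [d2_f(x, y) for x, y in sample_points]

print(d2_values)                     # every estimate vanishes
print(f(-1.0, 1.0), f(-1.0, -1.0))   # yet f takes different values at x = -1
```

The two points $(-1, 1)$ and $(-1, -1)$ cannot be joined inside $U$ by a vertical segment, which is a useful hint for the condition on $U$ that the text asks for.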
Let us recall here the statement of the mean value theorem of elementary single-variable calculus. If $f : [a, b] \to \mathbb{R}$ is a differentiable function, then there exists a point $\xi \in (a, b)$ such that
\[ f(b) - f(a) = f'(\xi)(b - a). \]
The mean value theorem generalizes to real-valued functions on $\mathbb{R}^n$ (however it is in general false for vector-valued functions; see Exercise 1.12). In the following statement of the mean value theorem in $\mathbb{R}^n$, by the line segment from $a$ to $b$ is meant the set of all points in $\mathbb{R}^n$ of the form $(1 - t)a + tb$ for $t \in [0, 1]$.
Theorem 3.4 (Mean Value Theorem) Suppose that $U$ is an open set in $\mathbb{R}^n$, and that $a$ and $b$ are two points of $U$ such that $U$ contains the line segment $L$ from $a$ to $b$. If $f$ is a differentiable real-valued function on $U$, then
\[ f(b) - f(a) = \nabla f(c) \cdot (b - a) \]
for some point $c \in L$.
PROOF Let $\varphi$ be the mapping of $[0, 1]$ onto $L$ defined by
\[ \varphi(t) = (1 - t)a + tb = a + t(b - a). \]
Then $\varphi$ is differentiable with $\varphi'(t) = b - a$. Hence the composition $g = f \circ \varphi$ is differentiable by the chain rule. Since $g : [0, 1] \to \mathbb{R}$, the single-variable mean value theorem gives a point $\xi \in (0, 1)$ such that $g(1) - g(0) = g'(\xi)$. If $c = \varphi(\xi)$, we then have
\[ f(b) - f(a) = g(1) - g(0) = g'(\xi) = \nabla f(\varphi(\xi)) \cdot \varphi'(\xi) = \nabla f(c) \cdot (b - a). \]
Note that here we have employed the chain rule to deduce the mean value theorem for functions of several variables from the single-variable mean value theorem.
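The point $c$ promised by the mean value theorem can be located numerically in simple cases. The sketch below assumes a hypothetical function $f(x, y) = x^3 + y^2$ with endpoints $a = (0, 0)$ and $b = (1, 2)$, and it assumes that $\nabla f$ is continuous so that a sign change of $\nabla f(a + t(b-a)) \cdot (b-a) - (f(b) - f(a))$ can be bracketed on a grid and refined by bisection. The helper `mean_value_point` is illustrative and is not a method from the text.

```python
def f(p):
    # Hypothetical real-valued function on R^2.
    x, y = p
    return x**3 + y**2

def grad_f(p):
    # Gradient of f, computed by hand.
    x, y = p
    return [3 * x * x, 2 * y]

def mean_value_point(f, grad_f, a, b, samples=1000, iters=60):
    """Search the segment from a to b for c with f(b) - f(a) = grad f(c) . (b - a).

    Assumes grad_f is continuous; scans a grid for a sign change and bisects.
    """
    d = [bi - ai for ai, bi in zip(a, b)]
    delta = f(b) - f(a)

    def point(t):
        return [ai + t * di for ai, di in zip(a, d)]

    def gap(t):
        return sum(gi * di for gi, di in zip(grad_f(point(t)), d)) - delta

    lo = 0.0
    for i in range(1, samples + 1):
        hi = i / samples
        if gap(lo) * gap(hi) <= 0:
            break
        lo = hi
    else:
        raise ValueError("no sign change found on the sampling grid")
    for _ in range(iters):
        mid = (lo + hi) / 2
        if gap(lo) * gap(mid) <= 0:
            hi = mid
        else:
            lo = mid
    t = (lo + hi) / 2
    return t, point(t)

a, b = [0.0, 0.0], [1.0, 2.0]
t_star, c = mean_value_point(f, grad_f, a, b)
print(t_star, c)
```

Here $g(t) = f(a + t(b - a)) = t^3 + 4t^2$, so the parameter found should solve $3t^2 + 8t = 5$.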
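Next we are going to use the mean value theorem to prove that the second partial derivatives $D_iD_jf$ and $D_jD_if$ are equal under appropriate conditions. First note that, if we write $b = a + h$ in the mean value theorem, then its conclusion becomes
\[ f(a + h) - f(a) = \nabla f(a + \theta h) \cdot h \]
for some $\theta \in (0, 1)$.
Recall the notation $\Delta f_a(h) = f(a + h) - f(a)$. The mapping $\Delta f_a : \mathbb{R}^n \to \mathbb{R}$ is sometimes called the “first difference” of $f$ at $a$. The “second difference” of $f$ at $a$ is a function of two points $h$, $k$ defined by (see Fig. 2.15)
\[ \Delta^2 f_a(h, k) = \Delta f_{a+k}(h) - \Delta f_a(h) = f(a + h + k) - f(a + k) - f(a + h) + f(a). \]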
Figure 2.15
The desired equality of second order partial derivatives will follow easily from the following lemma, which expresses $\Delta^2 f_a(h, k)$ in terms of the second order directional derivative $D_kD_hf(x)$, which is by definition the derivative with respect to $k$ of the function $D_hf$ at $x$, that is
\[ D_kD_hf(x) = \lim_{t \to 0} \frac{D_hf(x + tk) - D_hf(x)}{t}. \]
Lemma 3.5 Let $U$ be an open set in $\mathbb{R}^n$ which contains the parallelogram determined by the points $a$, $a + h$, $a + k$, $a + h + k$. If the real-valued function $f$ and its directional derivative $D_hf$ are both differentiable on $U$, then there exist numbers $\alpha, \beta \in (0, 1)$ such that
\[ \Delta^2 f_a(h, k) = D_kD_hf(a + \alpha h + \beta k). \]
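PROOF Define $g(x)$ in a neighborhood of the line segment from $a$ to $a + h$ by
\[ g(x) = f(x + k) - f(x). \]
Then $g$ is differentiable, with
\[ D_hg(x) = D_hf(x + k) - D_hf(x), \]
and
\[ \Delta^2 f_a(h, k) = g(a + h) - g(a) = D_hg(a + \alpha h) = D_hf(a + \alpha h + k) - D_hf(a + \alpha h) = D_kD_hf(a + \alpha h + \beta k) \]
for some $\alpha, \beta \in (0, 1)$, by two applications of the mean value theorem: first to $g$ on the segment from $a$ to $a + h$, and then to $D_hf$ on the segment from $a + \alpha h$ to $a + \alpha h + k$.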
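To make the second difference concrete, the sketch below computes $f(a+h+k) - f(a+h) - f(a+k) + f(a)$ for the hypothetical function $f(x, y) = x^2 y$ (not from the text), with $h$ and $k$ taken along the coordinate axes. For this $f$ one has $D_2D_1f(x, y) = 2x$, so the value of $\alpha$ in the lemma can be read off from the computed ratio.

```python
def second_difference(f, a, h, k):
    """f(a + h + k) - f(a + h) - f(a + k) + f(a) for points given as lists."""
    def shift(p, v):
        return [pi + vi for pi, vi in zip(p, v)]
    return f(shift(shift(a, h), k)) - f(shift(a, h)) - f(shift(a, k)) + f(a)

def f(p):
    # Hypothetical test function f(x, y) = x^2 y, for which D_2 D_1 f = 2x.
    x, y = p
    return x * x * y

a = [1.0, 2.0]
step = 1e-3
h = [step, 0.0]   # h = step * e_1
k = [0.0, step]   # k = step * e_2

# Dividing by step * step converts D_k D_h f into D_2 D_1 f at the same point.
ratio = second_difference(f, a, h, k) / (step * step)

# Solve ratio = 2 * (1 + alpha * step) for the alpha of the lemma.
alpha = (ratio / 2 - 1.0) / step
print(ratio, alpha)
```

For this example the ratio equals $2x + \text{step}$ exactly, so $\alpha = 1/2$ up to rounding; the value of $\beta$ is immaterial here because $D_2D_1f$ does not depend on $y$.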
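Theorem 3.6 Let $f$ be a real-valued function defined on the open set $U$ in $\mathbb{R}^n$. If the first and second partial derivatives of $f$ exist and are continuous on $U$, then $D_iD_jf = D_jD_if$ on $U$.
PROOF Theorem 2.5 implies that both $f$ and its partial derivatives $D_if$ and $D_jf$ are differentiable on $U$. We can therefore apply Lemma 3.5 with $\mathbf{h} = he_i$ and $\mathbf{k} = ke_j$, provided that the scalars $h$ and $k$ are sufficiently small that $U$ contains the rectangle with vertices $a$, $a + he_i$, $a + ke_j$, $a + he_i + ke_j$. We obtain $\alpha_1, \beta_1 \in (0, 1)$ such that
\[ \Delta^2 f_a(he_i, ke_j) = D_{ke_j}D_{he_i}f(a + \alpha_1 he_i + \beta_1 ke_j). \]
If we apply Lemma 3.5 again with $\mathbf{h}$ and $\mathbf{k}$ interchanged, we obtain $\alpha_2, \beta_2 \in (0, 1)$ such that
\[ \Delta^2 f_a(ke_j, he_i) = D_{he_i}D_{ke_j}f(a + \alpha_2 he_i + \beta_2 ke_j). \]
But it is clear from the definition of $\Delta^2 f_a$ that
\[ \Delta^2 f_a(he_i, ke_j) = \Delta^2 f_a(ke_j, he_i), \]
so we conclude that
\[ hk\,D_jD_if(a + \alpha_1 he_i + \beta_1 ke_j) = hk\,D_iD_jf(a + \alpha_2 he_i + \beta_2 ke_j), \]
using the facts that $D_{he_i} = hD_i$ and $D_{ke_j} = kD_j$.
If we now divide the previous equation by $hk$, and take the limit as $h \to 0$, $k \to 0$, we obtain
\[ D_jD_if(a) = D_iD_jf(a), \]
because both are continuous at $a$.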
REMARK In this proof we actually used only the facts that $f$, $D_if$, $D_jf$ are differentiable on $U$, and that $D_jD_if$ and $D_iD_jf$ are continuous at the point $a$. Exercise 3.16 shows that this continuity hypothesis cannot simply be dropped: for the function given there, both mixed partial derivatives exist at the origin but they are not equal.
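The function of Exercise 3.16, $f(x, y) = xy(x^2 - y^2)/(x^2 + y^2)$ with $f(0, 0) = 0$, can be probed numerically before working the exercise. The sketch below uses nested central differences, with step sizes chosen only for illustration, to estimate $D_2D_1f$ and $D_1D_2f$ at the origin and at one other point; it is a numerical experiment and not a substitute for the computation the exercise asks for.

```python
def f(x, y):
    # The function of Exercise 3.16, with the value 0 assigned at the origin.
    if x == 0 and y == 0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def d1(x, y, eps=1e-6):
    # Central-difference estimate of D_1 f(x, y).
    return (f(x + eps, y) - f(x - eps, y)) / (2 * eps)

def d2(x, y, eps=1e-6):
    # Central-difference estimate of D_2 f(x, y).
    return (f(x, y + eps) - f(x, y - eps)) / (2 * eps)

def d2d1(x, y, k=1e-3):
    # Estimate of D_2(D_1 f): differentiate D_1 f with respect to y.
    return (d1(x, y + k) - d1(x, y - k)) / (2 * k)

def d1d2(x, y, k=1e-3):
    # Estimate of D_1(D_2 f): differentiate D_2 f with respect to x.
    return (d2(x + k, y) - d2(x - k, y)) / (2 * k)

print(d2d1(0.0, 0.0), d1d2(0.0, 0.0))   # close to -1 and +1
print(d2d1(1.0, 2.0), d1d2(1.0, 2.0))   # close to each other
```

Away from the origin the second partial derivatives are continuous and the two estimates agree, as Theorem 3.6 predicts; at the origin they differ in sign.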
Exercises
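3.1 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be differentiable at each point of the unit circle $x^2 + y^2 = 1$. Show that, if $u$ is the unit tangent vector to the circle which points in the counterclockwise direction, then
\[ D_uf(x, y) = -y\,D_1f(x, y) + x\,D_2f(x, y). \]
3.2 If $f$ and $g$ are differentiable real-valued functions on $\mathbb{R}^n$, show that
(a) $\nabla(f + g) = \nabla f + \nabla g$,
(b) $\nabla(fg) = f\,\nabla g + g\,\nabla f$,
(c) $\nabla(f^n) = nf^{n-1}\,\nabla f$.
3.3 Let $F, G : \mathbb{R}^n \to \mathbb{R}^m$ be differentiable mappings, and $h : \mathbb{R}^n \to \mathbb{R}$ a differentiable function. Given $u, v \in \mathbb{R}^n$, show that
(a) $D_u(F + G) = D_uF + D_uG$,
(b) $D_{u+v}F = D_uF + D_vF$,
(c) $D_u(hF) = (D_uh)F + h(D_uF)$.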
3.4 Show that each of the following two functions is a solution of the heat equation (k a constant).
(a)
(b)
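3.5 Suppose that $f : \mathbb{R}^2 \to \mathbb{R}$ has continuous second order partial derivatives. Set $x = s + t$, $y = s - t$ to obtain $g : \mathbb{R}^2 \to \mathbb{R}$ defined by $g(s, t) = f(s + t, s - t)$. Show that
\[ \frac{\partial^2 g}{\partial s\,\partial t} = \frac{\partial^2 f}{\partial x^2} - \frac{\partial^2 f}{\partial y^2}, \]
that is, that
\[ D_1D_2g(s, t) = D_1D_1f(s + t, s - t) - D_2D_2f(s + t, s - t). \]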
3.6 Show that
if we set x = 2s + t, y = s − t. First state what this actually means, in terms of functions.
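3.7 If $g(u, v) = f(Au + Bv, Cu + Dv)$, where $A$, $B$, $C$, $D$ are constants and $f = f(x, y)$, show that
\[ \frac{\partial^2 g}{\partial u\,\partial v} = AB\,\frac{\partial^2 f}{\partial x^2} + (AD + BC)\,\frac{\partial^2 f}{\partial x\,\partial y} + CD\,\frac{\partial^2 f}{\partial y^2}. \]
3.8 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be a function with continuous second partial derivatives, so that $\partial^2 f/\partial x\,\partial y = \partial^2 f/\partial y\,\partial x$. If $g : \mathbb{R}^2 \to \mathbb{R}$ is defined by $g(r, \theta) = f(r \cos\theta, r \sin\theta)$, show that
\[ \left(\frac{\partial g}{\partial r}\right)^2 + \frac{1}{r^2}\left(\frac{\partial g}{\partial \theta}\right)^2 = \left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2. \]
This gives the length of the gradient vector in polar coordinates.
3.9 If $f$ and $g$ are as in the previous problem, show that
\[ \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = \frac{\partial^2 g}{\partial r^2} + \frac{1}{r}\frac{\partial g}{\partial r} + \frac{1}{r^2}\frac{\partial^2 g}{\partial \theta^2}. \]
This gives the 2-dimensional Laplacian in polar coordinates.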
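3.10 Given a function $f : \mathbb{R}^3 \to \mathbb{R}$ with continuous second partial derivatives, define
\[ F(\rho, \theta, \phi) = f(\rho \sin\phi \cos\theta,\ \rho \sin\phi \sin\theta,\ \rho \cos\phi), \]
where $\rho$, $\theta$, $\phi$ are the usual spherical coordinates. We want to express the 3-dimensional Laplacian
\[ \nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} \]
in spherical coordinates, that is, in terms of partial derivatives of $F$.
(a) First define $g(r, \theta, z) = f(r \cos\theta, r \sin\theta, z)$ and conclude from Exercise 3.9 that
\[ \nabla^2 f = \frac{\partial^2 g}{\partial r^2} + \frac{1}{r}\frac{\partial g}{\partial r} + \frac{1}{r^2}\frac{\partial^2 g}{\partial \theta^2} + \frac{\partial^2 g}{\partial z^2}. \]
(b) Now define $F(\rho, \theta, \phi) = g(\rho \sin\phi, \theta, \rho \cos\phi)$. Noting that, except for a change in notation, this is the same transformation as before, deduce that
\[ \nabla^2 f = \frac{\partial^2 F}{\partial \rho^2} + \frac{2}{\rho}\frac{\partial F}{\partial \rho} + \frac{1}{\rho^2}\frac{\partial^2 F}{\partial \phi^2} + \frac{\cot\phi}{\rho^2}\frac{\partial F}{\partial \phi} + \frac{1}{\rho^2 \sin^2\phi}\frac{\partial^2 F}{\partial \theta^2}. \]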
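3.11 (a) If $f(x) = g(r)$, where $r = |x|$ for $x \in \mathbb{R}^n$, show that
\[ \nabla^2 f = \frac{n - 1}{r}\,g'(r) + g''(r) \]
for $x \ne 0$.
(b) Deduce from (a) that, if $\nabla^2 f = 0$ and $n \ge 3$, then
\[ f(x) = \frac{a}{|x|^{n-2}} + b, \]
where $a$ and $b$ are constants.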
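3.12 Verify that the functions $r^n \cos n\theta$ and $r^n \sin n\theta$ satisfy the 2-dimensional Laplace equation in polar coordinates.
3.13 If $f(x, y, z, t) = (1/r)\,g(t - r/c)$, where $c$ is constant and $r = (x^2 + y^2 + z^2)^{1/2}$, show that $f$ satisfies the 3-dimensional wave equation
\[ \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} = \frac{1}{c^2}\frac{\partial^2 f}{\partial t^2}. \]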
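3.14 The following example illustrates the hazards of denoting functions by real variables. Let $w = f(x, y, z)$ and $z = g(x, y)$. Then
\[ \frac{\partial w}{\partial x} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial x} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial x} + \frac{\partial w}{\partial z}\frac{\partial z}{\partial x} = \frac{\partial w}{\partial x} + \frac{\partial w}{\partial z}\frac{\partial z}{\partial x}, \]
since $\partial x/\partial x = 1$ and $\partial y/\partial x = 0$. Hence $(\partial w/\partial z)(\partial z/\partial x) = 0$. But if $w = x + y + z$ and $z = x + y$, then $\partial w/\partial z = \partial z/\partial x = 1$, so we have $1 = 0$. Where is the mistake?
3.15 Use the mean value theorem to show that $5.18 < [(4.1)^2 + (3.2)^2]^{1/2} < 5.21$. Hint: Note first that $(5)^2 < x^2 + y^2 < (5.5)^2$ if $4 < x < 4.1$ and $3 < y < 3.2$.
3.16 Define $f : \mathbb{R}^2 \to \mathbb{R}$ by $f(x, y) = xy(x^2 - y^2)/(x^2 + y^2)$ unless $x = y = 0$, and $f(0, 0) = 0$.
(a) Show that $D_1f(0, y) = -y$ and $D_2f(x, 0) = x$ for all $x$ and $y$.
(b) Conclude that $D_1D_2f(0, 0)$ and $D_2D_1f(0, 0)$ exist but are not equal.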
3.17 The object of this problem is to show that, by an appropriate transformation of variables, the general homogeneous second order partial differential equation
with constant coefficients can be reduced to either Laplace's equation, the wave equation, or the heat equation.
(a) If $ac - b^2 > 0$, show that the substitution $s = (bx - ay)/(ac - b^2)^{1/2}$, $t = y$ changes (*) to
(b) If $ac - b^2 = 0$, show that the substitution $s = bx - ay$, $t = y$ changes (*) to
(c) If $ac - b^2 < 0$, show that the substitution
changes (*) to $\partial^2 u/\partial s\,\partial t = 0$.
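3.18 Let $F : \mathbb{R}^n \to \mathbb{R}^m$ be differentiable at $a \in \mathbb{R}^n$. Given a differentiable curve $\varphi : \mathbb{R} \to \mathbb{R}^n$ with $\varphi(0) = a$, $\varphi'(0) = v$, define $\psi = F \circ \varphi$, and show that $\psi'(0) = dF_a(v)$. Hence, if $\bar\varphi : \mathbb{R} \to \mathbb{R}^n$ is a second curve with $\bar\varphi(0) = a$, $\bar\varphi'(0) = v$, and $\bar\psi = F \circ \bar\varphi$, then $\bar\psi'(0) = \psi'(0)$, because both are equal to $dF_a(v)$. Consequently $F$ maps curves through $a$, with the same velocity vector, to curves through $F(a)$ with the same velocity vector.
3.19 Let $\varphi : \mathbb{R} \to \mathbb{R}^n$, $f : \mathbb{R}^n \to \mathbb{R}^m$, and $g : \mathbb{R}^m \to \mathbb{R}$ be differentiable mappings. If $h = g \circ f \circ \varphi$, show that $h'(t) = \nabla g(f(\varphi(t))) \cdot D_{\varphi'(t)}f(\varphi(t))$.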