Two-Dimensional Calculus (2011)

Chapter 2. Differentiation

7. The chain rule

The following situation arises in many different guises. We have a function f(x, y) defined in a domain D, and x and y are themselves functions of a third variable t. Our problem is: how does f behave as a function of t? This is often referred to as the problem of “composite functions,” or in certain contexts, of “related rates.” One way to visualize the situation is to consider the functions x(t), y(t) as defining a curve C in D, and to interpret the problem as one of describing the behavior of f along this curve.

In most cases we are concerned with regular curves. There are, however, situations in which it is desirable to allow more freedom. The word curve, without qualifications, is used whenever the defining functions x(t), y(t) are continuous. A differentiable curve is a curve for which x(t) and y(t) are continuously differentiable.

We adopt the following notation. Let f(x, y) be defined in D. Let

\[ C:\quad x = x(t),\quad y = y(t),\qquad a \le t \le b \]

be a curve in D. Then we set

\[ f_C(t) = f(x(t), y(t)). \]

Example 7.1

Image

and

Image

then

Image

Thus fC(t) is simply the function of t obtained by substituting for x and y their expressions in terms of t. It is a function of a single variable, and we may ask when it is differentiable, and what the value of its derivative is.

Theorem 7.1 If f(x, y) is continuously differentiable in D, and if C is a differentiable curve in D defined by x(t), y(t), a ≤ t ≤ b, then fC(t) is differentiable, and we have

\[ f_C'(t_0) = f_x(x_0, y_0)\,x'(t_0) + f_y(x_0, y_0)\,y'(t_0), \tag{7.2} \]

where x0 = x(t0), y0 = y(t0).

Remark Equation (7.2) is known as the chain rule. It is often written more briefly in the form

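\[ \frac{df}{dt} = \frac{\partial f}{\partial x}\,\frac{dx}{dt} + \frac{\partial f}{\partial y}\,\frac{dy}{dt}. \]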

Here it is understood that on the left-hand side we have substituted in f the expressions for x and y as functions of t, and that each factor on the right-hand side is evaluated at a particular point of the t axis or the corresponding point in the x, y plane.
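As a quick numerical sanity check of the chain rule, the following minimal sketch compares the quantity fx·x'(t0) + fy·y'(t0) with a central-difference approximation of the derivative of t ↦ f(x(t), y(t)). The function f, the curve, and the helper names `chain_rule_derivative` and `central_difference` are illustrative assumptions, not taken from the text.

```python
import math

# Hypothetical test function and its partial derivatives (not from the text).
def f(x, y):
    return x**2 * y + math.sin(y)

def f_x(x, y):
    return 2.0 * x * y

def f_y(x, y):
    return x**2 + math.cos(y)

# Hypothetical differentiable curve x(t), y(t) and its derivatives.
def x_of(t):
    return math.cos(t)

def y_of(t):
    return t**2

def dx_dt(t):
    return -math.sin(t)

def dy_dt(t):
    return 2.0 * t

def chain_rule_derivative(t0):
    """Right-hand side of the chain rule, evaluated at t0."""
    x0, y0 = x_of(t0), y_of(t0)
    return f_x(x0, y0) * dx_dt(t0) + f_y(x0, y0) * dy_dt(t0)

def central_difference(t0, step=1e-6):
    """Numerical derivative of the composite function t -> f(x(t), y(t))."""
    def composite(t):
        return f(x_of(t), y_of(t))
    return (composite(t0 + step) - composite(t0 - step)) / (2.0 * step)

if __name__ == "__main__":
    t0 = 0.7
    print(chain_rule_derivative(t0), central_difference(t0))
```

The two printed numbers should agree to several decimal places; the agreement is only a numerical check at one parameter value, not a proof.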

PROOF. We must again use the fundamental Lemma 5.1. Setting x = x(t), y = y(t) and x0 = x(t0), y0 = y(t0), we have from Eq. (5.3)

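\[ \frac{f_C(t) - f_C(t_0)}{t - t_0} = f_x(x_0, y_0)\,\frac{x - x_0}{t - t_0} + f_y(x_0, y_0)\,\frac{y - y_0}{t - t_0} + \frac{h(x, y)}{t - t_0}. \]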

Then as t → t0, (x − x0)/(t − t0) → x'(t0), (y − y0)/(t − t0) → y'(t0), and to prove Eq. (7.2) we need only show that h(x, y)/(t − t0) → 0. But

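\[ \left|\frac{h(x, y)}{t - t_0}\right| = \frac{|h(x, y)|}{\sqrt{(x - x_0)^2 + (y - y_0)^2}} \cdot \sqrt{\left(\frac{x - x_0}{t - t_0}\right)^2 + \left(\frac{y - y_0}{t - t_0}\right)^2}; \]

the second factor on the right-hand side tends to the limit √(x'(t0)² + y'(t0)²), whereas the first factor, by Lemma 5.1, tends to zero. ∎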

Example 7.2

In Example 7.1 we have

Image

Further

Image

Thus, if t0 = Imageπ, we have x0 = Image/2, y0 = Image/2,

Image

Hence,

Image

In this case we could also calculate Image directly, obtaining

Image

Setting t = Imageπ, we find Image = − 1.

Before we go on to applications of the chain rule, we note some important special cases.

First, consider the case where the curve C is a straight line, and the parameter t is the distance from (x0, y0) to a variable point on the line. Then C is given by x = x0 + t cos α, y = y0 + t sin α, where ⟨cos α, sin α⟩ is a unit vector in the direction of the line. Then x'(t) = cos α, y'(t) = sin α, and

\[ f_C'(0) = f_x(x_0, y_0)\cos\alpha + f_y(x_0, y_0)\sin\alpha = \nabla_\alpha f(x_0, y_0). \]

Thus the directional derivative appears as a special case of (7.2) in which the curve C is a straight line.

Consider next the case where C is an arbitrary regular curve, but the parameter t represents arc length along C. Then ⟨x'(t0), y'(t0)⟩ represents a unit vector tangent to C. We have denoted this vector earlier by T. If we set T = ⟨cos α, sin α⟩, then Eq. (7.2) may be written

\[ f_C'(t_0) = \nabla_\alpha f(x_0, y_0), \tag{7.3} \]

where t is the arc-length parameter on C.

We may describe Eq. (7.3) as follows. The rate of change of f with respect to arc length along any regular curve C through a point (x0, y0), depends only on the unit tangent T to C at (x0, y0) and equals the directional derivative of f in the direction of T.

Example 7.3

Find the directional derivative ∇αf(1, 1), where f(x, y) = x² + y² and α = ¾π.

We note that the vector T = ⟨cos α, sin α⟩ = ⟨−√2/2, √2/2⟩ is tangent to the circle x² + y² = 2 at the point (1, 1). But this circle is a regular curve C: x = √2 cos t, y = √2 sin t, along which the function f is constant (Fig. 7.1). Namely, fC(t) ≡ 2. Hence fC'(t) ≡ 0, and by Eq. (7.3), ∇αf(1, 1) = 0.

FIGURE 7.1 Directional derivative in tangent direction to a level curve

Example 7.3 is a special case of the situation in which we apply Eq. (7.3) to a curve C which is a level curve of the function f(x, y). We discuss the general case in detail in Sect. 8.
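The observation that the directional derivative vanishes in the direction tangent to a level curve can be checked numerically. The sketch below uses f(x, y) = x² + y² at the point (1, 1); the tangent angle 3π/4 is an assumption (it is one of the two directions perpendicular to the radius at that point), and the helper names are hypothetical.

```python
import math

def grad_f(x, y):
    # Gradient of f(x, y) = x**2 + y**2.
    return (2.0 * x, 2.0 * y)

def directional_derivative(x, y, alpha):
    """f_x * cos(alpha) + f_y * sin(alpha) at the point (x, y)."""
    fx, fy = grad_f(x, y)
    return fx * math.cos(alpha) + fy * math.sin(alpha)

def f_on_circle(t):
    """Value of f along the circle x = sqrt(2) cos t, y = sqrt(2) sin t."""
    x = math.sqrt(2.0) * math.cos(t)
    y = math.sqrt(2.0) * math.sin(t)
    return x * x + y * y

# Assumed tangent direction to the circle x**2 + y**2 = 2 at (1, 1).
tangent_value = directional_derivative(1.0, 1.0, 3.0 * math.pi / 4.0)
# Direction of the gradient itself, for comparison.
normal_value = directional_derivative(1.0, 1.0, math.pi / 4.0)

if __name__ == "__main__":
    print(tangent_value, normal_value, f_on_circle(0.3))
```

Up to rounding, the tangential value is zero, while the value in the gradient direction equals the length of the gradient.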

By a pass through a mountain range we mean a path across the mountains, whose highest point is as low as possible. The highest point of the path is called the summit of the pass. We use the following precise characterization of such a point.

Definition 7.1 The summit of a pass is a point having the property that there are two paths through the point in different directions, such that it is the highest point along one of the paths and the lowest point along the other (Fig. 7.2).

FIGURE 7.2 Summit of a pass

As another application of Eq. (7.3), we can state the following:

At a summit of a pass, the gradient is zero.

PROOF. We assume, as usual, that the mountain range is defined by z = f(x, y), and we let (x0, y0) correspond to the summit of the pass. Let C be one of the two paths through the summit referred to in Def. 7.1. Then fC(t) has a local maximum or minimum as a function of t at the summit, and hence fC'(t) = 0 at this point. By Eq. (7.3) the directional derivative of f is zero in the direction of each of these paths. But then the gradient of f must be zero, since if it were different from zero, the directional derivative could be zero only in the direction perpendicular to the gradient vector, as we observed in consequence of Eq. (6.8).

∎

The following are two important applications of the chain rule.

Theorem 7.2 Let f(x, y) be continuously differentiable in a domain D. Then f is constant if and only if ∇f ≡ 0 in D.

PROOF. If f is constant, then fx ≡ 0, fy ≡ 0, and hence ∇f ≡ 0 in D. To prove the converse, assume ∇f ≡ 0, and let (x1, y1) be any point of D. Let k = f(x1, y1). We shall show that f(x, y) ≡ k in D. For any point (x2, y2) in D, let C be a regular curve in D joining (x1, y1) to (x2, y2); let C be given by x(t), y(t), a ≤ t ≤ b. Then the mean-value theorem applied to the function fC(t) states that

\[ f_C(b) - f_C(a) = f_C'(t_0)\,(b - a), \tag{7.4} \]

where t0 is some value of t between a and b. If we let x0 = x(t0), y0 = y(t0), then substituting the definition of fC(t) in Eq. (7.4), and applying Eq. (7.2), we have

\[ f(x_2, y_2) - f(x_1, y_1) = \bigl[f_x(x_0, y_0)\,x'(t_0) + f_y(x_0, y_0)\,y'(t_0)\bigr](b - a) = 0, \]

since by assumption fx ≡ 0 and fy ≡ 0. Hence f(x2, y2) = f(x1, y1) = k, and since (x2, y2) is an arbitrary point of D, we have f(x, y) = k for all (x, y) in D.

∎

Corollary If ∇g ≡ ∇h in D, then g(x, y) = h(x, y) + k, where k is a constant.

PROOF. Let f(x, y) = g(x, y) − h(x, y). Then ∇f ≡ 0 in D, and applying Th. 7.2, f(x, y) ≡ k.

∎

Remark Theorem 7.2 and its Corollary would be false if D were not connected. We made important use of the fact that any two points in D could be joined by a curve in D.

Example 7.4

Let ax + by + c = 0 be the equation of a fixed line in the plane. Find the distance to this line from an arbitrary point (X, Y).

Let us denote the distance in question by f(X, Y). If we consider the distance to be a function of the variables X, Y, then we can determine the gradient of this function by our geometric characterization of the gradient vector in terms of the direction and magnitude of the greatest change in the function (Theorem 6.2). Starting at any point, the rate of change of distance to the fixed line is greatest if we move directly away from the line. Since the vector ⟨a, b⟩ is perpendicular to the line ax + by + c = 0, the direction of ∇f must be the same as (or opposite to) that of ⟨a, b⟩. Furthermore, if a point moves directly away from the line, its distance to the line increases by exactly the distance it traverses, so that the rate of change of f(X, Y) in this direction is equal to 1. Hence |∇f| = 1, and we have

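\[ \nabla f = \pm \frac{\langle a, b \rangle}{\sqrt{a^2 + b^2}}. \]

If we set

\[ g(X, Y) = \pm \frac{aX + bY}{\sqrt{a^2 + b^2}}, \]

then ∇g ≡ ∇f.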

Applying the above Corollary we have

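\[ f(X, Y) = \pm \frac{aX + bY}{\sqrt{a^2 + b^2}} + k. \]

To find the value of k, we note that for any point (X0, Y0) on the given line, we have aX0 + bY0 + c = 0, and f(X0, Y0) = 0. Hence, with the same choice of sign,

\[ k = \pm \frac{c}{\sqrt{a^2 + b^2}}, \]

and since f(X, Y) ≥ 0 everywhere, we must have

\[ f(X, Y) = \frac{|aX + bY + c|}{\sqrt{a^2 + b^2}}. \]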

This is precisely the formula for the distance from the point (X, Y) to the line ax + by + c = 0.
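The two properties used in this example, that the distance function has a gradient of length 1 pointing along ⟨a, b⟩ and that it is given by |aX + bY + c| / √(a² + b²), can be checked numerically. The following is a minimal sketch; the sample line 3x + 4y − 10 = 0, the test point, and the helper names are assumptions made for illustration.

```python
import math

def distance_to_line(a, b, c, X, Y):
    """Distance from (X, Y) to the line a*x + b*y + c = 0."""
    return abs(a * X + b * Y + c) / math.hypot(a, b)

def numerical_gradient(func, X, Y, step=1e-6):
    """Central-difference approximation of the gradient of func at (X, Y)."""
    gx = (func(X + step, Y) - func(X - step, Y)) / (2.0 * step)
    gy = (func(X, Y + step) - func(X, Y - step)) / (2.0 * step)
    return gx, gy

# Hypothetical sample line 3x + 4y - 10 = 0 and a point off the line.
a, b, c = 3.0, 4.0, -10.0

def dist(X, Y):
    return distance_to_line(a, b, c, X, Y)

gx, gy = numerical_gradient(dist, 5.0, 5.0)

if __name__ == "__main__":
    print(dist(5.0, 5.0))          # distance from (5, 5) to the line
    print(gx, gy, math.hypot(gx, gy))
```

At a point on the positive side of the line, the computed gradient should be close to ⟨a, b⟩ divided by its length. On the line itself the distance function is not differentiable, so the check is only meaningful away from the line.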

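Theorem 7.3 Mean-Value Theorem Let f(x, y) be continuously differentiable in a domain D, and let (x1, y1) and (x2, y2) be points of D such that the entire line segment L from (x1, y1) to (x2, y2) lies in D (Fig. 7.3). Then there is a point (x0, y0) on L such that

\[ f(x_2, y_2) - f(x_1, y_1) = f_x(x_0, y_0)(x_2 - x_1) + f_y(x_0, y_0)(y_2 - y_1). \tag{7.5} \]

PROOF. We consider the line segment L to be a regular curve

\[ C:\quad x(t) = x_1 + t(x_2 - x_1),\quad y(t) = y_1 + t(y_2 - y_1),\qquad 0 \le t \le 1. \]

Applying the mean-value theorem to the function fC(t), we obtain

\[ f_C(1) - f_C(0) = f_C'(t_0),\qquad 0 < t_0 < 1. \tag{7.6} \]

If we let x0 = x(t0), y0 = y(t0), and if we note that x'(t) = x2 − x1, y'(t) = y2 − y1, then applying Eq. (7.2), Eq. (7.6) takes the form

\[ f(x_2, y_2) - f(x_1, y_1) = f_x(x_0, y_0)(x_2 - x_1) + f_y(x_0, y_0)(y_2 - y_1). \]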

FIGURE 7.3 Mean-value theorem for a function of two variables

Remark It is instructive to compare the statement of the mean-value theorem, Eq. (7.5), with the fundamental Lemma 5.1, Eq. (5.3). Both express the change in a function f as we move from one point to another, essentially as the change in x times the x derivative of f plus the change in y times the y derivative of f. In Lemma 5.1, the partial derivatives fx and fy are evaluated at (x0, y0), and a remainder term h(x, y) must be included. In the mean-value theorem, we obtain a precise expression with no remainder term, but fx and fy are evaluated at an unknown point, about which we can say only that it lies between the two given points.
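The mean-value theorem only asserts that the intermediate point exists, but for a concrete function it can be located numerically. The sketch below does this by bisection for f(x, y) = x² + xy + y² on the segment from (2, 3) to (2.1, 3.2). The function name `mean_value_point` is hypothetical, and the bisection assumes that the quantity being solved changes sign between t = 0 and t = 1, which holds for this example but not for every f.

```python
def f(x, y):
    return x * x + x * y + y * y

def f_x(x, y):
    return 2.0 * x + y

def f_y(x, y):
    return x + 2.0 * y

def mean_value_point(p1, p2, tol=1e-12):
    """Find t0 in (0, 1) with f_x*dx + f_y*dy = f(p2) - f(p1) on the segment."""
    (x1, y1), (x2, y2) = p1, p2
    dx, dy = x2 - x1, y2 - y1
    delta = f(x2, y2) - f(x1, y1)

    def g(t):
        # Chain-rule derivative along the segment minus the total change.
        x, y = x1 + t * dx, y1 + t * dy
        return f_x(x, y) * dx + f_y(x, y) * dy - delta

    lo, hi = 0.0, 1.0
    if g(lo) * g(hi) > 0.0:
        raise ValueError("bisection needs a sign change on [0, 1]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    t0 = 0.5 * (lo + hi)
    return t0, (x1 + t0 * dx, y1 + t0 * dy)

t0, (x0, y0) = mean_value_point((2.0, 3.0), (2.1, 3.2))

if __name__ == "__main__":
    print(t0, x0, y0)
```

Because f restricted to the segment is a quadratic polynomial in t, the intermediate point here is the midpoint of the segment; for a general function it need not be.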

Example 7.5

Find approximately the distance of the point (4.2, 3.1) to the origin.

We consider the function f(x, y) = (x² + y²)^(1/2), representing the distance of any point (x, y) to the origin. Let (x1, y1) = (4, 3), and (x2, y2) = (4.2, 3.1). We have fx = x/(x² + y²)^(1/2) and fy = y/(x² + y²)^(1/2). We apply the fundamental Lemma 5.1 and obtain

\[ f(4.2, 3.1) = f(4, 3) + f_x(4, 3)(0.2) + f_y(4, 3)(0.1) + h = 5 + \tfrac{4}{5}(0.2) + \tfrac{3}{5}(0.1) + h = 5.22 + h. \]

Since h(x, y) is a remainder term, we have an approximate value,

\[ f(4.2, 3.1) \approx 5.22. \]

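This method is excellent for obtaining a quick estimate for a function near a point where we know its value. The disadvantage of this method is that it gives no indication of the accuracy of the estimate. One way to test the accuracy is to use Th. 7.3. We have a point (x0, y0) on the line segment from (4, 3) to (4.2, 3.1). Thus, 4 < x0 < 4.2, and 3 < y0 < 3.1. Hence, (x0² + y0²)^(1/2) > 5,

\[ f_x(x_0, y_0) = \frac{x_0}{\sqrt{x_0^2 + y_0^2}} < \frac{4.2}{5} = 0.84, \qquad f_y(x_0, y_0) = \frac{y_0}{\sqrt{x_0^2 + y_0^2}} < \frac{3.1}{5} = 0.62. \]

Thus,

\[ f(4.2, 3.1) = 5 + f_x(x_0, y_0)(0.2) + f_y(x_0, y_0)(0.1) < 5 + 0.168 + 0.062 = 5.23. \tag{7.7} \]

This gives an upper estimate, and we can use it in turn to obtain a lower estimate. We have

\[ \sqrt{x_0^2 + y_0^2} < f(4.2, 3.1) < 5.23, \]

and, consequently,

\[ f_x(x_0, y_0) > \frac{4}{5.23}, \qquad f_y(x_0, y_0) > \frac{3}{5.23}, \]

and

\[ f(4.2, 3.1) > 5 + \frac{0.8 + 0.3}{5.23} > 5.21. \tag{7.8} \]

Combining Eqs. (7.7) and (7.8), we have

\[ 5.21 < f(4.2, 3.1) < 5.23. \tag{7.9} \]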

Thus our original estimate was correct to within essentially one-hundredth. Equation (7.9) gives precise bounds.
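The arithmetic of this example can be reproduced in a few lines. The sketch below computes the linear estimate at (4, 3), an upper bound obtained from the fact that the intermediate point is at distance greater than 5 from the origin, and a lower bound obtained by feeding that upper bound back in. The bounding steps follow my reading of the argument, and the variable names are illustrative.

```python
import math

def f(x, y):
    """Distance from (x, y) to the origin."""
    return math.hypot(x, y)

dx, dy = 0.2, 0.1  # displacement from (4, 3) to (4.2, 3.1)

# Linear estimate from the partial derivatives at (4, 3): f_x = 4/5, f_y = 3/5.
linear_estimate = f(4.0, 3.0) + (4.0 / 5.0) * dx + (3.0 / 5.0) * dy

# Upper bound: at the intermediate point x0 < 4.2, y0 < 3.1, and r0 > 5.
upper = 5.0 + (4.2 / 5.0) * dx + (3.1 / 5.0) * dy

# Lower bound: x0 > 4, y0 > 3, and r0 < f(4.2, 3.1) < upper.
lower = 5.0 + (4.0 / upper) * dx + (3.0 / upper) * dy

exact = f(4.2, 3.1)

if __name__ == "__main__":
    print(linear_estimate, lower, exact, upper)
```

The directly computed value lies between the two bounds, and the linear estimate differs from it by well under one-hundredth.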

We conclude this section with two rather special, but useful formulas, which may be derived from the chain rule. The first involves the notion of homogeneity, abstracted from the basic property of homogeneous polynomials.

Definition 7.2 A function f(x, y) is homogeneous of degree k if

\[ f(\lambda x, \lambda y) = \lambda^k f(x, y) \quad \text{for all } \lambda > 0. \tag{7.10} \]

Example 7.6

Consider a homogeneous polynomial P(x, y) of degree k. Each of its terms is of the form cx^m y^n, where m + n = k. Hence,

\[ c(\lambda x)^m (\lambda y)^n = \lambda^{m+n}\,c\,x^m y^n = \lambda^k\,c\,x^m y^n, \]

and λ^k factors out from any sum of such terms. Thus P(λx, λy) = λ^k P(x, y), and P(x, y) satisfies the definition of a function homogeneous of degree k.

Example 7.7

f(x, y) = 1/(x² + y²). Then f(λx, λy) = 1/(λ²(x² + y²)) = λ^(−2) f(x, y). Here k = −2.

Example 7.8

f(x, y) = xy/(x² + y²). Then f(λx, λy) = f(x, y) = λ^0 f(x, y). Here k = 0.

Example 7.9

Image. Here f(λx, λy) = λ^(3/2) f(x, y), and k = 3/2.

Note that, in general, k need not be a positive integer, as in the case of polynomials, but may have any real value, positive, negative, or zero.
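The defining relation f(λx, λy) = λ^k f(x, y) is easy to test on sample data. The sketch below checks it for the functions of Examples 7.7 and 7.8 and for a hypothetical non-homogeneous function; the helper `is_homogeneous`, the sample points, and the scale factors are assumptions. A passing check on finitely many samples is evidence, not a proof.

```python
import math

def is_homogeneous(f, k, points, scales, rel_tol=1e-9):
    """Check f(lam*x, lam*y) == lam**k * f(x, y) on the given samples only."""
    for (x, y) in points:
        for lam in scales:
            lhs = f(lam * x, lam * y)
            rhs = lam**k * f(x, y)
            if not math.isclose(lhs, rhs, rel_tol=rel_tol, abs_tol=1e-12):
                return False
    return True

def inverse_square(x, y):
    # Example 7.7: homogeneous of degree -2.
    return 1.0 / (x**2 + y**2)

def ratio(x, y):
    # Example 7.8: homogeneous of degree 0.
    return x * y / (x**2 + y**2)

def not_homogeneous(x, y):
    # Hypothetical counterexample: terms of different degrees.
    return x + y**2

points = [(1.0, 2.0), (-0.5, 3.0), (2.5, -1.5)]
scales = [0.5, 2.0, 7.0]

if __name__ == "__main__":
    print(is_homogeneous(inverse_square, -2, points, scales))
    print(is_homogeneous(ratio, 0, points, scales))
    print(is_homogeneous(not_homogeneous, 1, points, scales))
```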

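Lemma 7.1 Let f(t) be defined and differentiable for all t > 0. Then f(t) = ct^k for some constant c ⇔

\[ t f'(t) = k f(t) \quad \text{for all } t > 0. \tag{7.11} \]

PROOF. Let g(t) = f(t)/t^k. Then

\[ g'(t) = \frac{t^k f'(t) - k t^{k-1} f(t)}{t^{2k}} = \frac{t f'(t) - k f(t)}{t^{k+1}}. \]

Hence condition (7.11) holds ⇔ g'(t) ≡ 0 ⇔ g(t) ≡ c.

∎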

Theorem 7.4 Euler’s Theorem A continuously differentiable function f(x, y) is homogeneous of degree k if and only if it satisfies

\[ x f_x(x, y) + y f_y(x, y) = k f(x, y). \tag{7.12} \]

PROOF. Let (x0, y0) be any fixed point, and let C be the curve x = tx0, y = ty0, t > 0. Then x'(t) = x0, y'(t) = y0, and by the chain rule, we have for the function fC(t) = f(tx0, ty0) that

\[ f_C'(t) = f_x(tx_0, ty_0)\,x_0 + f_y(tx_0, ty_0)\,y_0, \]

and hence

\[ t f_C'(t) = f_x(tx_0, ty_0)\,tx_0 + f_y(tx_0, ty_0)\,ty_0. \]

Thus, if we set x = tx0, y = ty0, then in our present notation, Eq. (7.12) takes the form

\[ t f_C'(t) = k f_C(t). \]

By Lemma 7.1, this is equivalent to fC(t) = ct^k. The constant c is determined by setting t = 1. We find c = fC(1) = f(x0, y0). Thus

\[ f(tx_0, ty_0) = t^k f(x_0, y_0) \]

for all t > 0, which is precisely the condition (Eq. (7.10)) for homogeneity of degree k.

∎

Remarks

1. Geometrically, the meaning of homogeneity is that the surface z = f(x, y) consists of curves z = cs^k over each ray in the x, y plane through the origin, where s is distance to the origin, and the coefficient c varies from ray to ray, but the exponent k is fixed (Fig. 7.4). Thus a homogeneous function of degree k is completely known if it is known at one point on each ray, since that is sufficient to determine the value of the coefficient c.

FIGURE 7.4 Geometric meaning of homogeneity

2. Although the domain of f(x, y) was not explicitly stated, it is generally the whole plane with the possible exception of the origin. In some cases it is a sector, or a collection of sectors, with vertex at the origin. What is needed in the proof is to know that if f is defined at some point, then it is defined on the entire ray through that point.
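Euler's relation x·fx + y·fy = k·f can also be checked numerically, using central differences for the partial derivatives. The sketch below is a minimal illustration; the helper names `partials` and `euler_residual`, the step size, and the sample point are assumptions, and the cubic is the polynomial from Ex. 7.1e.

```python
def partials(f, x, y, step=1e-6):
    """Central-difference approximations of f_x and f_y at (x, y)."""
    fx = (f(x + step, y) - f(x - step, y)) / (2.0 * step)
    fy = (f(x, y + step) - f(x, y - step)) / (2.0 * step)
    return fx, fy

def euler_residual(f, k, x, y):
    """x*f_x + y*f_y - k*f, which should vanish if f is homogeneous of degree k."""
    fx, fy = partials(f, x, y)
    return x * fx + y * fy - k * f(x, y)

def cubic(x, y):
    # Homogeneous polynomial of degree 3.
    return x**3 + 2.0 * x**2 * y + x * y**2

def inverse_square(x, y):
    # Homogeneous of degree -2.
    return 1.0 / (x**2 + y**2)

def mixed(x, y):
    # Hypothetical non-homogeneous function.
    return x + y**2

if __name__ == "__main__":
    print(euler_residual(cubic, 3, 1.5, -0.5))
    print(euler_residual(inverse_square, -2, 1.5, -0.5))
    print(euler_residual(mixed, 1, 1.0, 2.0))
```

The first two residuals should be zero up to discretization and rounding error, while the third is clearly nonzero.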

We conclude with an application of the chain rule to the following situation. Let f(x, u) be continuous in some square a ≤ x ≤ b, a ≤ u ≤ b. Define the function g(x) for a ≤ x ≤ b by

\[ g(x) = \int_a^x f(x, u)\,du. \]

How do we find g'(x)?

This is a typical example of a problem that can be answered more easily if stated in an apparently more difficult form. Consider the function

\[ h(x, y) = \int_a^y f(x, u)\,du, \tag{7.14} \]

where y may be an arbitrary function of x. We start with the important special case when y is a constant.

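Lemma 7.2 Differentiation under the Integral Sign Let f(x, u) and fx(x, u) be continuous for x0 − δ ≤ x ≤ x0 + δ, a ≤ u ≤ b. Let

\[ \varphi(x) = \int_a^b f(x, u)\,du. \]

Then φ(x) is differentiable and

\[ \varphi'(x_0) = \int_a^b f_x(x_0, u)\,du. \]

PROOF. We form

\[ \frac{\varphi(x) - \varphi(x_0)}{x - x_0} = \int_a^b \frac{f(x, u) - f(x_0, u)}{x - x_0}\,du = \int_a^b f_x(x_1, u)\,du, \]

where x1 lies between x and x0 (and may depend on u), by the mean-value theorem for functions of one variable. Thus, by the continuity of fx, we have

\[ \varphi'(x_0) = \lim_{x \to x_0} \int_a^b f_x(x_1, u)\,du = \int_a^b f_x(x_0, u)\,du. \]

Remark A close examination of the last step, in which we interchanged the order of integration and limit as x → x0, shows that one actually needs uniform continuity of the function fx(x, u). However, that is a consequence of ordinary continuity on the closed rectangle x0 − δ ≤ x ≤ x0 + δ, a ≤ u ≤ b.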

We return now to the function h(x, y) defined in Eq. (7.14). If we fix x and let y vary, then by the fundamental theorem of calculus

\[ h_y(x, y) = f(x, y). \]

On the other hand, if y is fixed and x varies, then by Lemma 7.2,

\[ h_x(x, y) = \int_a^y f_x(x, u)\,du. \]

Thus, if f and fx are both continuous, h is continuously differentiable, and we may apply the chain rule, where x and y are given as arbitrary differentiable functions of our auxiliary variable t:

\[ \frac{dh}{dt} = h_x\,\frac{dx}{dt} + h_y\,\frac{dy}{dt} = \left(\int_a^y f_x(x, u)\,du\right)\frac{dx}{dt} + f(x, y)\,\frac{dy}{dt}. \]

If we now specialize to the case where y is a given function of x, y = F(x), then we may choose x = t, and we have

\[ \frac{d}{dx}\int_a^{F(x)} f(x, u)\,du = \int_a^{F(x)} f_x(x, u)\,du + f(x, F(x))\,F'(x). \tag{7.15} \]

Equation (7.15) contains both the fundamental theorem of calculus and Lemma 7.2 as special cases. It also provides the answer to our original question concerning the function g(x) obtained by setting F(x) ≡ x. We find

\[ g'(x) = \int_a^x f_x(x, u)\,du + f(x, x). \]
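The rule for differentiating an integral whose integrand and upper limit both depend on x, namely g'(x) = ∫ from a to x of fx(x, u) du + f(x, x), can be checked numerically. The sketch below does so for the hypothetical integrand f(x, u) = sin(xu) with a = 0, using a composite Simpson rule; the integrand, the helper names, and the step sizes are all illustrative assumptions.

```python
import math

def simpson(func, lower, upper, n=1000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (upper - lower) / n
    total = func(lower) + func(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(lower + i * h)
    return total * h / 3.0

A = 0.0  # fixed lower limit of integration

def f(x, u):
    # Hypothetical integrand, not from the text.
    return math.sin(x * u)

def f_x(x, u):
    # Partial derivative of f with respect to x.
    return u * math.cos(x * u)

def g(x):
    """g(x) = integral from A to x of f(x, u) du."""
    return simpson(lambda u: f(x, u), A, x)

def g_prime_formula(x):
    """Integral of f_x from A to x, plus the boundary term f(x, x)."""
    return simpson(lambda u: f_x(x, u), A, x) + f(x, x)

def g_prime_numeric(x, step=1e-5):
    """Central-difference derivative of g, for comparison."""
    return (g(x + step) - g(x - step)) / (2.0 * step)

def g_prime_closed_form(x):
    """Derivative of the explicit antiderivative g(x) = (1 - cos(x**2)) / x."""
    return (2.0 * x**2 * math.sin(x**2) - (1.0 - math.cos(x**2))) / x**2

if __name__ == "__main__":
    x = 1.3
    print(g_prime_formula(x), g_prime_numeric(x), g_prime_closed_form(x))
```

For this integrand g can be evaluated in closed form, so the three printed values give two independent checks of the formula at one sample point.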

Exercises

7.1 For each of the following functions f(x, y) and curves C: x(t), y(t) find fc(t) = f(x(t), y(t)).

a. f(x, y) = x2y2; x = 2 cos t, y = 2 sin t

b. f(x, y) = xy; x = 2 cos t, y = 2 sin t

c. f(x, y) = x2y2; x = 2 cosh t, y = 2 sinh t

d. f(x, y) = exy; x = t, y = et

e. f(x, y) = x³ + 2x²y + xy²; x = 2t, y = 3t

f. f(x, y) = log (1 + x² + y²) − 2 arc tan y; x = log (1 + t²), y = e^t

7.2 For each function and curve in Ex. 7.1, find the point on the curve corresponding to t = 0, find the gradient of the function f(x, y) at that point, and compute fC'(0) by the chain rule. Then check your answer by differentiating fC(t) and setting t = 0.

7.3 Given the differentiation formulas for functions of one variable:

\[ \frac{d}{dx}\,x^a = a x^{a-1}, \qquad \frac{d}{dx}\,a^x = a^x \log a, \]

find a formula for (d/dx)x^x, by applying the chain rule to f(x, y) = x^y, where x = t, y = t.

Note: in order to compute fC'(0), as in Ex. 7.2, in some cases it is easier to find the function fC(t) explicitly and then differentiate, while in other cases it is simpler to compute the gradient of f at the point corresponding to t = 0 and then apply the chain rule. One example of the latter case is when the gradient turns out to be zero. There is also an important situation in which only the chain rule can be applied. Namely, if the curve C is not known explicitly, but if the velocity vector is known at a given point, then fC'(t) can be found at that point. This is illustrated in Exs. 7.4–7.6.

7.4 A rectangular sheet of rubber is being stretched in such a way that at a certain moment when its length is 10 inches and its width is 4 inches, its length is increasing at the rate of 2 inches per second and its width is decreasing at the rate of 1 inch per second. Is its area increasing or decreasing at that moment, and by how much?

7.5 The pressure P of a gas is related to the volume V and to the absolute temperature T by the relation P = cT/V, where c is a constant. If hot air is being pumped into a balloon so that at a given instant V = 1200, T = 360, P = 30, dV/dt = 8, and dT/dt = 4, what is dP/dt? Is the pressure increasing or decreasing?

7.6 On a part of a map, the height (in miles) above sea level at the point (x, y) is equal to 1 − x²/64 − y²/60. If a car going 50 miles per hour is headed directly away from the origin as it passes the point (4, 3), how fast is its height above sea level changing at that point? Is it increasing or decreasing?

7.7 Suppose that f(x, y) is homogeneous of degree k ≠ 0, and that ∇f ≡ 0. Show that f(x, y) ≡ 0.

7.8 Determine which of the following functions are homogeneous, and for the homogeneous ones find the degree.

a. 5x⁶y − x²y⁵

b. 8x⁷y + 7x⁶y² + 3x²y⁵

c. Image

d. Image

e. x²y − 2x^(3/2)y^(3/2)

f. x^(1/2)y − 2y^(3/2)e^(y/x)

g. Image

h. Image

7.9 Verify Euler’s theorem for each of the homogeneous functions in Ex. 7.8.

7.10 Let f(x, y) be homogeneous of degree k and let g(x, y) be homogeneous of degree l. For each of the following functions h(x, y), determine whether h(x, y) is homogeneous, and if so, determine the degree.

a. h(x, y) = f(x, y) + g(x, y)

b. h(x, y) = f(x, y)g(x, y)

c. h(x, y) = f(x, y)/g(x, y)

d. h(x, y) = f(x, y)³

e. h(x, y) = f(x², y²)

f. h(x, y) = f(g(x, y), g(x, y))

7.11 Show that f(x, y) is homogeneous of degree zero if and only if there exists a function g(t) of one variable such that

Image

7.12 Show that f(x, y) is homogeneous of degree k if and only if there exists a function g(t) of one variable such that

Image

7.13 Give a geometric interpretation of Ex. 7.11 in terms of the surface z = f(x, y), where f(x, y) is homogeneous of degree zero.

7.14 Derive Euler’s Eq. (7.12) from Ex. 7.12.

7.15 Use Ex. 7.12 to show that if f(x, y) is homogeneous of degree k, then fx and fy are homogeneous of degree k − 1.

7.16 For each of the following functions f(x, y) find an approximate value at the point (x2, y2) by applying Lemma 5.1 at the point (x1, y1) and neglecting the remainder term.

a. f(x, y) = x² + xy + y², (x2, y2) = (2.1, 3.2), (x1, y1) = (2, 3)

b. Image

c. Image

d. f(x, y) = log (x − sin y), (x2, y2) = (1.2, 0.3), (x1, y1) = (1, 0)

7.17 Use Th. 7.3 to obtain precise bounds in Ex. 7.16a.

7.18 Find g'(x), where

a. Image

b. Image

c. Image

d. Image

e. Image

7.19 a. Setting g(x) = Image cos xt dt, we have g'(x) = − Image t sin xt dt, and g'(1) = − Image t sin t dt. Evaluate g(x) explicitly, differentiate, and set x = 1 in order to compute Image t sin t dt. Check your answer by integrating by parts.

b. Use a similar procedure to evaluate Image te2t dt.

7.20 Show that if

Image

(Hint: Write Image for some constant a.)

7.21 Use Ex. 7.20 to find g'(x), where

a. Image

b. Image

7.22 Find the answer to Ex. 7.21b by computing the integral in order to obtain g(x) explicitly, and then differentiating.

7.23 a. Show that the function Image satisfies g(0) = g'(0) = 0, g'(x) = h(x).

b. Show that if

Image

Note: Ex. 7.23b can be used to derive Taylor’s formula with integral remainder term:

Image

since both sides of this equation have the same (n + 1)st derivative, while the first n derivatives agree at the origin.

7.24 a. Use Eq. (7.15) to show that if Image then

Image

b. If a curve is given in polar coordinates by r = r(θ), then the area A swept out by the radius vector for a ≤ θ ≤ b is given by

\[ A = \frac{1}{2}\int_a^b r(\theta)^2\,d\theta. \]

If r and θ are given parametrically as functions of t, and if A(t) is the area swept out between a fixed t0 and t, show that

\[ \frac{dA}{dt} = \frac{1}{2}\,r^2\,\frac{d\theta}{dt}. \]

7.25 Let a curve C be given by x(t), y(t).

a. Let r = (x² + y²)^(1/2), θ = arc tan (y/x). Use the chain rule to find dr/dt, dθ/dt.

b. Using Ex. 7.24b, show that

\[ \frac{dA}{dt} = \frac{1}{2}\left(x\,\frac{dy}{dt} - y\,\frac{dx}{dt}\right). \]

c. Show that dA/dt is constant if and only if the acceleration vector is a scalar multiple of the radius vector ⟨x, y⟩. (Hint: see Ex. 2.16c, d.)

Note: Kepler’s first law of planetary motion states that each planet moves in an ellipse with the sun at one focus. Kepler’s second law states that planetary motion is such that the radius vector from the sun to a planet sweeps out equal areas in equal times; in other words dA/dt is constant. Exercise 7.25 shows that Kepler’s second law does not depend on any special properties of gravitational force, other than the fact that it is a central force, directed along the line from the planet to the sun.

7.26 Prove the following theorems.

a. If f(x, y) is continuously differentiable in the whole plane, and if fy ≡ 0, then f(x, y) = g(x), where g is a function of one variable. (Hint: let g(x) = f(x, 0) and apply the mean value theorem to f(x, y) − f(x, 0).)

b. If f(x, y) is continuously differentiable in the whole plane, and if fx ≡ 0, then f(x, y) = h(y), where h is a function of one variable.

7.27 Show that the theorems in Ex. 7.26 remain valid if f(x, y) is continuously differentiable in a domain D consisting of the upper half-plane, the right half-plane, a rectangle, or a disk.

*7.28 Formulate a general condition on a domain D such that the first theorem in Ex. 7.26 will be valid for an arbitrary continuously differentiable function f(x, y) in D.

*7.29 Show that the theorems in Ex. 7.26 are not valid in an arbitrary domain D, by verifying the details in the following example.

a. Sketch the domain D defined by x > 0, 1 < x² + y² < 4.

b. Let f(x, y) be defined in the domain D of part a by

Image

Show that f(x, y) is continuously differentiable in D, and fy ≡ 0.

c. Show that, for example, Image and hence f(x, y) cannot be written as g(x), a function of x alone.