Advanced Calculus of Several Variables (1973)
Part II. Multivariable Differential Calculus
Our study in Chapter I of the geometry and topology of $\mathbb{R}^n$ provides an adequate foundation for the study in this chapter of the differential calculus of mappings from one Euclidean space to another. We will find that the basic idea of multivariable differential calculus is the approximation of nonlinear mappings by linear ones.
This idea is implicit in the familiar single-variable differential calculus. If the function $f : \mathbb{R} \to \mathbb{R}$ is differentiable at $a$, then the tangent line at $(a, f(a))$ to the graph $y = f(x)$ in $\mathbb{R}^2$ is the straight line whose equation is

$$y - f(a) = f'(a)(x - a).$$
The right-hand side of this equation is a linear function of $x - a$; we may regard it as a linear approximation to the actual change $f(x) - f(a)$ in the value of $f$ between $a$ and $x$. To make this more precise, let us write $h = x - a$, $\Delta f_a(h) = f(a + h) - f(a)$, and $df_a(h) = f'(a)h$ (see Fig. 2.1). The linear mapping $df_a : \mathbb{R} \to \mathbb{R}$, defined by $df_a(h) = f'(a)h$, is called the differential of $f$ at $a$; it is simply that linear mapping whose matrix is the derivative $f'(a)$ of $f$ at $a$ (the matrix of a linear mapping $\mathbb{R} \to \mathbb{R}$ being just a real number). With this terminology, we find that when $h$ is small, the linear change $df_a(h)$ is a good approximation to the actual change $\Delta f_a(h)$, in the sense that

$$\lim_{h \to 0} \frac{\Delta f_a(h) - df_a(h)}{h} = 0.$$

Figure 2.1
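The sense in which the linear change approximates the actual change can be checked numerically. The following is a minimal sketch, assuming the illustrative choices $f = \exp$ and $a = 1$ (neither comes from the text); the helper names `delta_f` and `differential` are likewise hypothetical. It tabulates $|\Delta f_a(h) - df_a(h)| / |h|$ for shrinking $h$.

```python
import math

def delta_f(f, a, h):
    """Actual change f(a + h) - f(a)."""
    return f(a + h) - f(a)

def differential(fprime_a, h):
    """Linear change df_a(h) = f'(a) * h."""
    return fprime_a * h

# Illustrative assumptions: f = exp and a = 1, so f'(a) = exp(1) exactly.
f = math.exp
a = 1.0
fprime_a = math.exp(a)

ratios = []
for k in range(1, 6):
    h = 10.0 ** (-k)
    ratio = abs(delta_f(f, a, h) - differential(fprime_a, h)) / abs(h)
    ratios.append(ratio)
    print(h, ratio)
```

For this choice the printed ratio should shrink roughly in proportion to $h$, which is what "goes to zero faster than $h$" means for the difference itself.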
Roughly speaking, our point of view in this chapter will be that a mapping $f : \mathbb{R}^n \to \mathbb{R}^m$ is (by definition) differentiable at $a$ if and only if it has near $a$ an appropriate linear approximation $df_a : \mathbb{R}^n \to \mathbb{R}^m$. In this case $df_a$ will be called the differential of $f$ at $a$; its $m \times n$ matrix will be called the derivative of $f$ at $a$, thus preserving the above relationship between the differential (a linear mapping) and the derivative (its matrix). We will see that this approach is geometrically well motivated, and permits the basic ingredients of differential calculus (for example, the chain rule) to be developed and utilized in a multivariable setting.
Chapter 1. CURVES IN $\mathbb{R}^m$
We consider first the special case of a mapping $f : \mathbb{R} \to \mathbb{R}^m$. Motivated by curves in $\mathbb{R}^2$ and $\mathbb{R}^3$, one may think of $f$ as a curve in $\mathbb{R}^m$, traced out by a moving point whose position at time $t$ is the point $f(t) \in \mathbb{R}^m$, and attempt to define its velocity at time $t$. Just as in the single-variable case, $m = 1$, this problem leads to the definition of the derivative $f'$ of $f$. The change in position of the particle from time $a$ to time $a + h$ is described by the vector $f(a + h) - f(a)$, so the average velocity of the particle during this time interval is the familiar-looking difference quotient

$$\frac{f(a + h) - f(a)}{h},$$

whose limit (if it exists) as $h \to 0$ should (by definition) be the instantaneous velocity at time $a$. So we define

$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} \tag{1}$$

if this limit exists, in which case we say that $f$ is differentiable at $a \in \mathbb{R}$. The derivative vector $f'(a)$ of $f$ at $a$ may be visualized as a tangent vector to the image curve of $f$ at the point $f(a)$ (see Fig. 2.2); its length $|f'(a)|$ is the speed at time $t = a$ of the moving point $f(t)$, so $f'(a)$ is often called the velocity vector at time $a$.
Figure 2.2
If the derivative mapping $f' : \mathbb{R} \to \mathbb{R}^m$ is itself differentiable at $a$, its derivative at $a$ is the second derivative $f''(a)$ of $f$ at $a$. Still thinking of $f$ in terms of the motion of a moving point (or particle) in $\mathbb{R}^m$, $f''(a)$ is often called the acceleration vector at time $a$. Exercises 1.3 and 1.4 illustrate the usefulness of the concepts of velocity and acceleration for points moving in higher-dimensional Euclidean spaces.
By Theorem 7.1 of Chapter I (limits in $\mathbb{R}^m$ may be taken coordinatewise), we see that $f : \mathbb{R} \to \mathbb{R}^m$ is differentiable at $a$ if and only if each of its coordinate functions $f_1, \ldots, f_m$ is differentiable at $a$, in which case

$$f'(a) = (f_1'(a), \ldots, f_m'(a)).$$

That is, the differentiable function $f : \mathbb{R} \to \mathbb{R}^m$ may be differentiated coordinatewise. Applying coordinatewise the familiar facts about derivatives of real-valued functions, we therefore obtain the results listed in the following theorem.
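As a small numerical illustration of the difference quotient and of coordinatewise differentiation, the sketch below compares the average velocity over a short time interval with the derivative computed one coordinate at a time. The curve $f(t) = (\cos t, \sin t, t)$, the point $a = 0.5$, and the helper names are assumptions chosen for the example, not taken from the text.

```python
import math

def curve(t):
    """Illustrative curve in R^3: f(t) = (cos t, sin t, t)."""
    return (math.cos(t), math.sin(t), t)

def difference_quotient(f, a, h):
    """Average velocity (f(a + h) - f(a)) / h, computed coordinate by coordinate."""
    return tuple((p - q) / h for p, q in zip(f(a + h), f(a)))

def norm(v):
    """Euclidean length of a vector given as a tuple."""
    return math.sqrt(sum(c * c for c in v))

a = 0.5
# Derivative obtained by differentiating each coordinate function separately.
exact = (-math.sin(a), math.cos(a), 1.0)
approx = difference_quotient(curve, a, 1e-6)

error = norm(tuple(x - y for x, y in zip(approx, exact)))
speed = norm(exact)  # length of the velocity vector
print(error, speed)
```

For this particular curve the speed is $\sqrt{2}$ at every time, since $\sin^2 a + \cos^2 a + 1 = 2$.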
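Theorem 1.1 Let $f$ and $g$ be mappings from $\mathbb{R}$ to $\mathbb{R}^m$, and $\varphi : \mathbb{R} \to \mathbb{R}$, all differentiable. Then

$$(f + g)' = f' + g', \tag{2}$$

$$(\varphi f)' = \varphi' f + \varphi f', \tag{3}$$

$$(f \cdot g)' = f' \cdot g + f \cdot g', \tag{4}$$

and

$$(f \circ \varphi)'(t) = \varphi'(t)\, f'(\varphi(t)). \tag{5}$$

Formula (5) is the chain rule for the composition $f \circ \varphi$.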
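Notice the familiar pattern for the differentiation of a product in formulas (3) and (4). The proofs of these formulas are all the same: simply apply componentwise the corresponding formula for real-valued functions. For example, to prove (5), we write

$$(f \circ \varphi)'(t) = \big((f_1 \circ \varphi)'(t), \ldots, (f_m \circ \varphi)'(t)\big) = \big(\varphi'(t) f_1'(\varphi(t)), \ldots, \varphi'(t) f_m'(\varphi(t))\big) = \varphi'(t)\, f'(\varphi(t)),$$

applying componentwise the single-variable chain rule, which asserts that

$$(f \circ g)'(t) = f'(g(t))\, g'(t)$$

if the functions $f, g : \mathbb{R} \to \mathbb{R}$ are differentiable at $g(t)$ and $t$ respectively.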
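The dot-product rule $(f \cdot g)' = f' \cdot g + f \cdot g'$ and the chain rule $(g \circ \varphi)'(t) = \varphi'(t)\, g'(\varphi(t))$ for curves can be spot-checked numerically. The following is a rough sketch under assumed example data: the curves $f(t) = (t, t^2)$ and $g(t) = (\sin t, \cos t)$, the scalar function $\varphi(t) = t^2$, and the central-difference helper are all illustrative choices, not part of the text.

```python
import math

# Illustrative curves in R^2 together with their hand-computed derivatives.
def f(t):  return (t, t * t)
def df(t): return (1.0, 2.0 * t)
def g(t):  return (math.sin(t), math.cos(t))
def dg(t): return (math.cos(t), -math.sin(t))

def phi(t):  return t * t      # illustrative scalar function
def dphi(t): return 2.0 * t

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def central_diff(F, t, h=1e-5):
    """Symmetric difference quotient for a real-valued function F."""
    return (F(t + h) - F(t - h)) / (2.0 * h)

t0 = 0.7

# Dot-product rule: numerical derivative of f . g versus f' . g + f . g'.
lhs = central_diff(lambda t: dot(f(t), g(t)), t0)
rhs = dot(df(t0), g(t0)) + dot(f(t0), dg(t0))

# Chain rule, checked one coordinate at a time.
chain_lhs = tuple(central_diff(lambda t, i=i: g(phi(t))[i], t0) for i in range(2))
chain_rhs = tuple(dphi(t0) * c for c in dg(phi(t0)))

print(lhs, rhs)
print(chain_lhs, chain_rhs)
```

Agreement to several decimal places is only a sanity check, of course, not a proof; the proof is the componentwise argument given above.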
We see below (Exercise 1.12) that the mean value theorem does not hold for vector-valued functions. However it is true that two vector-valued functions differ only by a constant (vector) if they have the same derivative; we see this by componentwise application of this fact for real-valued functions.
The tangent line at $f(a)$ to the image curve of the differentiable mapping $f : \mathbb{R} \to \mathbb{R}^m$ is, by definition, that straight line which passes through $f(a)$ and is parallel to the tangent vector $f'(a)$. We now inquire as to how well this tangent line approximates the curve close to $f(a)$. That is, how closely does the mapping $h \mapsto f(a) + hf'(a)$ of $\mathbb{R}$ into $\mathbb{R}^m$ (whose image is the tangent line) approximate the mapping $h \mapsto f(a + h)$? Let us write

$$\Delta f_a(h) = f(a + h) - f(a)$$

for the actual change in $f$ from $a$ to $a + h$, and

$$df_a(h) = h f'(a)$$

for the linear (as a function of $h$) change along the tangent line. Then Fig. 2.3 makes it clear that we are simply asking how small the difference vector $\Delta f_a(h) - df_a(h)$ is when $h$ is small. The answer is that it goes to zero even faster than $h$ does. That is,

$$\lim_{h \to 0} \frac{\Delta f_a(h) - df_a(h)}{h} = \lim_{h \to 0} \frac{f(a + h) - f(a) - h f'(a)}{h} = \lim_{h \to 0} \left[ \frac{f(a + h) - f(a)}{h} - f'(a) \right] = 0$$

by the definition of $f'(a)$. Noting that $df_a : \mathbb{R} \to \mathbb{R}^m$ is a linear mapping, we have proved the “only if” part of the following theorem.

Figure 2.3
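Theorem 1.2 The mapping $f : \mathbb{R} \to \mathbb{R}^m$ is differentiable at $a \in \mathbb{R}$ if and only if there exists a linear mapping $L : \mathbb{R} \to \mathbb{R}^m$ such that

$$\lim_{h \to 0} \frac{f(a + h) - f(a) - L(h)}{h} = 0, \tag{6}$$

in which case $L$ is defined by $L(h) = df_a(h) = hf'(a)$.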
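To prove the “if” part, suppose that there exists a linear mapping $L : \mathbb{R} \to \mathbb{R}^m$ satisfying (6). Then there exists $b \in \mathbb{R}^m$ such that $L$ is defined by $L(h) = hb$; we must show that $f'(a)$ exists and equals $b$. But

$$\lim_{h \to 0} \left[ \frac{f(a + h) - f(a)}{h} - b \right] = \lim_{h \to 0} \frac{f(a + h) - f(a) - L(h)}{h} = 0$$

by (6).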
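The content of Theorem 1.2 can be seen numerically: among the linear maps $L(h) = hb$, only the choice $b = f'(a)$ makes the quotient in (6) tend to zero. The following is a minimal sketch, assuming the illustrative curve $f(t) = (\cos t, \sin t)$ at $a = 0$, where $f'(0) = (0, 1)$; the competing vector `bad_b` and the helper name `remainder_ratio` are hypothetical.

```python
import math

def f(t):
    """Illustrative curve in R^2: the unit circle."""
    return (math.cos(t), math.sin(t))

def remainder_ratio(a, b, h):
    """|f(a + h) - f(a) - h*b| / |h| for the candidate linear map L(h) = h*b."""
    fa, fah = f(a), f(a + h)
    diff = [fah[i] - fa[i] - h * b[i] for i in range(2)]
    return math.hypot(diff[0], diff[1]) / abs(h)

a = 0.0
good_b = (0.0, 1.0)  # equals f'(0) = (-sin 0, cos 0)
bad_b = (0.5, 1.0)   # hypothetical wrong candidate, differs from f'(0)

steps = [10.0 ** (-k) for k in range(1, 6)]
good = [remainder_ratio(a, good_b, h) for h in steps]
bad = [remainder_ratio(a, bad_b, h) for h in steps]
print(good)
print(bad)
```

With the wrong candidate the ratio settles near $|f'(a) - b|$, here $0.5$, instead of tending to zero, which is the numerical counterpart of the uniqueness of $L$.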
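If $f : \mathbb{R} \to \mathbb{R}^m$ is differentiable at $a$, then the linear mapping $df_a : \mathbb{R} \to \mathbb{R}^m$, defined by $df_a(h) = hf'(a)$, is called the differential of $f$ at $a$. Notice that the derivative vector $f'(a)$ is, as a column vector, the matrix of the linear mapping $df_a$, since

$$df_a(h) = h f'(a) = \begin{pmatrix} f_1'(a) \\ \vdots \\ f_m'(a) \end{pmatrix} (h).$$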
When in the next section we define derivatives and differentials of mappings from $\mathbb{R}^n$ to $\mathbb{R}^m$, this relationship between the two will be preserved: the differential will be a linear mapping whose matrix is the derivative.
The following discussion provides some motivation for the notation $df_a$ for the differential of $f$ at $a$. Let us consider the identity function $\mathbb{R} \to \mathbb{R}$, and write $x$ for its name as well as its value at $x$. Since its derivative is 1 everywhere, its differential at $a$ is defined by

$$dx_a(h) = 1 \cdot h = h.$$

If $f$ is real-valued, and we substitute $h = dx_a(h)$ into the definition of $df_a : \mathbb{R} \to \mathbb{R}$, we obtain

$$df_a(h) = f'(a)h = f'(a)\, dx_a(h),$$

so the two linear mappings $df_a$ and $f'(a)\, dx_a$ are equal,

$$df_a = f'(a)\, dx_a.$$

If we now use the Leibniz notation $f'(a) = df/dx$ and drop the subscript $a$, we obtain the famous formula

$$df = \frac{df}{dx}\, dx,$$
which now not only makes sense, but is true! It is an actual equality of linear mappings of the real line into itself.
Now let $f$ and $g$ be two differentiable functions from $\mathbb{R}$ to $\mathbb{R}$, and write $h = g \circ f$ for the composition. Then the chain rule gives

$$dh_a(t) = h'(a)\,t = g'(f(a))\,[f'(a)\,t] = dg_{f(a)}(df_a(t)),$$

so we see that the single-variable chain rule takes the form

$$dh_a = dg_{f(a)} \circ df_a.$$

In brief, the differential of the composition $h = g \circ f$ is the composition of the differentials of $g$ and $f$. It is this elegant formulation of the chain rule that we will generalize in Section 3 to the multivariable case.
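The statement that the differential of a composition is the composition of the differentials can be mirrored directly in code by representing each differential as a linear function. The sketch below is only an illustration under assumed data: $f = \sin$, $g = \exp$, the point $a = 0.3$, and the helper names `differential` and `central_diff` are not from the text. The differential of $h = g \circ f$ is built from a numerical derivative of $h$, so that the comparison does not presuppose the chain rule.

```python
import math

def differential(derivative_value):
    """Return the linear map t -> derivative_value * t on the real line."""
    return lambda t: derivative_value * t

def central_diff(F, x, eps=1e-6):
    """Symmetric difference quotient approximating F'(x)."""
    return (F(x + eps) - F(x - eps)) / (2.0 * eps)

# Illustrative assumptions: f = sin, g = exp, a = 0.3.
a = 0.3
f, fprime = math.sin, math.cos
g, gprime = math.exp, math.exp

df_a = differential(fprime(a))        # differential of f at a
dg_fa = differential(gprime(f(a)))    # differential of g at f(a)

def composed(t):
    """Composition of the two differentials: dg_{f(a)}(df_a(t))."""
    return dg_fa(df_a(t))

# Differential of h = g o f, from a numerical estimate of h'(a).
dh_a = differential(central_diff(lambda x: g(f(x)), a))

t = 2.5
gap = abs(dh_a(t) - composed(t))
print(dh_a(t), composed(t), gap)
```

Representing differentials as functions rather than as numbers is a deliberate choice here: it is the form of the chain rule that carries over when the derivative becomes a matrix.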
Exercises
1.1 Let $f : \mathbb{R} \to \mathbb{R}^n$ be a differentiable mapping with $f'(t) \neq 0$ for all $t \in \mathbb{R}$. Let $p$ be a fixed point not on the image curve of $f$, as in Fig. 2.4. If $q = f(t_0)$ is the point of the curve closest to $p$, that is, if $|p - q| \leq |p - f(t)|$ for all $t \in \mathbb{R}$, show that the vector $p - q$ is orthogonal to the curve at $q$. Hint: Differentiate the function $\varphi(t) = |p - f(t)|^2$.
Figure 2.4
1.2 (a) Let $f : \mathbb{R} \to \mathbb{R}^n$ and $g : \mathbb{R} \to \mathbb{R}^n$ be two differentiable curves, with $f'(t) \neq 0$ and $g'(t) \neq 0$ for all $t \in \mathbb{R}$. Suppose the two points $p = f(s_0)$ and $q = g(t_0)$ are closer than any other pair of points on the two curves. Then prove that the vector $p - q$ is orthogonal to both velocity vectors $f'(s_0)$ and $g'(t_0)$. Hint: The point $(s_0, t_0)$ must be a critical point for the function $\rho : \mathbb{R}^2 \to \mathbb{R}$ defined by $\rho(s, t) = |f(s) - g(t)|^2$.
(b) Apply the result of (a) to find the closest pair of points on the “skew” straight lines in $\mathbb{R}^3$ defined by $f(s) = (s, 2s, -s)$ and $g(t) = (t + 1, t - 2, 2t + 3)$.
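1.3 Let $F : \mathbb{R}^n \to \mathbb{R}^n$ be a conservative force field on $\mathbb{R}^n$, meaning that there exists a continuously differentiable potential function $V : \mathbb{R}^n \to \mathbb{R}$ such that $F(x) = -\nabla V(x)$ for all $x \in \mathbb{R}^n$ [recall that $\nabla V = (\partial V/\partial x_1, \ldots, \partial V/\partial x_n)$]. Call the curve $\varphi : \mathbb{R} \to \mathbb{R}^n$ a “quasi-Newtonian particle” if and only if there exist constants $m_1, m_2, \ldots, m_n$, called its “mass components,” such that

$$F_i(\varphi(t)) = m_i\, \varphi_i''(t)$$

for each $i = 1, \ldots, n$. Thus, with respect to the $x_i$-direction, it behaves as though it has mass $m_i$. Define its kinetic energy $K(t)$ and potential energy $P(t)$ at time $t$ by

$$K(t) = \sum_{i=1}^{n} \tfrac{1}{2} m_i \,[\varphi_i'(t)]^2, \qquad P(t) = V(\varphi(t)).$$

Now prove that the law of the conservation of energy holds for quasi-Newtonian particles, that is, $K + P = \text{constant}$. Hint: Differentiate $K(t) + P(t)$, using the chain rule in the form $P'(t) = \nabla V(\varphi(t)) \cdot \varphi'(t)$, which will be verified in Section 3.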
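1.4 (n-body problem) Deduce from Exercise 1.3 the law of the conservation of energy for a system of $n$ particles moving in $\mathbb{R}^3$ (without colliding) under the influence of their mutual gravitational attractions. You may take $n = 2$ for brevity, although the method is general. Hint: Denote by $m_1$ and $m_2$ the masses of the two particles, and by $r_1 = (x_1, x_2, x_3)$ and $r_2 = (x_4, x_5, x_6)$ their positions at time $t$. Let $r_{12} = |r_1 - r_2|$ be the distance between them. We then have a quasi-Newtonian particle in $\mathbb{R}^6$ with mass components $m_1, m_1, m_1, m_2, m_2, m_2$ and force field $F$ defined by

$$F(x) = \frac{G m_1 m_2}{r_{12}^3}\,(x_4 - x_1,\; x_5 - x_2,\; x_6 - x_3,\; x_1 - x_4,\; x_2 - x_5,\; x_3 - x_6)$$

for $x = (x_1, \ldots, x_6) \in \mathbb{R}^6$, where $G$ denotes the gravitational constant. If

$$V(x) = -\frac{G m_1 m_2}{r_{12}},$$

verify that $F = -\nabla V$. Then apply Exercise 1.3 to conclude that

$$\tfrac{1}{2} m_1 |r_1'(t)|^2 + \tfrac{1}{2} m_2 |r_2'(t)|^2 - \frac{G m_1 m_2}{r_{12}} = \text{constant}.$$

Remark: In the general case of a system of $n$ particles, the potential function would be

$$V = -\sum_{1 \leq i < j \leq n} \frac{G m_i m_j}{r_{ij}},$$

where $r_{ij} = |r_i - r_j|$.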
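1.5 If $f : \mathbb{R} \to \mathbb{R}^n$ is linear, prove that $f'(a)$ exists for all $a \in \mathbb{R}$, with $df_a = f$.
1.6 If $L_1$ and $L_2$ are two linear mappings from $\mathbb{R}$ to $\mathbb{R}^n$ satisfying formula (6), prove that $L_1 = L_2$. Hint: Show first that

$$\lim_{h \to 0} \frac{L_1(h) - L_2(h)}{h} = 0.$$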
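1.7 Let $f, g : \mathbb{R} \to \mathbb{R}$ both be differentiable at $a$.
(a) Show that $d(fg)_a = g(a)\, df_a + f(a)\, dg_a$.
(b) Show that

$$d\!\left(\frac{f}{g}\right)_{\!a} = \frac{g(a)\, df_a - f(a)\, dg_a}{[g(a)]^2} \quad \text{if } g(a) \neq 0.$$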
1.8 Let $\gamma(t)$ be the position vector of a particle moving with constant acceleration vector $\gamma''(t) = a$. Then show that $\gamma(t) = \tfrac{1}{2} a t^2 + v_0 t + p_0$, where $p_0 = \gamma(0)$ and $v_0 = \gamma'(0)$. If $a = 0$, conclude that the particle moves along a straight line through $p_0$ with velocity vector $v_0$ (the law of inertia).
1.9 Let $\gamma : \mathbb{R} \to \mathbb{R}^n$ be a differentiable curve. Show that $|\gamma(t)|$ is constant if and only if $\gamma(t)$ and $\gamma'(t)$ are orthogonal for all $t$.
1.10 Suppose that a particle moves around a circle in the plane $\mathbb{R}^2$, of radius $r$ centered at $0$, with constant speed $v$. Deduce from the previous exercise that $\gamma(t)$ and $\gamma''(t)$ are both orthogonal to $\gamma'(t)$, so it follows that $\gamma''(t) = k(t)\gamma(t)$. Substitute this result into the equation obtained by differentiating $\gamma(t) \cdot \gamma'(t) = 0$ to obtain $k = -v^2/r^2$. Thus the acceleration vector always points towards the origin and has constant length $v^2/r$.
1.11 Given a particle in $\mathbb{R}^3$ with mass $m$ and position vector $\gamma(t)$, its angular momentum vector is $L(t) = \gamma(t) \times m\gamma'(t)$, and its torque is $T(t) = \gamma(t) \times m\gamma''(t)$.
(a) Show that $L'(t) = T(t)$, so the angular momentum is constant if the torque is zero (this is the law of the conservation of angular momentum).
(b) If the particle is moving in a central force field, that is, $\gamma(t)$ and $\gamma''(t)$ are always collinear, conclude from (a) that it remains in some fixed plane through the origin.
1.12 Consider a particle which moves on a circular helix in $\mathbb{R}^3$ with position vector

$$\gamma(t) = (a \cos \omega t,\; a \sin \omega t,\; b\omega t).$$

(a) Show that the speed of the particle is constant.
(b) Show that its velocity vector makes a constant nonzero angle with the z-axis.
(c) If $t_1 = 0$ and $t_2 = 2\pi/\omega$, notice that $\gamma(t_1) = (a, 0, 0)$ and $\gamma(t_2) = (a, 0, 2\pi b)$, so the vector $\gamma(t_2) - \gamma(t_1)$ is vertical. Conclude that the equation

$$\gamma(t_2) - \gamma(t_1) = (t_2 - t_1)\,\gamma'(\tau)$$

cannot hold for any $\tau \in (t_1, t_2)$. Thus the mean value theorem does not hold for vector-valued functions.