Advanced Calculus of Several Variables (1973)
Part III. Successive Approximations and Implicit Functions
Chapter 2. THE MULTIVARIABLE MEAN VALUE THEOREM
The mean value theorem for real-valued functions states that, if the open set U ⊂ ℝⁿ contains the line segment L joining the points a and b, and f : U → ℝ is differentiable, then

f(b) − f(a) = ∇f(c) · (b − a)   (1)

for some point c ∈ L (Theorem II.3.4). We have seen (Exercise II.1.12) that this important result does not generalize to vector-valued functions. However, in many applications of the mean value theorem, all that is actually needed is the numerical estimate

|f(b) − f(a)| ≤ M|b − a|,  where M = max{|∇f(c)| : c ∈ L},   (2)

which follows immediately from (1) and the Cauchy–Schwarz inequality (if f is 𝒞¹, so that the maximum on the right exists). Fortunately inequality (2) does generalize to the case of mappings from ℝⁿ to ℝᵐ, and we will see that this result, the multivariable mean value theorem, plays a key role in the generalization to higher dimensions of the results of Section 1.
Recall from Section I.3 that a norm on the vector space V is a real-valued function x → ‖x‖ such that ‖x‖ > 0 if x ≠ 0, ‖ax‖ = |a| ‖x‖, and ‖x + y‖ ≤ ‖x‖ + ‖y‖, for all x, y ∈ V and a ∈ ℝ. Given a norm on V, by the ball of radius r with respect to this norm, centered at a ∈ V, is meant the set {x ∈ V : ‖x − a‖ ≤ r}.
Thus far we have used mainly the Euclidean norm

|x| = √(x₁² + ⋯ + xₙ²)

on ℝⁿ. In this section we will find it more convenient to use the “sup norm”

|x|₀ = max{|x₁|, . . . , |xₙ|},

which was introduced in Example 3 of Section I.3. The “unit ball” with respect to the sup norm is the cube C₁ⁿ = {x ∈ ℝⁿ : |x|₀ ≤ 1}, which is symmetric with respect to the coordinate planes in ℝⁿ, and has the point (1, 1, . . . , 1) as one of its vertices. The cube Cᵣⁿ = {x ∈ ℝⁿ : |x|₀ ≤ r} will be referred to as the “cube of radius r” centered at 0. We will delete the dimensional superscript when it is not needed for clarity.
We will see in Section VI.1 that any two norms on ℝⁿ are equivalent, in the sense that every ball with respect to one of the norms contains a ball with respect to the other, centered at the same point. Of course this is “obvious” for the Euclidean norm and the sup norm (Fig. 3.5). Consequently it makes no difference which norm we use in the definitions of limits, continuity, etc. (Why?)
Figure 3.5
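The equivalence of the Euclidean and sup norms can be made concrete numerically. The sketch below (an illustration, not from the text) checks the containment inequalities |x|₀ ≤ |x| ≤ √n |x|₀ on random vectors; these inequalities are exactly what puts a ball for each norm inside a slightly larger ball for the other.

```python
import math
import random

def sup_norm(x):
    """Sup norm |x|_0 = max_i |x_i|."""
    return max(abs(c) for c in x)

def euclid_norm(x):
    """Euclidean norm |x| = sqrt(sum of x_i^2)."""
    return math.sqrt(sum(c * c for c in x))

# For every x in R^n:  |x|_0 <= |x| <= sqrt(n) * |x|_0.
random.seed(0)
n = 4
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    assert sup_norm(x) <= euclid_norm(x) + 1e-12
    assert euclid_norm(x) <= math.sqrt(n) * sup_norm(x) + 1e-12
```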
We will also need the concept of the norm of a linear mapping L : ℝⁿ → ℝᵐ. The norm ‖L‖ of L is defined by

‖L‖ = max{|L(x)|₀ : x ∈ ∂C₁ⁿ}.

We will show presently that ‖ ‖ is indeed a norm on the vector space ℒₘₙ of all linear mappings from ℝⁿ to ℝᵐ—the only property of a norm that is not obvious is the triangle inequality.
We have seen (in Section I.7) that every linear mapping is continuous. This fact, together with the fact that the function y → |y|₀ is clearly continuous on ℝᵐ, implies that the composite x → |L(x)|₀ is continuous on the compact set ∂C₁ⁿ, so the maximum value ‖L‖ exists. Note that, if x ≠ 0, then x/|x|₀ ∈ ∂C₁ⁿ, so

|L(x)|₀ = |x|₀ |L(x/|x|₀)|₀ ≤ ‖L‖ |x|₀.
This is half of the following result, which provides an important interpretation of ‖L‖.
Proposition 2.1 If L : ℝⁿ → ℝᵐ is a linear mapping, then ‖L‖ is the least number M such that

|L(x)|₀ ≤ M|x|₀

for all x ∈ ℝⁿ.
PROOF It remains only to be shown that, if |L(x)|₀ ≤ M|x|₀ for all x ∈ ℝⁿ, then ‖L‖ ≤ M. But this follows immediately from the fact that, for |x|₀ = 1, the inequality reduces to |L(x)|₀ ≤ M, while ‖L‖ = max |L(x)|₀ for x ∈ ∂C₁.
In our proof of the mean value theorem we will need the elementary fact that the norm of a component function of the linear mapping L is no greater than the norm ‖L‖ of L itself.
Lemma 2.2 If L = (L₁, . . . , Lₘ) : ℝⁿ → ℝᵐ is linear, then ‖Lᵢ‖ ≤ ‖L‖ for each i = 1, . . . , m.
PROOF Let x₀ be the point of ∂C₁ at which |Lᵢ(x)| is maximal. Then

‖Lᵢ‖ = |Lᵢ(x₀)| ≤ |L(x₀)|₀ ≤ ‖L‖.
Next we give a formula for actually computing the norm of a given linear mapping. For this we need a particular concept of the “norm” of a matrix. If A = (aᵢⱼ) is an m × n matrix, we define its norm ‖A‖ by

‖A‖ = max{|aᵢ₁| + ⋯ + |aᵢₙ| : i = 1, . . . , m}.   (3)

Note that, in terms of the “1-norm” defined on ℝⁿ by

|x|₁ = |x₁| + ⋯ + |xₙ|,

‖A‖ is simply the maximum of the 1-norms of the row vectors A₁, . . . , Aₘ of A,

‖A‖ = max{|A₁|₁, . . . , |Aₘ|₁}.
To see that this is actually a norm on the vector space ℳₘₙ of all m × n matrices, let us identify ℳₘₙ with ℝᵐⁿ in the natural way. In other words, if x₁, . . . , xₘ are the row vectors of the m × n matrix X = (xᵢⱼ), we identify X with the point

(x₁, . . . , xₘ) = (x₁₁, . . . , x₁ₙ, . . . , xₘ₁, . . . , xₘₙ) ∈ ℝᵐⁿ.

With this notation, what we want to show is that

‖X‖ = max{|x₁|₁, . . . , |xₘ|₁}

defines a norm on ℝᵐⁿ. But this follows easily from the fact that | |₁ is a norm on ℝⁿ (Exercise 2.2). In particular, ‖ ‖ satisfies the triangle inequality. A ball with respect to the 1-norm is pictured in Fig. 3.6 (for the case n = 2); a ball with respect to the above norm on ℝᵐⁿ is the Cartesian product of m such balls, one in each ℝⁿ factor.
We will now show that the norm of a linear mapping is equal to the norm of its matrix. For example, if L : ℝ³ → ℝ³ is defined by L(x, y, z) = (x − 3z, 2x − y − 2z, x + y), then ‖L‖ = max{4, 5, 2} = 5.
Figure 3.6
Theorem 2.3 Let A = (aᵢⱼ) be the matrix of the linear mapping L : ℝⁿ → ℝᵐ, that is, L(x) = Ax for all x ∈ ℝⁿ. Then

‖L‖ = ‖A‖.
PROOF Given x ∈ ℝⁿ, the coordinates (y₁, . . . , yₘ) of y = L(x) are defined by

yᵢ = aᵢ₁x₁ + ⋯ + aᵢₙxₙ,  i = 1, . . . , m.

Let |yₖ| be the largest of the absolute values of these coordinates of y. Then

|y|₀ = |yₖ| = |aₖ₁x₁ + ⋯ + aₖₙxₙ| ≤ (|aₖ₁| + ⋯ + |aₖₙ|)|x|₀ ≤ ‖A‖ |x|₀.

Thus |L(x)|₀ ≤ ‖A‖|x|₀ for all x ∈ ℝⁿ, so it follows from Proposition 2.1 that ‖L‖ ≤ ‖A‖.
To prove that ‖L‖ ≥ ‖A‖, it suffices to exhibit a point x ∈ ∂C₁ for which |L(x)|₀ = ‖A‖. Suppose that the kth row vector Aₖ = (aₖ₁, . . . , aₖₙ) is the one whose 1-norm is greatest, so

‖A‖ = |Aₖ|₁ = |aₖ₁| + ⋯ + |aₖₙ|.

For each j = 1, . . . , n, define εⱼ = ±1 by |aₖⱼ| = εⱼaₖⱼ. If x = (ε₁, ε₂, . . . , εₙ), then |x|₀ = 1, and

|L(x)|₀ ≥ |Lₖ(x)| = ε₁aₖ₁ + ⋯ + εₙaₖₙ = |aₖ₁| + ⋯ + |aₖₙ| = ‖A‖,

so ‖L‖ ≥ ‖A‖ as desired.
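Theorem 2.3 lends itself to a direct numerical check. The sketch below (an illustration, not from the text; the helper names are our own) computes ‖A‖ from Eq. (3), and computes ‖L‖ by brute force: as the second half of the proof shows, the maximum of |L(x)|₀ over ∂C₁ is attained at a sign vector, so it suffices to evaluate L at the 2ⁿ points with coordinates ±1. The example L(x, y, z) = (x − 3z, 2x − y − 2z, x + y) from the text gives 5 both ways.

```python
from itertools import product

def matrix_norm(A):
    """||A||: the maximum of the 1-norms of the rows of A (Eq. (3))."""
    return max(sum(abs(a) for a in row) for row in A)

def mapping_norm(A):
    """||L|| = max of |L(x)|_0 over the unit cube's boundary.

    By the proof of Theorem 2.3 the maximum is attained at a vertex
    of the cube, so checking the 2^n sign vectors suffices.
    """
    n = len(A[0])
    best = 0.0
    for x in product((-1.0, 1.0), repeat=n):
        Ax = [sum(a * c for a, c in zip(row, x)) for row in A]
        best = max(best, max(abs(y) for y in Ax))
    return best

# The example from the text: L(x, y, z) = (x - 3z, 2x - y - 2z, x + y).
A = [[1, 0, -3], [2, -1, -2], [1, 1, 0]]
assert matrix_norm(A) == 5
assert mapping_norm(A) == 5
```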
Let Φ : ℒₘₙ → ℳₘₙ be the natural isomorphism from the vector space of all linear mappings ℝⁿ → ℝᵐ to the vector space of all m × n matrices, Φ(L) being the matrix of L. Then Theorem 2.3 says simply that the isomorphism Φ is “norm-preserving.” Since we have seen that ‖ ‖ on ℳₘₙ satisfies the triangle inequality, it follows easily that the same is true of ‖ ‖ on ℒₘₙ. Thus ‖ ‖ is indeed a norm on ℒₘₙ.
Henceforth we will identify both the linear mapping space ℒₘₙ and the matrix space ℳₘₙ with Euclidean space ℝᵐⁿ, by identifying each linear mapping with its matrix, and each m × n matrix with a point of ℝᵐⁿ (as above). In other words, we can regard either symbol ℒₘₙ or ℳₘₙ as denoting ℝᵐⁿ with the norm

‖x‖ = max{|x₁|₁, . . . , |xₘ|₁},

where x = (x₁, . . . , xₘ) ∈ ℝᵐⁿ with each xᵢ ∈ ℝⁿ.
If f : ℝⁿ → ℝᵐ is a differentiable mapping, then dfₐ ∈ ℒₘₙ and f′(a) ∈ ℳₘₙ, so we may regard f′ as a mapping from ℝⁿ to ℳₘₙ, and similarly df as a mapping from ℝⁿ to ℒₘₙ. Recall that f : ℝⁿ → ℝᵐ is 𝒞¹ at a if and only if the first partial derivatives of the component functions of f all exist in a neighborhood of a and are continuous at a. The following result is an immediate consequence of this definition.
Proposition 2.4 The differentiable mapping f : ℝⁿ → ℝᵐ is 𝒞¹ at a if and only if f′ : ℝⁿ → ℳₘₙ is continuous at a.
We are finally ready for the mean value theorem.
Theorem 2.5 Let f : U → ℝᵐ be a 𝒞¹ mapping, where U ⊂ ℝⁿ is a neighborhood of the line segment L with endpoints a and b. Then

|f(b) − f(a)|₀ ≤ M|b − a|₀,

where M = max{‖dfₓ‖ : x ∈ L}.
PROOF Let h = b − a, and define the curve γ : [0, 1] → ℝᵐ by

γ(t) = f(a + th).

If f₁, . . . , fₘ are the component functions of f, then γᵢ(t) = fᵢ(a + th) is the ith component function of γ, and

γᵢ′(t) = ∇fᵢ(a + th) · h

by the chain rule.
If the maximal (in absolute value) coordinate of f(b) − f(a) is the kth one, then

|f(b) − f(a)|₀ = |fₖ(b) − fₖ(a)| = |γₖ(1) − γₖ(0)| = |γₖ′(τ)|

for some τ ∈ (0, 1), by the single-variable mean value theorem. Writing c = a + τh ∈ L, we have γₖ′(τ) = ∇fₖ(c) · h, the value on h of the kth component of the linear mapping df at c, so Proposition 2.1 and Lemma 2.2 give

|f(b) − f(a)|₀ = |γₖ′(τ)| ≤ ‖df at c‖ |h|₀ ≤ M|b − a|₀,

as desired.
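The inequality of Theorem 2.5 can be checked numerically for a concrete map. In the sketch below (an illustration, not from the text) f : ℝ² → ℝ² is a sample 𝒞¹ mapping whose Jacobian we write out by hand, ‖dfₓ‖ is computed as the maximum row 1-norm (Theorem 2.3), and M is approximated by sampling along the segment from a to b.

```python
import math

def sup_norm(x):
    """Sup norm |x|_0."""
    return max(abs(c) for c in x)

def f(p):
    """A sample C^1 map R^2 -> R^2 (our choice, not from the text)."""
    x, y = p
    return (math.sin(x) + y, x * y)

def df_norm(p):
    """||df_p||: maximum of the 1-norms of the Jacobian's rows."""
    x, y = p
    jac = [(math.cos(x), 1.0), (y, x)]
    return max(sum(abs(a) for a in row) for row in jac)

a, b = (0.0, 0.0), (1.0, 2.0)
h = [bi - ai for ai, bi in zip(a, b)]

# M = max of ||df_x|| over the segment from a to b (sampled).
M = max(df_norm([ai + (t / 1000) * hi for ai, hi in zip(a, h)])
        for t in range(1001))

lhs = sup_norm([fb - fa for fa, fb in zip(f(a), f(b))])
assert lhs <= M * sup_norm(h)   # |f(b) - f(a)|_0 <= M |b - a|_0
```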
If U is a convex open set (that is, each line segment joining two points of U lies in U), and f : U → ℝᵐ is a 𝒞¹ mapping such that ‖dfₓ‖ ≤ M for each x ∈ U, then the mean value theorem says that

|f(a + h) − f(a)|₀ ≤ M|h|₀

if a, a + h ∈ U. Speaking very roughly, this says that f(a + h) is approximately equal to the constant f(a) when |h|₀ is very small. The following important corollary to the mean value theorem says (with λ = dfₐ) that the actual difference Δfₐ(h) = f(a + h) − f(a) is approximately equal to the linear difference dfₐ(h) when h is very small.
Corollary 2.6 Let f : U → ℝᵐ be a 𝒞¹ mapping, where U ⊂ ℝⁿ is a neighborhood of the line segment L with endpoints a and a + h. If λ : ℝⁿ → ℝᵐ is a linear mapping, then

|f(a + h) − f(a) − λ(h)|₀ ≤ N|h|₀,

where N = max{‖dfₓ − λ‖ : x ∈ L}.
PROOF Apply the mean value theorem to the 𝒞¹ mapping g : U → ℝᵐ defined by g(x) = f(x) − λ(x), noting that dgₓ = dfₓ − λ because dλₓ = λ (by Example 3 of Section II.2), and that

g(a + h) − g(a) = f(a + h) − f(a) − λ(h)

because λ is linear.
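Corollary 2.6 can likewise be tested numerically with λ = dfₐ. In the sketch below (an illustration, not from the text; f and the points are our own choices) the bound N is approximated by sampling ‖dfₓ − dfₐ‖ along the segment from a to a + h; the left side is of second order in |h|₀, so the inequality holds comfortably.

```python
import math

def sup_norm(x):
    return max(abs(c) for c in x)

def f(p):
    """A sample C^1 map R^2 -> R^2 (our choice, not from the text)."""
    x, y = p
    return (math.sin(x) + y, x * y)

def jac(p):
    """Jacobian matrix of f at p, written out by hand."""
    x, y = p
    return [(math.cos(x), 1.0), (y, x)]

def apply(A, v):
    return [sum(a * c for a, c in zip(row, v)) for row in A]

a, h = (0.5, -0.5), (0.01, 0.02)
La = jac(a)                      # the linear mapping lambda = df_a

def diff_norm(p):
    """||df_p - df_a||: maximum row 1-norm of the difference matrix."""
    return max(sum(abs(b - c) for b, c in zip(rB, rA))
               for rB, rA in zip(jac(p), La))

# N = max of ||df_x - df_a|| over the segment from a to a+h (sampled).
N = max(diff_norm([a[i] + (t / 200) * h[i] for i in range(2)])
        for t in range(201))

ah = [a[i] + h[i] for i in range(2)]
err = [f(ah)[i] - f(a)[i] - apply(La, h)[i] for i in range(2)]
assert sup_norm(err) <= N * sup_norm(h) + 1e-12
```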
As a typical application of Corollary 2.6, we can prove in the case m = n that, if U contains the cube Cᵣ of radius r centered at 0, and dfₓ is close (in norm) to the identity mapping I : ℝⁿ → ℝⁿ for all x ∈ Cᵣ, then the image under f of the cube Cᵣ is contained in a slightly larger cube (Fig. 3.7). This seems natural enough—if df is sufficiently close to the identity, then f should be also, so no point should be moved very far.
Figure 3.7
Corollary 2.7 Let U be an open set in ℝⁿ containing the cube Cᵣ, and f : U → ℝⁿ a 𝒞¹ mapping such that f(0) = 0 and df₀ = I. If

‖dfₓ − I‖ ≤ ε

for all x ∈ Cᵣ, then the image f(Cᵣ) is contained in the cube of radius (1 + ε)r centered at 0.
PROOF Applying Corollary 2.6 with a = 0, λ = df₀ = I, and x ∈ Cᵣ, we obtain

|f(x) − x|₀ ≤ ε|x|₀ ≤ εr.

But |f(x)|₀ ≤ |x|₀ + |f(x) − x|₀ by the triangle inequality, so it follows that

|f(x)|₀ ≤ r + εr = (1 + ε)r,

as desired.
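A quick numerical illustration of Corollary 2.7 (our own example, not from the text): the map below fixes 0 and has df₀ = I, and on the unit cube C₁ its Jacobian differs from I by at most ε = 0.2 in the norm of Eq. (3); sampling C₁ on a grid confirms that every image point lies in the cube of radius 1 + ε.

```python
def sup_norm(x):
    return max(abs(c) for c in x)

def f(p):
    """Sample map with f(0) = 0 and df_0 = I (our choice, not from the text)."""
    x, y = p
    return (x + 0.1 * x * y, y + 0.1 * x * x)

# On the cube C_1 (r = 1) the Jacobian is
#   [[1 + 0.1*y, 0.1*x], [0.2*x, 1]],
# so ||df_p - I|| = max(0.1*|y| + 0.1*|x|, 0.2*|x|) <= 0.2 there.
eps, r = 0.2, 1.0

# Sample points of C_r on a grid and check f(C_r) lies in the cube
# of radius (1 + eps) * r.
N = 50
for i in range(N + 1):
    for j in range(N + 1):
        p = (-r + 2 * r * i / N, -r + 2 * r * j / N)
        assert sup_norm(f(p)) <= (1 + eps) * r + 1e-12
```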
The following corollary is a somewhat deeper application of the mean value theorem. At the same time it illustrates a general phenomenon which is basic to the linear approximation approach to calculus—the fact that simple properties of the differential of a function often reflect deep properties of the function itself. The point here is that the question as to whether a linear mapping is one-to-one is a rather simple matter, while for an arbitrary given mapping this may be a quite complicated question.
Corollary 2.8 Let f : ℝⁿ → ℝᵐ be 𝒞¹ at a. If dfₐ : ℝⁿ → ℝᵐ is one-to-one, then f itself is one-to-one on some neighborhood of a.
PROOF Let m be the minimum value of |dfₐ(x)|₀ for x ∈ ∂C₁; then m > 0 because dfₐ is one-to-one [otherwise there would be a point x ≠ 0 with dfₐ(x) = 0]. Choose a positive number ε < m.
Since f is 𝒞¹ at a, there exists δ > 0 such that

‖dfₓ − dfₐ‖ < ε  whenever  |x − a|₀ < δ.
If x and y are any two distinct points of the neighborhood

{z ∈ ℝⁿ : |z − a|₀ < δ},

then an application of Corollary 2.6, with λ = dfₐ and L the line segment from x to y, yields

|f(y) − f(x) − dfₐ(y − x)|₀ ≤ ε|y − x|₀.
The triangle inequality then gives

|f(y) − f(x)|₀ ≥ |dfₐ(y − x)|₀ − ε|y − x|₀,

and |dfₐ(y − x)|₀ ≥ m|y − x|₀ by the definition of m and the homogeneity of dfₐ, so

|f(y) − f(x)|₀ ≥ (m − ε)|y − x|₀ > 0.

Thus f(x) ≠ f(y) if x ≠ y.
Corollary 2.8 has the interesting consequence that, if f : ℝⁿ → ℝⁿ is 𝒞¹ with dfₐ one-to-one (so f is one-to-one in a neighborhood of a), and if f is “slightly perturbed” by means of the addition of a “small” term g : ℝⁿ → ℝⁿ, then the new mapping h = f + g is still one-to-one in a neighborhood of a. See Exercise 2.9 for a precise statement of this result.
In this section we have dealt with the mean value theorem and its corollaries in terms of the sup norms on ℝⁿ and ℝᵐ, and the resulting norm

‖L‖ = max{|L(x)|₀ : x ∈ ∂C₁ⁿ}

on ℒₘₙ. This will suffice for our purposes. However arbitrary norms ‖ ‖ₘ on ℝᵐ and ‖ ‖ₙ on ℝⁿ can be used in the mean value theorem, provided we use the norm

‖L‖ = max{‖L(x)‖ₘ : x ∈ ∂Dₙ}

on ℒₘₙ, where Dₙ is the unit ball in ℝⁿ with respect to the norm ‖ ‖ₙ. The conclusion of the mean value theorem is then the expected inequality

‖f(b) − f(a)‖ₘ ≤ M‖b − a‖ₙ,  where M = max{‖dfₓ‖ : x ∈ L}.
In Exercises 2.5 and 2.6 we outline an alternative proof of the mean value theorem which establishes it in this generality.
Exercises
2.1 Let ‖ ‖ₘ and ‖ ‖ₙ be norms on ℝᵐ and ℝⁿ respectively. Prove that

‖(x, y)‖₀ = max(‖x‖ₘ, ‖y‖ₙ)

defines a norm on ℝᵐ⁺ⁿ. Similarly prove that ‖(x, y)‖₁ = ‖x‖ₘ + ‖y‖ₙ defines a norm on ℝᵐ⁺ⁿ.
2.2 Show that ‖ ‖, as defined by Eq. (3), is a norm on the space ℳₘₙ of m × n matrices.
2.3 Given a ∈ ℝⁿ, denote by Lₐ the linear function

Lₐ(x) = a · x = a₁x₁ + ⋯ + aₙxₙ.

Consider the norms ‖Lₐ‖₀ and ‖Lₐ‖₁ of Lₐ with respect to the sup norm | |₀ and the 1-norm | |₁ on ℝⁿ, defined as in the last paragraph of this section. Show that ‖Lₐ‖₁ = |a|₀ while ‖Lₐ‖₀ = |a|₁.
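Exercise 2.3 can be checked numerically before it is proved. The sketch below (an illustration, not part of the exercises; the helper name is our own) brute-forces the two operator norms of Lₐ: the sup-norm unit ball's extreme points are the 2ⁿ sign vectors, and the 1-norm unit ball's extreme points are the signed basis vectors ±eⱼ, so maximizing over those finite sets suffices.

```python
from itertools import product

def norms_of_La(a):
    """Brute-force the two operator norms of L_a(x) = a . x.

    With the sup norm on R^n, |a . x| is maximized over the unit ball
    at a sign vector, giving |a|_1; with the 1-norm it is maximized
    at a signed basis vector +/- e_j, giving |a|_0.
    """
    # sup-norm unit ball: check the 2^n vertices of the cube
    norm0 = max(abs(sum(ai * xi for ai, xi in zip(a, x)))
                for x in product((-1, 1), repeat=len(a)))
    # 1-norm unit ball: check the points +/- e_j
    norm1 = max(abs(ai) for ai in a)
    return norm0, norm1

a = (3, -1, 2)
n0, n1 = norms_of_La(a)
assert n0 == sum(abs(ai) for ai in a)   # ||L_a||_0 = |a|_1
assert n1 == max(abs(ai) for ai in a)   # ||L_a||_1 = |a|_0
```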
2.4 Let L : ℝⁿ → ℝᵐ be a linear mapping with matrix (aᵢⱼ). If we use the 1-norm on ℝⁿ and the sup norm on ℝᵐ, show that the corresponding norm on ℒₘₙ is

‖L‖ = max{|aᵢⱼ| : i = 1, . . . , m; j = 1, . . . , n},

that is, the sup norm on ℝᵐⁿ.
2.5 Let γ : [a, b] → ℝᵐ be a 𝒞¹ mapping with ‖γ′(t)‖ ≤ M for all t ∈ [a, b], ‖ ‖ being an arbitrary norm on ℝᵐ. Prove that

‖γ(b) − γ(a)‖ ≤ M(b − a).

Outline: Given ε > 0, denote by Sε the set of points c ∈ [a, b] such that

‖γ(t) − γ(a)‖ ≤ (M + ε)(t − a)

for all t ∈ [a, c]. Let c = lub Sε. If c < b, then there exists δ > 0 such that

‖γ(t) − γ(c)‖ ≤ (M + ε)(t − c)

for all t ∈ [c, c + δ]. Conclude from this that c + δ ∈ Sε, a contradiction. Therefore c = b, so

‖γ(b) − γ(a)‖ ≤ (M + ε)(b − a)

for all ε > 0.
2.6 Apply the previous exercise to establish the mean value theorem with respect to arbitrary norms on ℝⁿ and ℝᵐ. In particular, given a 𝒞¹ mapping f : U → ℝᵐ, where U is a neighborhood in ℝⁿ of the line segment L from a to a + h, apply Exercise 2.5 with γ(t) = f(a + th).
2.7 (a) Show that the linear mapping T : ℝⁿ → ℝᵐ is one-to-one if and only if the minimum value of |T(x)|₀ for x ∈ ∂C₁ⁿ is positive.
(b) Conclude that the linear mapping T : ℝⁿ → ℝᵐ is one-to-one if and only if there exists a > 0 such that |T(x)|₀ ≥ a|x|₀ for all x ∈ ℝⁿ.
2.8 Let T : ℝⁿ → ℝᵐ be a one-to-one linear mapping with |T(x)|₀ ≥ a|x|₀ for all x ∈ ℝⁿ, where a > 0. If ‖S − T‖ < a, show that

|S(x)|₀ ≥ (a − ‖S − T‖)|x|₀

for all x ∈ ℝⁿ, so S is also one-to-one. Thus the set of all one-to-one linear mappings ℝⁿ → ℝᵐ forms an open subset of ℒₘₙ ≈ ℝᵐⁿ.
2.9 Apply Corollary 2.8 and the preceding exercise to prove the following. Let f : ℝⁿ → ℝⁿ be a 𝒞¹ mapping such that dfₐ : ℝⁿ → ℝⁿ is one-to-one, so that f is one-to-one in a neighborhood of a. Then there exists ε > 0 such that, if g : ℝⁿ → ℝⁿ is a 𝒞¹ mapping with g(a) = 0 and ‖dgₐ‖ < ε, then the mapping h : ℝⁿ → ℝⁿ, defined by h(x) = f(x) + g(x), is also one-to-one in a neighborhood of a.