Advanced Calculus of Several Variables (1973)
Part III. Successive Approximations and Implicit Functions
Chapter 3. THE INVERSE AND IMPLICIT MAPPING THEOREMS
The simplest cases of the inverse and implicit mapping theorems were discussed in Section 1. Theorem 1.3 dealt with the problem of solving an equation of the form f(x) = y for x as a function of y, while Theorem 1.4 dealt with the problem of solving an equation of the form G(x, y) = 0 for y as a function of x. In each case we defined a sequence of successive approximations which, under appropriate conditions, converged to a solution.
In this section we establish the analogous higher-dimensional results. Both the statements of the theorems and their proofs will be direct generalizations of those in Section 1. In particular we will employ the method of successive approximations by means of the contraction mapping theorem.
The definition of a contraction mapping in ℝⁿ is the same as on the line. Given a subset C of ℝⁿ, the mapping φ : C → C is called a contraction mapping with contraction constant k if

|φ(x) − φ(y)| ≤ k|x − y|

for all x, y ∈ C. The contraction mapping theorem asserts that, if the set C is closed and bounded, and k < 1, then φ has a unique fixed point x* ∈ C such that φ(x*) = x*.
Theorem 3.1 Let φ : C → C be a contraction mapping with contraction constant k < 1, and with C being a closed and bounded subset of ℝⁿ. Then φ has a unique fixed point x* ∈ C. Moreover, given x₀ ∈ C, the sequence {x_k} defined inductively by

x_{k+1} = φ(x_k),   k = 0, 1, 2, . . . ,

converges to x*. In particular,

|x_k − x*| ≤ kᵏ |x₀ − x*|,

so the convergence is at least geometric.
The proof given in Section 1 for the case n = 1 generalizes immediately, with no essential change in the details. We leave it to the reader to check that the only property of the closed interval [a, b] that was used in the proof of Theorem 1.1 is the fact that every Cauchy sequence of points of [a, b] converges to a point of [a, b]. But every closed and bounded subset of ℝⁿ has this property (see the Appendix).
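Theorem 3.1 is easy to experiment with numerically. The sketch below uses a hypothetical affine contraction on ℝ² (not an example from the text): it iterates x_{k+1} = φ(x_k) in the sup norm and checks the geometric error estimate |x_k − x*| ≤ kᵏ|x₀ − x*| along the way.

```python
# Successive approximation for a contraction mapping on R^2 (sup norm).
# phi(x, y) = (0.5x + 0.25y + 1, 0.25x + 0.5y) is a contraction with
# constant k = 0.75: each row of its matrix has absolute row sum 0.75.

def phi(p):
    x, y = p
    return (0.5 * x + 0.25 * y + 1.0, 0.25 * x + 0.5 * y)

def sup_dist(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

# Exact fixed point, found by solving x = 0.5x + 0.25y + 1, y = 0.25x + 0.5y:
star = (8.0 / 3.0, 4.0 / 3.0)

k = 0.75
p = (0.0, 0.0)          # x_0: any starting point works
d0 = sup_dist(p, star)  # |x_0 - x*|
for i in range(1, 60):
    p = phi(p)
    # geometric error estimate from the contraction property
    assert sup_dist(p, star) <= k**i * d0 + 1e-12

print(p)  # converges to (8/3, 4/3)
```

The assertion inside the loop is exactly the estimate stated after Theorem 3.1, with a small floating-point slack.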
The inverse mapping theorem asserts that the C¹ mapping f : ℝⁿ → ℝⁿ is locally invertible in a neighborhood of the point a if its differential df_a : ℝⁿ → ℝⁿ at a is invertible. This means that, if the linear mapping df_a : ℝⁿ → ℝⁿ is one-to-one (and hence onto), then there exists a neighborhood U of a which f maps one-to-one onto some neighborhood V of b = f(a), with the inverse mapping g : V → U also being C¹. Equivalently, if the linear equations

∑_{j=1}^{n} D_j f_i(a) x_j = y_i,   i = 1, . . . , n,

have a unique solution x ∈ ℝⁿ for each y ∈ ℝⁿ, then there exist neighborhoods U of a and V of b = f(a), such that the equations

f_i(x₁, . . . , xₙ) = y_i,   i = 1, . . . , n,

have a unique solution x ∈ U for each y ∈ V. Here we are writing f₁, . . . , fₙ for the component functions of f : ℝⁿ → ℝⁿ.
It is easy to see that the invertibility of df_a is a necessary condition for the local invertibility of f near a. For if U, V, and g are as above, then the compositions g ∘ f and f ∘ g are equal to the identity mappings on U and V, respectively. Consequently the chain rule implies that

dg_b ∘ df_a = I   and   df_a ∘ dg_b = I.

This obviously means that df_a is invertible with (df_a)⁻¹ = dg_b. Equivalently, the derivative matrix f′(a) must be invertible with f′(a)⁻¹ = g′(b). But the matrix f′(a) is invertible if and only if its determinant is nonzero. So a necessary condition for the local invertibility of f near a is that det f′(a) ≠ 0.
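The relation f′(a)⁻¹ = g′(b) can be illustrated in one dimension, where it reads g′(b) = 1/f′(a). The sketch below uses a hypothetical pair (not from the text): f(x) = x³ + x, which is strictly increasing and hence globally invertible, with the inverse evaluated by bisection and g′(b) estimated by a finite difference.

```python
# For inverse functions, g'(b) = 1/f'(a) where b = f(a).
# f(x) = x^3 + x is strictly increasing, hence globally invertible.

def f(x):
    return x ** 3 + x

def g(y, lo=-10.0, hi=10.0):
    """Invert f by bisection (valid because f is increasing)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

a = 1.0
b = f(a)                # b = 2.0
fprime = 3 * a * a + 1  # f'(a) = 4

h = 1e-5
gprime = (g(b + h) - g(b - h)) / (2 * h)  # numerical g'(b)
assert abs(gprime - 1 / fprime) < 1e-6
print(gprime)  # close to 0.25
```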
The following example shows that local invertibility is the most that can be hoped for. That is, f may be C¹ on the open set G with det f′(a) ≠ 0 for each a ∈ G, without f being one-to-one on G.
Example 1 Consider the mapping f : ℝ² → ℝ² defined by

f(x, y) = (x² − y², 2xy).

Since cos²θ − sin²θ = cos 2θ and 2 sin θ cos θ = sin 2θ, we see that in polar coordinates f is described by

f(r cos θ, r sin θ) = (r² cos 2θ, r² sin 2θ).
From this it follows that f maps the circle of radius r twice around the circle of radius r². In particular, f maps both of the points (r cos θ, r sin θ) and (r cos(θ + π), r sin(θ + π)) to the same point (r² cos 2θ, r² sin 2θ). Thus f maps the open set ℝ² − {0} “two-to-one” onto itself. However, det f′(x, y) = 4(x² + y²) ≠ 0, so df_{(x,y)} is invertible at each point of ℝ² − {0}.
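In rectangular coordinates the polar description above corresponds to f(x, y) = (x² − y², 2xy), the real form of the complex squaring map z ↦ z². A quick numerical sketch confirms the two-to-one behavior and the Jacobian determinant 4(x² + y²):

```python
import math

# Real form of z -> z^2: f(x, y) = (x^2 - y^2, 2xy).
def f(x, y):
    return (x * x - y * y, 2 * x * y)

# Antipodal points map to the same image: f(-p) = f(p).
r, theta = 2.0, 0.7
p = (r * math.cos(theta), r * math.sin(theta))
q = (-p[0], -p[1])  # the point at angle theta + pi on the same circle
assert max(abs(a - b) for a, b in zip(f(*p), f(*q))) < 1e-12

# The image lies on the circle of radius r^2, at angle 2*theta.
u, v = f(*p)
assert abs(math.hypot(u, v) - r * r) < 1e-9
assert abs(u - r * r * math.cos(2 * theta)) < 1e-9

# det f'(x, y) = det [[2x, -2y], [2y, 2x]] = 4(x^2 + y^2) > 0 off the origin.
x, y = p
det = (2 * x) * (2 * x) - (-2 * y) * (2 * y)
assert abs(det - 4 * (x * x + y * y)) < 1e-9
print("two-to-one, with invertible differential away from 0")
```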
We now begin the proof of the inverse mapping theorem. It will be convenient to start with the special case in which the point a is the origin, with f(0) = 0 and f′(0) = I (the n × n identity matrix). This is the following substantial lemma; it contains some additional information that will be useful in Chapter IV (in connection with changes of variables in multiple integrals). Given ε > 0, note that, since f is C¹ with df₀ = I, there exists r > 0 such that ‖df_x − I‖ < ε for all points x ∈ C_r (the cube of radius r centered at 0).
Lemma 3.2 Let f : ℝⁿ → ℝⁿ be a C¹ mapping such that f(0) = 0 and df₀ = I. Suppose also that

‖df_x − I‖ < ε < 1

for all x ∈ C_r. Then

C_{(1−ε)r} ⊂ f(C_r) ⊂ C_{(1+ε)r}.

Moreover, if V = int C_{(1−ε)r} and U = f⁻¹(V) ∩ int C_r, then f : U → V is a one-to-one onto mapping, and the inverse mapping g : V → U is differentiable at 0.

Finally, the local inverse mapping g : V → U is the limit of the sequence {g_k} of successive approximations defined inductively on V by g₀(y) = 0 and

g_{k+1}(y) = g_k(y) + y − f(g_k(y))

for y ∈ V.
PROOF We have already shown that f(C_r) ⊂ C_{(1+ε)r} (Corollary 2.7), and it follows from the proof of Corollary 2.8 that f is one-to-one on C_r—the cube C_r satisfies the conditions specified in the proof of Corollary 2.7. Alternatively, we can apply Corollary 2.6 with λ = df₀ = I to see that

|f(x) − f(y) − (x − y)| ≤ ε|x − y|   (1)

if x, y ∈ C_r. From this inequality it follows that

(1 − ε)|x − y| ≤ |f(x) − f(y)| ≤ (1 + ε)|x − y|.   (2)

The left-hand inequality shows that f is one-to-one on C_r, while the right-hand one (with y = 0) shows that f(C_r) ⊂ C_{(1+ε)r}.
So it remains to show that f(C_r) contains the smaller cube C_{(1−ε)r}. We will apply the contraction mapping theorem to prove this. Given y ∈ C_{(1−ε)r}, define φ : ℝⁿ → ℝⁿ by

φ(x) = x − f(x) + y.

We want to show that φ is a contraction mapping of C_r; its unique fixed point will then be the desired point x ∈ C_r such that f(x) = y, since φ(x) = x if and only if f(x) = y.
To see that φ maps C_r into itself, we apply Corollary 2.6:

|φ(x) − y| = |x − f(x)| = |f(x) − f(0) − (x − 0)| ≤ ε|x|,

so if x ∈ C_r, then

|φ(x)| ≤ |y| + ε|x| ≤ (1 − ε)r + εr = r,

so φ(x) ∈ C_r also. Note here that, if y ∈ int C_{(1−ε)r}, then |φ(x)| < r, so φ(x) ∈ int C_r.
To see that φ : C_r → C_r is a contraction mapping, we need only note that

|φ(x₁) − φ(x₂)| = |(x₁ − x₂) − (f(x₁) − f(x₂))| ≤ ε|x₁ − x₂|

by (1).
Thus φ : C_r → C_r is indeed a contraction mapping, with contraction constant ε < 1, and therefore has a unique fixed point x such that f(x) = y. We have noted that φ maps C_r into int C_r if y ∈ int C_{(1−ε)r}. Hence in this case the fixed point x lies in int C_r. Therefore, if

V = int C_{(1−ε)r}   and   U = f⁻¹(V) ∩ int C_r,

then U and V are open neighborhoods of 0 such that f maps U one-to-one onto V.
The fact that the fixed point x = g(y) is the limit of the sequence {g_k(y)} defined inductively by g₀(y) = 0 and

g_{k+1}(y) = φ(g_k(y)) = g_k(y) + y − f(g_k(y))

follows immediately from the contraction mapping theorem (Theorem 3.1).
So it remains only to show that g : V → U is differentiable at 0, where g(0) = 0. It suffices to show that

lim_{h→0} |g(h) − h| / |h| = 0;   (3)

this will prove that g is differentiable at 0 with dg₀ = I. To verify (3), we apply (1) with y = 0, x = g(h), h = f(x), obtaining

|h − g(h)| ≤ ε|g(h)|.

We then apply the left-hand inequality of (2) with y = 0, obtaining

|g(h) − h| / |h| ≤ ε / (1 − ε).   (4)

This follows from the fact that

(1 − ε)|g(h)| ≤ |f(g(h))| = |h|.

Since f is C¹ at 0 with df₀ = I, we can make ε > 0 as small as we like, simply by restricting our attention to a sufficiently small (new) cube centered at 0. Hence (4) implies (3).
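The proof's iteration x_{k+1} = x_k − f(x_k) + y is easy to watch converge on a concrete near-identity map. The function below is a hypothetical one-dimensional example (not from the text) chosen so that f(0) = 0 and f′(0) = 1, the scalar case of df₀ = I:

```python
# Solve f(x) = y by iterating phi(x) = x - f(x) + y, as in Lemma 3.2.
# f(x) = x + 0.1*x**2 satisfies f(0) = 0, f'(0) = 1, and near 0 we have
# |f'(x) - 1| = |0.2 x| <= eps < 1, so phi is a contraction there.

def f(x):
    return x + 0.1 * x * x

y = 0.3      # a target value in the smaller interval around 0
x = 0.0      # g_0(y) = 0
for _ in range(50):
    x = x - f(x) + y   # the contraction iteration of the proof

assert abs(f(x) - y) < 1e-12
print(x)  # the fixed point x = g(y) with f(x) = y
```

Here φ(x) = 0.3 − 0.1x², whose derivative near the solution is about −0.06, so the convergence is very fast.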
We now apply this lemma to establish the general inverse mapping theorem. It provides both the existence of a local inverse g under the condition f′(a) ≠ 0, and also an explicit sequence of successive approximations to g. The definition of this sequence can be motivated precisely as in the 1-dimensional case (preceding the statement of Theorem 1.3).
Theorem 3.3 Suppose that the mapping f : ℝⁿ → ℝⁿ is C¹ in a neighborhood W of the point a, with the matrix f′(a) being nonsingular. Then f is locally invertible—there exist neighborhoods U ⊂ W of a and V of b = f(a), and a one-to-one C¹ mapping g : V → U, such that

g(f(x)) = x   for x ∈ U

and

f(g(y)) = y   for y ∈ V.

In particular, the local inverse g is the limit of the sequence {g_k} of successive approximations defined inductively by g₀(y) ≡ a and

g_{k+1}(y) = g_k(y) + f′(a)⁻¹ · (y − f(g_k(y)))   (5)

for y ∈ V.
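The iteration of Theorem 3.3 is a "frozen-derivative" Newton iteration: the differential is evaluated once, at a. As a sketch, with a made-up map (not from the text) and a = (0, 0), so that f′(a) is the identity and no matrix inversion is needed, the iteration solves f(u, v) = y near b = f(a) = (0, 0):

```python
# Successive approximations g_{k+1}(y) = g_k(y) + f'(a)^{-1}(y - f(g_k(y)))
# for f(u, v) = (u + v^2, v + u^2).  At a = (0, 0): f(a) = (0, 0) and
# f'(a) = I, so the update is simply g + (y - f(g)).

def f(u, v):
    return (u + v * v, v + u * u)

y = (0.10, 0.05)   # target point near b = (0, 0)
g = (0.0, 0.0)     # g_0(y) = a
for _ in range(50):
    fu, fv = f(*g)
    g = (g[0] + (y[0] - fu), g[1] + (y[1] - fv))

fu, fv = f(*g)
assert abs(fu - y[0]) < 1e-12 and abs(fv - y[1]) < 1e-12
print(g)  # the local inverse value g(y)
```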
PROOF We first “alter” the mapping f so as to make it satisfy the hypotheses of Lemma 3.2. Let τ_a and τ_b be the translations of ℝⁿ defined by

τ_a(x) = x + a   and   τ_b(y) = y + b,

and let T = df_a : ℝⁿ → ℝⁿ. Then define f̂ : ℝⁿ → ℝⁿ by

f̂ = τ_b⁻¹ ∘ f ∘ τ_a ∘ T⁻¹.   (6)

The relationship between f and f̂ is exhibited by the following diagram:

              f̂
      ℝⁿ ─────────→ ℝⁿ
       │             │
τ_a ∘ T⁻¹ │             │ τ_b
       ↓             ↓
      ℝⁿ ─────────→ ℝⁿ
              f

The assertion of Eq. (6) is that the same result is obtained by following the arrows around in either direction.
Note that f̂(0) = 0. Since the differentials of τ_a and τ_b are both the identity mapping I of ℝⁿ, an application of the chain rule yields

df̂₀ = df_a ∘ T⁻¹ = T ∘ T⁻¹ = I.

Since f̂ is C¹ in a neighborhood of 0, we have

‖df̂_x − I‖ < ε < 1

on a sufficiently small cube C_r centered at 0. Therefore Lemma 3.2 applies to give neighborhoods Û and V̂ of 0, and a one-to-one mapping ĝ of V̂ onto Û, differentiable at 0, such that the mappings

f̂ : Û → V̂   and   ĝ : V̂ → Û

are inverses of each other. Moreover Lemma 3.2 gives a sequence {ĝ_k} of successive approximations to ĝ, defined inductively by ĝ₀(y) ≡ 0 and

ĝ_{k+1}(y) = ĝ_k(y) + y − f̂(ĝ_k(y))

for y ∈ V̂.
We let U = τ_a T⁻¹(Û), V = τ_b(V̂), and define g : V → U by

g = τ_a ∘ T⁻¹ ∘ ĝ ∘ τ_b⁻¹.

(Now look at g and ĝ in the above diagram.) The facts, that ĝ is a local inverse to f̂, and that the mappings τ_a ∘ T⁻¹ and τ_b are one-to-one, imply that g : V → U is the desired local inverse to f : U → V. The fact that ĝ is differentiable at 0 implies that g is differentiable at b = τ_b(0).
We obtain the sequence {g_k} of successive approximations to g from the sequence {ĝ_k} of successive approximations to ĝ, by defining

g_k(y) = τ_a T⁻¹ ĝ_k τ_b⁻¹(y)   (7)

for y ∈ V (replacing g by g_k and ĝ by ĝ_k in the above diagram).

To verify that the sequence {g_k} may be defined inductively as in (5), note first that

g₀(y) = τ_a T⁻¹(ĝ₀(τ_b⁻¹(y))) = τ_a T⁻¹(0) = a.

Now start with the inductive relation

ĝ_{k+1}(y) = ĝ_k(y) + y − f̂(ĝ_k(y)).

Substituting from (6) and (7), and evaluating at τ_b⁻¹(y) = y − b, we obtain

T(g_{k+1}(y) − a) = T(g_k(y) − a) + y − f(g_k(y)).

Applying τ_a T⁻¹ to both sides of this equation, we obtain the desired inductive relation

g_{k+1}(y) = g_k(y) + T⁻¹(y − f(g_k(y))).
It remains only to show that g : V → U is a C¹ mapping; at the moment we know only that g is differentiable at the point b = f(a). However what we have already proved can be applied at each point of U, so it follows that g is differentiable at each point of V.
To see that g is continuously differentiable on V, we note that, since

f(g(y)) = y

for all y ∈ V, the chain rule gives

g′(y) = [f′(g(y))]⁻¹

for each y ∈ V. Now f′(g(y)) is a continuous (matrix-valued) mapping, because f is C¹ by hypothesis, and the mapping g is continuous because it is differentiable (Exercise II.2.1). In addition it is clear from the formula for the inverse of a nonsingular matrix (Theorem I.6.3) that the entries of an inverse matrix A⁻¹ are continuous functions of those of A. These facts imply that the entries of the matrix g′(y) = [f′(g(y))]⁻¹ are continuous functions of y, so g is C¹.
This last argument can be rephrased as follows. We can write g′ : V → ℝⁿⁿ, where ℝⁿⁿ is the space of n × n matrices, as the composition

g′ = inv ∘ f′ ∘ g,

where inv(A) = A⁻¹ on the set of invertible n × n matrices (an open subset of ℝⁿⁿ). The mapping inv is continuous by the above remark, and f′ : U → ℝⁿⁿ is continuous because f is C¹, so we have expressed g′ as a composition of continuous mappings. Thus g′ is continuous, so g is C¹ by Proposition 2.4.
The power of the inverse mapping theorem stems partly from the fact that the condition det f′(a) ≠ 0 implies the invertibility of the mapping f in a neighborhood of a, even when it is difficult or impossible to find the local inverse mapping g explicitly. However Eqs. (5) enable us to approximate g arbitrarily closely [near b = f(a)].
Example 2 Suppose the mapping is defined by the equations
Let a = (1, −2), so b = f(a) = (2, −1). The derivative matrix is
so f′(a) is the identity matrix with determinant 1. Therefore f is invertible near a = (1, −2). That is, the above equations can be solved for u and v as functions of x and y, (u, v) = g(x, y), if the point (x, y) is sufficiently close to b = (2, −1). According to Eqs. (5), the sequence of successive approximations is defined inductively by
Writing gk(x, y) = (uk, vk), we have
The first several approximations are
It appears that we are generating Taylor polynomials for the component functions of g in powers of (x − 2) and (y + 1). This is true, but we will not verify it.
The inverse mapping theorem tells us when we can, in principle, solve the equation x = f(y), or equivalently x − f(y) = 0, for y as a function of x. The implicit mapping theorem deals with the problem of solving the general equation G(x, y) = 0 for y as a function of x. Although the latter equation may appear considerably more general, we will find that its solution reduces quite easily to the special case considered in the inverse mapping theorem.
In order for us to reasonably expect a unique solution of G(x, y) = 0, there should be the same number of equations as unknowns. So if x ∈ ℝᵐ and y ∈ ℝⁿ, then G should be a mapping from ℝ^{m+n} to ℝⁿ. Writing out components of the vector equation G(x, y) = 0, we obtain the system

G_i(x₁, . . . , x_m, y₁, . . . , yₙ) = 0,   i = 1, . . . , n.

We want to discuss the solution of this system of equations for y₁, . . . , yₙ in terms of x₁, . . . , x_m.
In order to investigate this problem we need the notion of partial differentials. Given a mapping G : ℝ^{m+n} → ℝᵏ which is differentiable at the point p = (a, b) ∈ ℝᵐ × ℝⁿ, the partial differentials of G at p are the linear mappings d_xG_p : ℝᵐ → ℝᵏ and d_yG_p : ℝⁿ → ℝᵏ defined by

d_xG_p(u) = dG_p(u, 0)   and   d_yG_p(v) = dG_p(0, v),

respectively. The matrices of these partial differential linear mappings are called the partial derivatives of G with respect to x and y, respectively. These partial derivative matrices are denoted by

D₁G(a, b)   and   D₂G(a, b),

respectively. It should be clear from the definitions that D₁G(a, b) consists of the first m columns of G′(a, b), while D₂G(a, b) consists of the last n columns. Thus

G′(a, b) = ( D₁G(a, b)   D₂G(a, b) ).
Suppose now that x ∈ ℝᵐ and y ∈ ℝⁿ are both differentiable functions of t ∈ ℝˡ, say x = α(t) and y = β(t), and define φ : ℝˡ → ℝᵏ by

φ(t) = G(α(t), β(t)).

Then a routine application of the chain rule gives the expected result

φ′(t) = D₁G(α(t), β(t)) α′(t) + D₂G(α(t), β(t)) β′(t),   (8)

or

∂φ_i/∂t_j = ∑_{s=1}^{m} (∂G_i/∂x_s)(∂α_s/∂t_j) + ∑_{s=1}^{n} (∂G_i/∂y_s)(∂β_s/∂t_j)

in more detail. We leave this computation to the exercises (Exercise 3.16). Note that D₁G is a k × m matrix and α′ is an m × l matrix, while D₂G is a k × n matrix and β′ is an n × l matrix, so Eq. (8) at least makes sense.
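Equation (8) can be sanity-checked numerically. With the hypothetical scalar data G(x, y) = xy, x = α(t) = t², y = β(t) = sin t (so k = m = n = l = 1), the formula predicts φ′(t) = y · α′(t) + x · β′(t):

```python
import math

# phi(t) = G(alpha(t), beta(t)) with G(x, y) = x*y, alpha(t) = t^2,
# beta(t) = sin t.  Eq. (8): phi' = D1G * alpha' + D2G * beta'.
def phi(t):
    return (t * t) * math.sin(t)

t = 0.8
x, y = t * t, math.sin(t)
predicted = y * (2 * t) + x * math.cos(t)   # D1G = y, D2G = x

# Compare with a centered finite difference of phi.
h = 1e-6
numeric = (phi(t + h) - phi(t - h)) / (2 * h)
assert abs(predicted - numeric) < 1e-8
print(predicted)
```

Of course here the check is also immediate by hand: φ(t) = t² sin t has φ′(t) = 2t sin t + t² cos t, which is exactly the predicted value.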
Suppose now that the function G : ℝ^{m+n} → ℝⁿ is differentiable in a neighborhood of the point (a, b) where G(a, b) = 0. Suppose also that the equation G(x, y) = 0 implicitly defines a differentiable function y = h(x) for x near a; that is, h is a differentiable mapping of a neighborhood U of a into ℝⁿ, with

h(a) = b   and   G(x, h(x)) = 0   for all x ∈ U.
We can then compute the derivative h′(x) by differentiating the equation G(x, h(x)) = 0. Applying Eq. (8) above with t = x, α(x) = x, and β(x) = h(x), we obtain

D₁G(x, h(x)) + D₂G(x, h(x)) h′(x) = 0.
If the n × n matrix D₂G(x, y) is nonsingular, it follows that

h′(x) = −[D₂G(x, h(x))]⁻¹ D₁G(x, h(x)).   (9)

In particular, it appears that the nonvanishing of the so-called Jacobian determinant

∂(G₁, . . . , Gₙ)/∂(y₁, . . . , yₙ) = det D₂G

at (a, b) is a necessary condition for us to be able to solve for h′(a). The implicit mapping theorem asserts that, if G is C¹ in a neighborhood of (a, b), then this condition is sufficient for the existence of the implicitly defined mapping h.
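For a single equation (m = n = 1), the matrix formula h′(x) = −[D₂G]⁻¹D₁G reduces to h′(x) = −(∂G/∂x)/(∂G/∂y). A quick check with hypothetical data (not from the text): the circle G(x, y) = x² + y² − 1 = 0, solved near a point with y > 0 by h(x) = √(1 − x²).

```python
import math

# Implicit derivative for G(x, y) = x^2 + y^2 - 1 at a point of the
# upper unit circle: h'(x) = -D1G/D2G = -(2x)/(2y) = -x/y.
x = 0.6
y = math.sqrt(1.0 - x * x)   # h(x) = sqrt(1 - x^2) = 0.8

implicit = -(2 * x) / (2 * y)             # the implicit formula
explicit = -x / math.sqrt(1.0 - x * x)    # differentiate h directly
assert abs(implicit - explicit) < 1e-12
print(implicit)  # approximately -0.75
```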
Given G : ℝ^{m+n} → ℝⁿ and h : U → ℝⁿ, where U ⊂ ℝᵐ, we will say that y = h(x) solves the equation G(x, y) = 0 in a neighborhood W of (a, b) if the graph of h agrees in W with the zero set of G. That is, if (x, y) ∈ W and x ∈ U, then

G(x, y) = 0   if and only if   y = h(x).
Note the almost verbatim analogy between the following general implicit mapping theorem and the case m = n = 1 considered in Section 1.
Theorem 3.4 Let the mapping G : ℝ^{m+n} → ℝⁿ be C¹ in a neighborhood of the point (a, b) where G(a, b) = 0. If the partial derivative matrix D₂G(a, b) is nonsingular, then there exist a neighborhood U of a in ℝᵐ, a neighborhood W of (a, b) in ℝ^{m+n}, and a C¹ mapping h : U → ℝⁿ, such that y = h(x) solves the equation G(x, y) = 0 in W.

In particular, the implicitly defined mapping h is the limit of the sequence {h_k} of successive approximations defined inductively by h₀(x) ≡ b and

h_{k+1}(x) = h_k(x) − [D₂G(a, b)]⁻¹ G(x, h_k(x))

for x ∈ U.
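The theorem's iteration, with the partial derivative frozen at (a, b), can be run directly on the circle example used above. This is a sketch with hypothetical data, not a computation from the text: G(x, y) = x² + y² − 1 and (a, b) = (0.6, 0.8), so D₂G(a, b) = 2b = 1.6.

```python
import math

# Implicit-function iteration h_{k+1}(x) = h_k(x) - D2^{-1} G(x, h_k(x)),
# with the 1x1 "matrix" D2 = D2G(a, b) frozen at (a, b) = (0.6, 0.8).
def G(x, y):
    return x * x + y * y - 1.0

a, b = 0.6, 0.8
D2 = 2 * b              # D2G(a, b) = 1.6, nonzero as the theorem requires

x = 0.65                # a point near a
h = b                   # h_0(x) = b
for _ in range(60):
    h = h - G(x, h) / D2

assert abs(G(x, h)) < 1e-12                     # (x, h(x)) lies on the zero set
assert abs(h - math.sqrt(1 - x * x)) < 1e-12    # matches the explicit solution
print(h)
```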
PROOF We want to apply the inverse mapping theorem to the mapping f : ℝ^{m+n} → ℝ^{m+n} defined by

f(x, y) = (x, G(x, y)),

for which f(x, y) = (x, 0) if and only if G(x, y) = 0. Note that f is C¹ in a neighborhood of the point (a, b), where f(a, b) = (a, 0). In order to apply the inverse mapping theorem in a neighborhood of (a, b), we must first show that the matrix f′(a, b) is nonsingular.
It is clear that

f′(a, b) = ( I            0
             D₁G(a, b)    D₂G(a, b) ),

where I denotes the m × m identity matrix, and 0 denotes the m × n zero matrix. Consequently

df_{(a,b)}(r, s) = (r, d_xG_{(a,b)}(r) + d_yG_{(a,b)}(s))

if r ∈ ℝᵐ and s ∈ ℝⁿ. In order to prove that f′(a, b) is nonsingular, it suffices to show that df_{(a,b)} is one-to-one (why?), that is, that df_{(a,b)}(r, s) = (0, 0) implies (r, s) = (0, 0). But this follows immediately from the above expression for df_{(a,b)}(r, s) and the hypothesis that D₂G(a, b) is nonsingular: if df_{(a,b)}(r, s) = (0, 0), then r = 0, so d_yG_{(a,b)}(s) = 0, and hence s = 0.
We can therefore apply the inverse mapping theorem to obtain neighborhoods W of (a, b) and V of (a, 0), and a C¹ inverse g : V → W of f : W → V, such that g(a, 0) = (a, b). Let U be the neighborhood of a in ℝᵐ defined by

U = {x ∈ ℝᵐ : (x, 0) ∈ V},

identifying ℝᵐ with ℝᵐ × 0 ⊂ ℝ^{m+n} (see Fig. 3.8).
Figure 3.8
Since f(x, y) = (x, 0) if and only if G(x, y) = 0, it is clear that g maps the set V ∩ (ℝᵐ × 0) one-to-one onto the intersection with W of the zero set G(x, y) = 0. If we now define the mapping h : U → ℝⁿ by

h(x) = π(g(x, 0)),

where π : ℝᵐ × ℝⁿ → ℝⁿ is projection onto the last n coordinates, it follows that y = h(x) solves the equation G(x, y) = 0 in W.
It remains only to define the sequence of successive approximations to h. The inverse mapping theorem provides us with a sequence {g_k} of successive approximations to g, defined inductively by g₀(x, y) ≡ (a, b) and

g_{k+1}(x, y) = g_k(x, y) + f′(a, b)⁻¹ · ((x, y) − f(g_k(x, y))).

We define h_k : U → ℝⁿ by

h_k(x) = π(g_k(x, 0)),

π being projection onto the last n coordinates as before. Then the fact, that the sequence {g_k} converges to g on V, implies that the sequence {h_k} converges to h on U.

Since g₀(x, y) ≡ (a, b), we see that h₀(x) ≡ b. Finally, the block matrix f′(a, b) displayed above has inverse

f′(a, b)⁻¹ = ( I                                0
               −[D₂G(a, b)]⁻¹ D₁G(a, b)         [D₂G(a, b)]⁻¹ ),

so

g_{k+1}(x, 0) = g_k(x, 0) + f′(a, b)⁻¹ · ((x, 0) − f(g_k(x, 0))),

and the first component of g_k(x, 0) is x for every k ≥ 1. Taking second components of this equation, we obtain

h_{k+1}(x) = h_k(x) − [D₂G(a, b)]⁻¹ G(x, h_k(x))

as desired.
Example 3 Suppose we want to solve the equations
for y and z as functions of x in a neighborhood of (−1, 1, 0). We define G : ℝ³ → ℝ² by
Since
the implicit mapping theorem assures us that the above equations do implicitly define y and z as functions of x, for x near − 1. The successive approximations for h(x) = (y(x), z(x)) begin as follows:
Example 4 Let G : ℝ² × ℝ² = ℝ⁴ → ℝ² be defined by
where x = (x1, x2) and y = (y1, y2), and note that G(1, 0, 0, 1) = (0, 0). Since
is nonsingular, the equation G(x, y) = 0 defines y implicitly as a function of x, y = h(x) for x near (1, 0). Suppose we only want to compute the derivative matrix h′(1, 0). Then from Eq. (9) we obtain
Note that, writing Eq. (9) for this example with Jacobian determinants instead of derivative matrices, we obtain the chain rule type formula
or
The partial derivatives of an implicitly defined function can be calculated using the chain rule and Cramer's rule for solving linear equations. The following example illustrates this procedure.
Example 5 Let G₁, G₂ : ℝ⁵ → ℝ be C¹ functions, with G₁(a) = G₂(a) = 0 and

∂(G₁, G₂)/∂(u, v) ≠ 0

at the point a = (x₀, y₀, z₀, u₀, v₀). Then the two equations

G₁(x, y, z, u, v) = 0   and   G₂(x, y, z, u, v) = 0

implicitly define u and v as functions of x, y, z:

u = f(x, y, z),   v = g(x, y, z).

Upon differentiation of the equations

G₁(x, y, z, f(x, y, z), g(x, y, z)) = 0,
G₂(x, y, z, f(x, y, z), g(x, y, z)) = 0

with respect to x, we obtain

∂G₁/∂x + (∂G₁/∂u)(∂f/∂x) + (∂G₁/∂v)(∂g/∂x) = 0,
∂G₂/∂x + (∂G₂/∂u)(∂f/∂x) + (∂G₂/∂v)(∂g/∂x) = 0.

Solving these two linear equations for ∂f/∂x and ∂g/∂x by Cramer's rule, we obtain

∂f/∂x = − (∂(G₁, G₂)/∂(x, v)) / (∂(G₁, G₂)/∂(u, v)),
∂g/∂x = − (∂(G₁, G₂)/∂(u, x)) / (∂(G₁, G₂)/∂(u, v)).

Similar formulas hold for ∂f/∂y, ∂g/∂y, ∂f/∂z, and ∂g/∂z (see Exercise 3.18).
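Example 5's procedure can be checked on hypothetical functions whose implicit solution is known in closed form (these G₁, G₂ are invented for the check, not taken from the text). With G₁ = u + v − x − y and G₂ = u − v − z, the explicit solution is u = (x + y + z)/2 and v = (x + y − z)/2, so the Jacobian-quotient formulas should reproduce ∂u/∂x = ∂v/∂x = 1/2:

```python
# Cramer's-rule computation of du/dx and dv/dx for the linear system
#   G1(x, y, z, u, v) = u + v - x - y = 0
#   G2(x, y, z, u, v) = u - v - z     = 0
# whose explicit solution is u = (x+y+z)/2, v = (x+y-z)/2.

# Partial derivatives of G1, G2 (constant, since the system is linear):
G1x, G1u, G1v = -1.0, 1.0, 1.0
G2x, G2u, G2v = 0.0, 1.0, -1.0

def det2(a, b, c, d):
    """Determinant of the 2x2 matrix [[a, b], [c, d]]."""
    return a * d - b * c

J = det2(G1u, G1v, G2u, G2v)   # d(G1,G2)/d(u,v) = -2, nonzero

# du/dx = -[d(G1,G2)/d(x,v)] / J,  dv/dx = -[d(G1,G2)/d(u,x)] / J
du_dx = -det2(G1x, G1v, G2x, G2v) / J
dv_dx = -det2(G1u, G1x, G2u, G2x) / J

assert du_dx == 0.5 and dv_dx == 0.5   # matches the explicit solution
print(du_dx, dv_dx)
```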
Exercises
3.1 Show that f(x, y) = (x/(x² + y²), y/(x² + y²)) is locally invertible in a neighborhood of every point except the origin. Compute f⁻¹ explicitly.
3.2 Show that the following mappings from ℝ² to itself are everywhere locally invertible.
(a) f(x, y) = (e^x + e^y, e^x − e^y).
(b) g(x, y) = (e^x cos y, e^x sin y).
3.3 Consider the mapping f : ℝ³ → ℝ³ defined by f(x, y, z) = (x, y³, z⁵). Note that f has a (global) inverse g, despite the fact that the matrix f′(0) is singular. What does this imply about the differentiability of g at 0?
3.4 Show that the mapping f : ℝ³ → ℝ³, defined by u = x + e^y, v = y + e^z, w = z + e^x, is everywhere locally invertible.
3.5 Show that the equations
implicitly define z near −1 as a function of (x, y) near (1, 1).
3.6 Can the surface whose equation is
be represented in the form z = f(x, y) near (0, 2, 1)?
3.7 Decide whether it is possible to solve the equations
for (u, v) near (1, 1) as a function of (x, y, z) near (1, 1, 1).
3.8 The point (1, −1, 1) lies on the surfaces
Show that, in a neighborhood of this point, the curve of intersection of the surfaces can be described by a pair of equations of the form y = f(x), z = g(x).
3.9 Determine an approximate solution of the equation
for z near 2, as a function of (x, y) near (1, −1).
3.10 If the equations f(x, y, z) = 0, g(x, y, z) = 0 can be solved for y and z as differentiable functions of x, show that

dy/dx = − (1/J) ∂(f, g)/∂(x, z),   dz/dx = − (1/J) ∂(f, g)/∂(y, x),

where J = ∂(f, g)/∂(y, z).
3.11 If the equations f(x, y, u, v) = 0, g(x, y, u, v) = 0 can be solved for u and v as differentiable functions of x and y, compute their first partial derivatives.
3.12 Suppose that the equation f(x, y, z) = 0 can be solved for each of the three variables x, y, z as a differentiable function of the other two. Then prove that

(∂x/∂y)(∂y/∂z)(∂z/∂x) = −1.

Verify this in the case of the ideal gas equation pv = RT (where p, v, T are the variables and R is a constant).
3.13 Let f = (f₁, f₂) and g = (g₁, g₂) be inverse functions. Show that
where J = ∂(f₁, f₂)/∂(x₁, x₂).
3.14 Let f and g be inverse functions as in the preceding exercise. Show that
and obtain similar formulas for the other derivatives of component functions of g.
3.15 Verify the statement of the implicit mapping theorem given in Section II.5.
3.16 Verify Eq. (8) in this section.
3.17 Suppose that the pressure p, volume v, temperature T, and internal energy u of a gas satisfy the equations
and that these two equations can be solved for any two of the four variables as functions of the other two. Then the symbol ∂u/∂T, for example, is ambiguous. We denote by (∂u/∂T)_p the partial derivative of u with respect to T, with u and v considered as functions of p and T, and by (∂u/∂T)_v the partial derivative of u, with u and p considered as functions of v and T. With this notation, apply the results of Exercise 3.11 to show that
3.18 If y₁, . . . , yₙ are implicitly defined as differentiable functions of x₁, . . . , x_m by the equations

G_i(x₁, . . . , x_m, y₁, . . . , yₙ) = 0,   i = 1, . . . , n,

show, by generalizing the method of Example 5, that

∂y_i/∂x_j = − (1/J) ∂(G₁, . . . , Gₙ)/∂(y₁, . . . , y_{i−1}, x_j, y_{i+1}, . . . , yₙ),

where

J = ∂(G₁, . . . , Gₙ)/∂(y₁, . . . , yₙ).
where