Advanced Calculus of Several Variables (1973)
Part I. Euclidean Space and Linear Mappings
Chapter 3. INNER PRODUCTS AND ORTHOGONALITY
In order to obtain the full geometric structure of $\mathbb{R}^n$ (including the concepts of distance, angles, and orthogonality), we must supply $\mathbb{R}^n$ with an inner product. An inner (scalar) product on the vector space $V$ is a function $V \times V \to \mathbb{R}$, which associates with each pair $(x, y)$ of vectors in $V$ a real number $\langle x, y \rangle$, and satisfies the following three conditions:

SP1: $\langle x, x \rangle > 0$ if $x \neq 0$ (positivity);
SP2: $\langle x, y \rangle = \langle y, x \rangle$ (symmetry);
SP3: $\langle ax + by, z \rangle = a\langle x, z \rangle + b\langle y, z \rangle$ (linearity).
The third of these conditions is linearity in the first variable; symmetry then gives linearity in the second variable also. Thus an inner product on $V$ is simply a positive, symmetric, bilinear function on $V \times V$. Note that SP3 implies that $\langle 0, 0 \rangle = 0$ (see Exercise 3.1).
The usual inner product on $\mathbb{R}^n$ is denoted by $x \cdot y$ and is defined by

$$x \cdot y = \langle x, y \rangle = \sum_{i=1}^{n} x_i y_i, \qquad (1)$$

where $x = (x_1, \ldots, x_n)$, $y = (y_1, \ldots, y_n)$. It should be clear that this definition satisfies conditions SP1–SP3 above. There are many inner products on $\mathbb{R}^n$ (see Example 2 below), but we shall use only the usual one.
Example 1 Denote by $\mathscr{C}[a, b]$ the vector space of all continuous functions on the interval $[a, b]$, and define

$$\langle f, g \rangle = \int_a^b f(t)\,g(t)\,dt$$

for any pair of functions $f, g \in \mathscr{C}[a, b]$. It is obvious that this definition satisfies conditions SP2 and SP3. It also satisfies SP1, because if $f(t_0) \neq 0$, then by continuity $(f(t))^2 > 0$ for all $t$ in some neighborhood of $t_0$, so

$$\langle f, f \rangle = \int_a^b (f(t))^2\,dt > 0.$$

Therefore we have an inner product on $\mathscr{C}[a, b]$.
Example 2 Let $a, b, c$ be real numbers with $a > 0$, $ac - b^2 > 0$, so that the quadratic form $q(x) = ax_1^2 + 2bx_1x_2 + cx_2^2$ is positive-definite (see Section II.4). Then $\langle x, y \rangle = ax_1y_1 + bx_1y_2 + bx_2y_1 + cx_2y_2$ defines an inner product on $\mathbb{R}^2$ (why?). With $a = c = 1$, $b = 0$ we obtain the usual inner product on $\mathbb{R}^2$.
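The following is a minimal numerical sketch (in Python, with the arbitrary sample values $a = 2$, $b = 1$, $c = 3$) that spot-checks conditions SP1–SP3 for the inner product of Example 2 on random vectors; it illustrates the definition, it is not a proof.

```python
import random

# Spot-check SP1-SP3 for the Example 2 inner product on R^2:
# <x, y> = a*x1*y1 + b*x1*y2 + b*x2*y1 + c*x2*y2.
a, b, c = 2.0, 1.0, 3.0            # sample values: a > 0 and a*c - b**2 = 5 > 0

def ip(x, y):
    return a*x[0]*y[0] + b*x[0]*y[1] + b*x[1]*y[0] + c*x[1]*y[1]

rand_vec = lambda: [random.uniform(-5, 5), random.uniform(-5, 5)]
for _ in range(1000):
    x, y, z = rand_vec(), rand_vec(), rand_vec()
    s, t = random.uniform(-5, 5), random.uniform(-5, 5)
    assert abs(ip(x, y) - ip(y, x)) < 1e-9                        # SP2: symmetry
    sx_ty = [s*x[0] + t*y[0], s*x[1] + t*y[1]]
    assert abs(ip(sx_ty, z) - (s*ip(x, z) + t*ip(y, z))) < 1e-9   # SP3: linearity
    assert ip(x, x) > 0            # SP1: random x is nonzero almost surely
print("SP1-SP3 hold on all samples")
```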
An inner product on the vector space $V$ yields a notion of the length or “size” of a vector $x \in V$, called its norm $\|x\|$. In general, a norm on the vector space $V$ is a real-valued function $x \to \|x\|$ on $V$ satisfying the following conditions:

N1: $\|x\| > 0$ if $x \neq 0$ (positivity);
N2: $\|ax\| = |a|\,\|x\|$ (homogeneity);
N3: $\|x + y\| \leq \|x\| + \|y\|$ (triangle inequality)

for all $x, y \in V$ and $a \in \mathbb{R}$. Note that N2 implies that $\|0\| = 0$.
The norm associated with the inner product $\langle\ ,\ \rangle$ on $V$ is defined by

$$\|x\| = \langle x, x \rangle^{1/2}. \qquad (2)$$
It is clear that SP1–SP3 and this definition imply conditions N1 and N2, but the triangle inequality is not so obvious; it will be verified below.
The most commonly used norm on $\mathbb{R}^n$ is the Euclidean norm

$$\|x\| = (x_1^2 + x_2^2 + \cdots + x_n^2)^{1/2},$$

which comes in the above way from the usual inner product on $\mathbb{R}^n$. Other norms on $\mathbb{R}^n$, not necessarily associated with inner products, are occasionally employed, but henceforth $\|x\|$ will denote the Euclidean norm unless otherwise specified.
Example 3 $\|x\| = \max\{|x_1|, \ldots, |x_n|\}$, the maximum of the absolute values of the coordinates of $x$, defines a norm on $\mathbb{R}^n$ (see Exercise 3.2).

Example 4 $\|x\|_1 = |x_1| + |x_2| + \cdots + |x_n|$ defines still another norm on $\mathbb{R}^n$ (again see Exercise 3.2).
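As a quick illustration, here is a plain-Python sketch of the Euclidean norm together with the norms of Examples 3 and 4 (the function names are my own):

```python
# The three norms mentioned above, for a vector given as a list of floats.
def euclidean_norm(x):
    return sum(t*t for t in x) ** 0.5       # ||x|| = (x_1^2 + ... + x_n^2)^(1/2)

def max_norm(x):                            # Example 3
    return max(abs(t) for t in x)

def one_norm(x):                            # Example 4
    return sum(abs(t) for t in x)

x = [3.0, -4.0, 0.0]
print(euclidean_norm(x), max_norm(x), one_norm(x))   # 5.0  4.0  7.0
```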
A norm on $V$ provides a definition of the distance $d(x, y)$ between any two points $x$ and $y$ of $V$:

$$d(x, y) = \|x - y\|.$$
Note that a distance function $d$ defined in this way satisfies the following three conditions:

D1: $d(x, y) > 0$ if $x \neq y$;
D2: $d(x, y) = d(y, x)$;
D3: $d(x, z) \leq d(x, y) + d(y, z)$ (triangle inequality)

for any three points $x, y, z$. Conditions D1 and D2 follow immediately from N1 and N2, respectively, while

$$d(x, z) = \|x - z\| = \|(x - y) + (y - z)\| \leq \|x - y\| + \|y - z\| = d(x, y) + d(y, z)$$
by N3. Figure 1.1 indicates why N3 (or D3) is referred to as the triangle inequality.
Figure 1.1
The distance function that comes in this way from the Euclidean norm is the familiar Euclidean distance function

$$d(x, y) = \left[ \sum_{i=1}^{n} (x_i - y_i)^2 \right]^{1/2}.$$
Thus far we have seen that an inner product on the vector space V yields a norm on V, which in turn yields a distance function on V, except that we have not yet verified that the norm associated with a given inner product does indeed satisfy the triangle inequality. The triangle inequality will follow from the Cauchy–Schwarz inequality of the following theorem.
Theorem 3.1 If $\langle\ ,\ \rangle$ is an inner product on a vector space $V$, then

$$|\langle x, y \rangle| \leq \|x\|\,\|y\|$$

for all $x, y \in V$ [where the norm is the one defined by (2)].
PROOF The inequality is trivial if either $x$ or $y$ is zero, so assume neither is. If $u = x/\|x\|$ and $v = y/\|y\|$, then $\|u\| = \|v\| = 1$. Hence

$$0 \leq \langle u - v, u - v \rangle = \langle u, u \rangle - 2\langle u, v \rangle + \langle v, v \rangle = 2 - 2\langle u, v \rangle,$$

so that $\langle u, v \rangle \leq 1$, that is,

$$\left\langle \frac{x}{\|x\|}, \frac{y}{\|y\|} \right\rangle \leq 1,$$

or

$$\langle x, y \rangle \leq \|x\|\,\|y\|.$$

Replacing $x$ by $-x$, we obtain

$$-\langle x, y \rangle \leq \|x\|\,\|y\|$$

also, so the inequality follows.
The Cauchy–Schwarz inequality is of fundamental importance. With the usual inner product in $\mathbb{R}^n$, it takes the form

$$\left| \sum_{i=1}^{n} x_i y_i \right| \leq \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} \left( \sum_{i=1}^{n} y_i^2 \right)^{1/2},$$

while in $\mathscr{C}[a, b]$, with the inner product of Example 1, it becomes

$$\left| \int_a^b f(t)\,g(t)\,dt \right| \leq \left( \int_a^b (f(t))^2\,dt \right)^{1/2} \left( \int_a^b (g(t))^2\,dt \right)^{1/2}.$$
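By way of illustration, the following Python sketch spot-checks the integral form of the inequality, approximating the inner product of Example 1 by a midpoint Riemann sum; the interval and the functions f, g are arbitrary choices, not taken from the text.

```python
import math

# Approximate <f, g> = integral of f(t)*g(t) dt over [a_, b_] by a midpoint sum.
a_, b_, N = 0.0, 1.0, 10_000
ts = [a_ + (k + 0.5) * (b_ - a_) / N for k in range(N)]

def ip(f, g):
    return sum(f(t) * g(t) for t in ts) * (b_ - a_) / N

f, g = math.exp, math.sin
lhs = abs(ip(f, g))
rhs = math.sqrt(ip(f, f)) * math.sqrt(ip(g, g))
print(lhs <= rhs, lhs, rhs)   # True, with lhs strictly smaller here
```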
PROOF OF THE TRIANGLE INEQUALITY Given $x, y \in V$, note that

$$\|x + y\|^2 = \langle x + y, x + y \rangle = \|x\|^2 + 2\langle x, y \rangle + \|y\|^2 \leq \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2,$$

which implies that

$$\|x + y\| \leq \|x\| + \|y\|.$$
Notice that, if $\langle x, y \rangle = 0$, in which case $x$ and $y$ are perpendicular (see the definition below), then the second equality in the above proof gives

$$\|x + y\|^2 = \|x\|^2 + \|y\|^2.$$
This is the famous theorem associated with the name of Pythagoras (Fig. 1.2).
Figure 1.2
Recalling the formula $x \cdot y = \|x\|\,\|y\| \cos \theta$ for the usual inner product in $\mathbb{R}^2$, we are motivated to define the angle $\angle(x, y)$ between the vectors $x, y \in V$ by

$$\angle(x, y) = \arccos \frac{\langle x, y \rangle}{\|x\|\,\|y\|}.$$

Notice that this makes sense because $|\langle x, y \rangle| / (\|x\|\,\|y\|) \leq 1$ by the Cauchy–Schwarz inequality. In particular we say that $x$ and $y$ are orthogonal (or perpendicular) if and only if $\langle x, y \rangle = 0$, because then $\angle(x, y) = \arccos 0 = \pi/2$.
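A small Python sketch of this definition, using the usual inner product on $\mathbb{R}^n$ (the helper names are my own):

```python
import math

# angle(x, y) = arccos( <x, y> / (||x|| ||y||) ), well defined by Cauchy-Schwarz.
def angle(x, y):
    dot = sum(a*b for a, b in zip(x, y))
    nx = math.sqrt(sum(a*a for a in x))
    ny = math.sqrt(sum(b*b for b in y))
    return math.acos(dot / (nx * ny))

print(angle([1, 0], [0, 1]))   # pi/2: orthogonal vectors
print(angle([1, 1], [1, 0]))   # pi/4
```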
A set of nonzero vectors $v_1, v_2, \ldots$ in $V$ is said to be an orthogonal set if

$$\langle v_i, v_j \rangle = 0$$

whenever $i \neq j$. If in addition each $v_i$ is a unit vector, $\langle v_i, v_i \rangle = 1$, then the set is said to be orthonormal.
Example 5 The standard basis vectors $e_1, \ldots, e_n$ form an orthonormal set in $\mathbb{R}^n$.
Example 6 The (infinite) set of functions

$$1,\ \cos x,\ \sin x,\ \cos 2x,\ \sin 2x,\ \cos 3x,\ \sin 3x,\ \ldots$$

is orthogonal in $\mathscr{C}[0, 2\pi]$ (see Example 1 and Exercise 3.11). This fact is the basis for the theory of Fourier series.
The most important property of orthogonal sets is given by the following result.
Theorem 3.2 Every finite orthogonal set of nonzero vectors is linearly independent.
PROOF Suppose that

$$a_1 v_1 + a_2 v_2 + \cdots + a_k v_k = 0. \qquad (3)$$

Taking the inner product with $v_i$, we obtain

$$a_i \langle v_i, v_i \rangle = 0,$$

because $\langle v_i, v_j \rangle = 0$ for $i \neq j$ if the vectors $v_1, \ldots, v_k$ are orthogonal. But $\langle v_i, v_i \rangle \neq 0$, so $a_i = 0$. Thus (3) implies $a_1 = \cdots = a_k = 0$, so the orthogonal vectors $v_1, \ldots, v_k$ are linearly independent.
We now describe the important Gram–Schmidt orthogonalization process for constructing orthogonal bases. It is motivated by the following elementary construction. Given two linearly independent vectors $v$ and $w_1$, we want to find a nonzero vector $w_2$ that lies in the subspace spanned by $v$ and $w_1$, and is orthogonal to $w_1$. Figure 1.3 suggests that such a vector $w_2$ can be obtained by subtracting from $v$ an appropriate multiple $cw_1$ of $w_1$. To determine $c$,
Figure 1.3
we simply solve the equation $\langle w_1, v - cw_1 \rangle = 0$ for $c = \langle v, w_1 \rangle / \langle w_1, w_1 \rangle$. The desired vector is therefore

$$w_2 = v - \frac{\langle v, w_1 \rangle}{\langle w_1, w_1 \rangle}\, w_1,$$

obtained by subtracting from $v$ the “component of $v$ parallel to $w_1$.” We immediately verify that $\langle w_2, w_1 \rangle = 0$, while $w_2 \neq 0$ because $v$ and $w_1$ are linearly independent.
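In code, this one-step construction might look as follows (a plain-Python sketch; the names are my own):

```python
# Subtract from v its component parallel to w1, leaving a vector
# orthogonal to w1.
def dot(x, y):
    return sum(a*b for a, b in zip(x, y))

def orthogonalize_step(v, w1):
    c = dot(v, w1) / dot(w1, w1)
    return [vi - c*wi for vi, wi in zip(v, w1)]

v, w1 = [1.0, 2.0], [1.0, 0.0]
w2 = orthogonalize_step(v, w1)
print(w2, dot(w2, w1))   # [0.0, 2.0] 0.0
```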
Theorem 3.3 If $V$ is a finite-dimensional vector space with an inner product, then $V$ has an orthogonal basis.

In particular, every subspace of $\mathbb{R}^n$ has an orthogonal basis.
PROOF We start with an arbitrary basis $v_1, \ldots, v_n$ for $V$. Let $w_1 = v_1$. Then, by the preceding construction, the nonzero vector

$$w_2 = v_2 - \frac{\langle v_2, w_1 \rangle}{\langle w_1, w_1 \rangle}\, w_1$$

is orthogonal to $w_1$ and lies in the subspace generated by $v_1$ and $v_2$.
Suppose inductively that we have found an orthogonal basis $w_1, \ldots, w_k$ for the subspace of $V$ that is generated by $v_1, \ldots, v_k$. The idea is then to obtain $w_{k+1}$ by subtracting from $v_{k+1}$ its components parallel to each of the vectors $w_1, \ldots, w_k$. That is, define

$$w_{k+1} = v_{k+1} - c_1 w_1 - \cdots - c_k w_k,$$

where $c_i = \langle v_{k+1}, w_i \rangle / \langle w_i, w_i \rangle$. Then $\langle w_{k+1}, w_i \rangle = \langle v_{k+1}, w_i \rangle - c_i \langle w_i, w_i \rangle = 0$ for $i = 1, \ldots, k$, and $w_{k+1} \neq 0$, because otherwise $v_{k+1}$ would be a linear combination of the vectors $w_1, \ldots, w_k$, and therefore of the vectors $v_1, \ldots, v_k$. It follows that the vectors $w_1, \ldots, w_{k+1}$ form an orthogonal basis for the subspace of $V$ that is generated by $v_1, \ldots, v_{k+1}$.
After a finite number of such steps we obtain the desired orthogonal basis $w_1, \ldots, w_n$ for $V$.
It is the method of proof of Theorem 3.3 that is known as the Gram–Schmidt orthogonalization process, summarized by the equations

$$
\begin{aligned}
w_1 &= v_1,\\
w_2 &= v_2 - \frac{\langle v_2, w_1 \rangle}{\langle w_1, w_1 \rangle}\, w_1,\\
w_3 &= v_3 - \frac{\langle v_3, w_1 \rangle}{\langle w_1, w_1 \rangle}\, w_1 - \frac{\langle v_3, w_2 \rangle}{\langle w_2, w_2 \rangle}\, w_2,\\
&\;\;\vdots\\
w_n &= v_n - \sum_{i=1}^{n-1} \frac{\langle v_n, w_i \rangle}{\langle w_i, w_i \rangle}\, w_i,
\end{aligned} \qquad (4)
$$

defining the orthogonal basis $w_1, \ldots, w_n$ in terms of the original basis $v_1, \ldots, v_n$.
Example 7 To find an orthogonal basis for the subspace $V$ of $\mathbb{R}^4$ spanned by the vectors $v_1 = (1, 1, 0, 0)$, $v_2 = (1, 0, 1, 0)$, $v_3 = (0, 1, 0, 1)$, we write

$$
\begin{aligned}
w_1 &= v_1 = (1, 1, 0, 0),\\
w_2 &= v_2 - \frac{\langle v_2, w_1 \rangle}{\langle w_1, w_1 \rangle}\, w_1 = (1, 0, 1, 0) - \tfrac{1}{2}(1, 1, 0, 0) = \left(\tfrac{1}{2}, -\tfrac{1}{2}, 1, 0\right),\\
w_3 &= v_3 - \frac{\langle v_3, w_1 \rangle}{\langle w_1, w_1 \rangle}\, w_1 - \frac{\langle v_3, w_2 \rangle}{\langle w_2, w_2 \rangle}\, w_2 = (0, 1, 0, 1) - \tfrac{1}{2}(1, 1, 0, 0) + \tfrac{1}{3}\left(\tfrac{1}{2}, -\tfrac{1}{2}, 1, 0\right) = \left(-\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}, 1\right).
\end{aligned}
$$
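The same computation can be carried out mechanically. The following Python sketch implements equations (4) in exact rational arithmetic and reproduces the basis just found (the function names are my own):

```python
from fractions import Fraction

def dot(x, y):
    return sum(a*b for a, b in zip(x, y))

def gram_schmidt(vs):
    """Equations (4): subtract from each v its components along earlier w's."""
    ws = []
    for v in vs:
        w = list(v)
        for u in ws:
            c = Fraction(dot(v, u), dot(u, u))
            w = [wi - c*ui for wi, ui in zip(w, u)]
        ws.append(w)
    return ws

v1, v2, v3 = [1, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1]
for w in gram_schmidt([v1, v2, v3]):
    print([str(entry) for entry in w])
# ['1', '1', '0', '0']
# ['1/2', '-1/2', '1', '0']
# ['-1/3', '1/3', '1/3', '1']
```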
Example 8 Let $\mathscr{P}$ denote the vector space of polynomials in $x$, with inner product defined by

$$\langle p, q \rangle = \int_{-1}^{1} p(x)\,q(x)\,dx.$$

By applying the Gram–Schmidt orthogonalization process to the linearly independent elements $1, x, x^2, \ldots, x^n, \ldots,$ one obtains an infinite sequence $\{p_n(x)\}$, the first five elements of which are

$$p_0(x) = 1,\quad p_1(x) = x,\quad p_2(x) = x^2 - \tfrac{1}{3},\quad p_3(x) = x^3 - \tfrac{3}{5}x,\quad p_4(x) = x^4 - \tfrac{6}{7}x^2 + \tfrac{3}{35}$$

(see Exercise 3.12). Upon multiplying the polynomials $\{p_n(x)\}$ by appropriate constants, one obtains the famous Legendre polynomials

$$P_0(x) = 1,\quad P_1(x) = x,\quad P_2(x) = \tfrac{1}{2}(3x^2 - 1),\quad P_3(x) = \tfrac{1}{2}(5x^3 - 3x),$$

etc.
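Assuming the SymPy library is available, the orthogonalization of Example 8 can be reproduced symbolically; this is an illustrative sketch, not part of the text.

```python
import sympy as sp

# Gram-Schmidt on 1, x, x^2, ... with <p, q> = integral of p*q over [-1, 1].
x = sp.symbols('x')

def ip(p, q):
    return sp.integrate(p * q, (x, -1, 1))

basis, ws = [1, x, x**2, x**3, x**4], []
for v in basis:
    w = v
    for u in ws:
        w -= ip(v, u) / ip(u, u) * u     # subtract component of v along u
    ws.append(sp.expand(w))

print(ws)   # [1, x, x**2 - 1/3, x**3 - 3*x/5, x**4 - 6*x**2/7 + 3/35]
```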
One reason for the importance of orthogonal bases is the ease with which a vector $v \in V$ can be expressed as a linear combination of orthogonal basis vectors $w_1, \ldots, w_n$ for $V$. Writing

$$v = a_1 w_1 + a_2 w_2 + \cdots + a_n w_n$$

and taking the inner product with $w_i$, we immediately obtain $\langle v, w_i \rangle = a_i \langle w_i, w_i \rangle$, so

$$v = \sum_{i=1}^{n} \frac{\langle v, w_i \rangle}{\langle w_i, w_i \rangle}\, w_i.$$

This is especially simple if $w_1, \ldots, w_n$ is an orthonormal basis for $V$:

$$v = \sum_{i=1}^{n} \langle v, w_i \rangle\, w_i. \qquad (5)$$
Of course orthonormal basis vectors are easily obtained from orthogonal ones, simply by dividing by their lengths. In this case the coefficient $\langle v, w_i \rangle$ of $w_i$ in (5) is sometimes called the Fourier coefficient of $v$ with respect to $w_i$. This terminology is motivated by an analogy with Fourier series. The orthonormal functions in $\mathscr{C}[0, 2\pi]$ corresponding to the orthogonal functions of Example 6 are

$$\frac{1}{\sqrt{2\pi}},\ \frac{\cos x}{\sqrt{\pi}},\ \frac{\sin x}{\sqrt{\pi}},\ \frac{\cos 2x}{\sqrt{\pi}},\ \frac{\sin 2x}{\sqrt{\pi}},\ \ldots.$$
Writing

$$\langle f, g \rangle = \int_0^{2\pi} f(x)\,g(x)\,dx,$$

one defines the Fourier coefficients of $f \in \mathscr{C}[0, 2\pi]$ by

$$a_n = \frac{1}{\pi} \langle f, \cos nx \rangle = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos nx\,dx, \qquad n = 0, 1, 2, \ldots,$$

and

$$b_n = \frac{1}{\pi} \langle f, \sin nx \rangle = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin nx\,dx, \qquad n = 1, 2, 3, \ldots.$$

It can then be established, under appropriate conditions on $f$, that the infinite series

$$\frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \cos nx + b_n \sin nx \right)$$

converges to $f(x)$. This infinite series may be regarded as an infinite-dimensional analog of (5).
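As a numerical illustration (not from the text), the following Python sketch approximates the Fourier coefficients of an arbitrary test function by Riemann sums and evaluates a partial sum of the series:

```python
import math

N = 20_000
xs = [(k + 0.5) * 2 * math.pi / N for k in range(N)]   # midpoint sample of [0, 2*pi]
dx = 2 * math.pi / N
f = lambda t: t                                        # arbitrary test function

def a(n):
    return sum(f(t) * math.cos(n * t) for t in xs) * dx / math.pi

def b(n):
    return sum(f(t) * math.sin(n * t) for t in xs) * dx / math.pi

def partial_sum(t, terms=25):
    return a(0) / 2 + sum(a(n) * math.cos(n * t) + b(n) * math.sin(n * t)
                          for n in range(1, terms + 1))

# The partial sum approximates f away from the endpoints 0 and 2*pi.
print(f(1.0), partial_sum(1.0))
```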
Given a subspace $V$ of $\mathbb{R}^n$, denote by $V^\perp$ the set of all those vectors in $\mathbb{R}^n$, each of which is orthogonal to every vector in $V$. Then it is easy to show that $V^\perp$ is a subspace of $\mathbb{R}^n$, called the orthogonal complement of $V$ (Exercise 3.3). The significant fact about this situation is that the dimensions add up as they should.
Theorem 3.4 If $V$ is a subspace of $\mathbb{R}^n$, then

$$\dim V + \dim V^\perp = n. \qquad (6)$$
PROOF By Theorem 3.3, there exists an orthonormal basis $v_1, \ldots, v_r$ for $V$, and an orthonormal basis $w_1, \ldots, w_s$ for $V^\perp$. Then the vectors $v_1, \ldots, v_r, w_1, \ldots, w_s$ are orthonormal, and therefore linearly independent. So in order to conclude from Theorem 2.5 that $r + s = n$ as desired, it suffices to show that these vectors generate $\mathbb{R}^n$. Given $x \in \mathbb{R}^n$, define

$$y = x - \sum_{i=1}^{r} (x \cdot v_i)\, v_i. \qquad (7)$$

Then $y \cdot v_i = x \cdot v_i - (x \cdot v_i)(v_i \cdot v_i) = 0$ for each $i = 1, \ldots, r$. Since $y$ is orthogonal to each element of a basis for $V$, it follows easily that $y \in V^\perp$ (Exercise 3.4). Therefore Eq. (5) above gives

$$y = \sum_{j=1}^{s} (y \cdot w_j)\, w_j.$$

This and (7) then yield

$$x = \sum_{i=1}^{r} (x \cdot v_i)\, v_i + \sum_{j=1}^{s} (y \cdot w_j)\, w_j,$$

so the vectors $v_1, \ldots, v_r, w_1, \ldots, w_s$ constitute a basis for $\mathbb{R}^n$.
Example 9 Consider the system

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0,\\
&\;\;\vdots\\
a_{k1}x_1 + a_{k2}x_2 + \cdots + a_{kn}x_n &= 0
\end{aligned} \qquad (8)
$$

of homogeneous linear equations in $x_1, \ldots, x_n$. If $a_i = (a_{i1}, \ldots, a_{in})$, $i = 1, \ldots, k$, then these equations can be rewritten as

$$a_1 \cdot x = 0,\quad a_2 \cdot x = 0,\quad \ldots,\quad a_k \cdot x = 0.$$

Therefore the set $S$ of all solutions of (8) is simply the set of all those vectors $x \in \mathbb{R}^n$ that are orthogonal to the vectors $a_1, \ldots, a_k$. If $V$ is the subspace of $\mathbb{R}^n$ generated by $a_1, \ldots, a_k$, it follows that $S = V^\perp$ (Exercise 3.4). If the vectors $a_1, \ldots, a_k$ are linearly independent, we can then conclude from Theorem 3.4 that $\dim S = n - k$.
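Example 9 can be checked numerically. Assuming NumPy is available, the sketch below reads an orthonormal basis of the solution space $S$ off the singular value decomposition of the coefficient matrix (the matrix itself is an arbitrary illustration) and confirms that $\dim S = n - k$:

```python
import numpy as np

# The solution space S of a homogeneous system is the orthogonal complement
# of the row space; with k independent rows in R^n, dim S = n - k.
A = np.array([[1.0, 1.0, 1.0, -1.0],
              [1.0, 0.0, 2.0,  0.0]])   # k = 2 independent rows, n = 4
k, n = A.shape

_, s, Vt = np.linalg.svd(A)
rank = int((s > 1e-12).sum())
S_basis = Vt[rank:]                      # rows orthogonal to every row of A
print(S_basis.shape[0] == n - k)         # True: dim S = n - k
print(np.allclose(A @ S_basis.T, 0))     # True: these rows solve the system
```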
Exercises
3.1 Conclude from condition SP3 that $\langle 0, 0 \rangle = 0$.
3.2 Verify that the functions defined in Examples 3 and 4 are norms on $\mathbb{R}^n$.
3.3 If $V$ is a subspace of $\mathbb{R}^n$, prove that $V^\perp$ is also a subspace.
3.4 If the vectors $a_1, \ldots, a_k$ generate the subspace $V$ of $\mathbb{R}^n$, and $x \in \mathbb{R}^n$ is orthogonal to each of these vectors, show that $x \in V^\perp$.
3.5 Verify the “polarization identity”

$$x \cdot y = \tfrac{1}{4}\left( \|x + y\|^2 - \|x - y\|^2 \right).$$
3.6 Let $a_1, a_2, \ldots, a_n$ be an orthonormal basis for $\mathbb{R}^n$. If $x = s_1a_1 + \cdots + s_na_n$ and $y = t_1a_1 + \cdots + t_na_n$, show that $x \cdot y = s_1t_1 + \cdots + s_nt_n$. That is, in computing $x \cdot y$, one may replace the coordinates of $x$ and $y$ by their components relative to any orthonormal basis for $\mathbb{R}^n$.
3.7 Orthogonalize the basis $(1, 0, 0, 1)$, $(-1, 0, 2, 1)$, $(0, 1, 2, 0)$, $(0, 0, -1, 1)$ in $\mathbb{R}^4$.
3.8 Orthogonalize the basis

in $\mathbb{R}^n$.
3.9 Find an orthogonal basis for the 3-dimensional subspace $V$ of $\mathbb{R}^4$ that consists of all solutions of the equation $x_1 + x_2 + x_3 - x_4 = 0$. Hint: Orthogonalize the vectors $v_1 = (1, 0, 0, 1)$, $v_2 = (0, 1, 0, 1)$, $v_3 = (0, 0, 1, 1)$.
3.10 Consider the two equations

Let $V$ be the set of all solutions of (*) and $W$ the set of all solutions of both equations. Then $W$ is a 2-dimensional subspace of the 3-dimensional subspace $V$ of $\mathbb{R}^4$ (why?).
(a) Solve (*) and (**) to find a basis $v_1, v_2$ for $W$.
(b) Find by inspection a vector $v_3$ which is in $V$ but not in $W$. Why is $v_1, v_2, v_3$ then a basis for $V$?
(c) Orthogonalize $v_1, v_2, v_3$ to obtain an orthogonal basis $w_1, w_2, w_3$ for $V$, with $w_1$ and $w_2$ in $W$.
(d) Normalize $w_1, w_2, w_3$ to obtain an orthonormal basis $u_1, u_2, u_3$ for $V$. Express $v = (11, 3, 6, -11)$ as a linear combination of $u_1, u_2, u_3$.
(e) Find vectors $x \in W$ and $y \in W^\perp$ such that $v = x + y$.
3.11 Show that the functions

$$1,\ \cos x,\ \sin x,\ \cos 2x,\ \sin 2x,\ \ldots$$

are orthogonal in the inner product space $\mathscr{C}[0, 2\pi]$ of Example 1.
3.12 Orthogonalize in $\mathscr{P}$ the functions $1, x, x^2, x^3, x^4$ to obtain the polynomials $p_0(x), \ldots, p_4(x)$ listed in Example 8.