
Mathematics, Substance and Surmise: Views on the Meaning and Ontology of Mathematics (2015)

Mathematics and its applications

David Berlinski


14 rue Chanoinesse, 75004 Paris, France




If mathematics, unlike entomology, is unreasonably effective, it should be possible to say, with at least some precision, what it means for a mathematical object, structure, or theory to be applied to an object, structure, or theory that is resolutely not mathematical. If it is possible to say as much, I have not found a way in which to say it. Mathematics is about mathematics; and so far as the Great Beyond is concerned, while it is obvious that mathematics is often applied, it is anything but obvious how this is done.

1   Introduction

Group theory is about groups; the theory of rings, about rings. This suggests a generalization. Mathematics is about mathematics. Mathematics is about mathematics in the sense that entomology is about bugs. Who would deny it? If mathematics is about mathematics, it is about many other things as well. No one remarks on the unreasonable effectiveness of entomology. In counting two fingers and two fingers and reaching four fingers, numbers are being applied to fingers. In what does the application consist? The idea that there is a mapping between a subset of the natural numbers and the fingers of the human hand has all of the disadvantages of an arranged marriage. It appears reasonable only to those least involved in the proceedings. To the extent that the mapping is mathematical, it cannot have fingers in its range; and to the extent that it is not, it cannot have numbers in its domain.

Children nonetheless count their fingers with ease, and so do mathematicians.

How is it done?

In finger counting, we count fingers. This might suggest that numbers are one thing, fingers, another. But the creation of numbers, as Thierry of Chartres observed, is the creation of things. One finger is one finger necessarily. It could not be two fingers. If in counting fingers, we are mapping the number one to oneness, what remains of an application of numbers to things? One applies to oneness in just the sense in which one applies to itself. There is, after all, only one. The statement that two fingers plus two fingers make four fingers is, when divided through by fingers, simply the statement that two plus two equals four. But one cannot divide through anything by fingers; and to leave the fingers out is only to return to the observation that two plus two equals four. This is nothing to sneeze at, of course, but it is nothing about which one might wish to write home.

What is unreasonably effective in mathematics is mathematics. What is unreasonably effective beyond mathematics is, as Eugene Wigner observed, a miracle.


Whatever its connection to other disciplines, mathematics frequently appears to be about itself, its concepts self-applied. Take groups. A set G closed under an associative binary operation G × G → G is a group if G includes an identity and an inverse. An identity I returns every element to itself: a ∘ I = a. An inverse returns any element to the identity: a⁻¹ ∘ a = I.
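The axioms just listed can be checked mechanically. A minimal Python sketch (my illustration, not the text's) verifies closure, associativity, identity, and inverses for the integers modulo 6 under addition:

```python
from itertools import product

# The set G: integers mod 6; the operation: addition mod 6.
G = range(6)
op = lambda a, b: (a + b) % 6

# Closure and associativity hold for every pair and triple.
assert all(op(a, b) in G for a, b in product(G, repeat=2))
assert all(op(op(a, b), c) == op(a, op(b, c)) for a, b, c in product(G, repeat=3))

# The identity 0 returns every element to itself: a o I = a.
assert all(op(a, 0) == a for a in G)

# Every element has an inverse returning it to the identity.
assert all(any(op(b, a) == 0 for b in G) for a in G)
```

Any of the axioms can be falsified the same way; dropping the `% 6` destroys closure, for instance, while keeping associativity intact.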

Groups are stable objects, important in mathematics as well as in physics. They play a role in topology, a subject devoted in large measure to the analysis of continuity. The most familiar topological space is the real line; its topology is defined by sets of open intervals. Now mathematicians, as well as philosophers, depend on the familiar upward movement of conceptual ascent, if only to rise above the smog and get a better view of things. A topological space is a particular item; this is the view from the ground. But the collection Top of topological spaces is a category; and this is the view that ascent reveals.

Categories are not simply sets. A set directs the mathematician’s eye toward its elements A and B; a category, toward the morphisms Mor(A, B) between them. Morphisms may themselves be composed

 $$ \mathbf{Mor}(B,C)\times \mathbf{Mor}(A,B)\to \mathbf{Mor}(A,C), $$

subject only to the triplet of conditions that:


Mor(A, B) and Mor(A', B') are either equal (A = A' and B = B') or disjoint;


There is an identity morphism id_A ∈ Mor(A, A) for every A in Top; and


Morphism composition is associative:

 $$ \left(h\circ g\right)\circ f=h\circ \left(g\circ f\right), $$

whenever ƒ ∈ Mor(A, B), g ∈ Mor(B, C), and h ∈ Mor(C, D)—this again for every A, B, C, and D in Top.
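Since morphisms compose the way functions do, the associativity condition can be exhibited with ordinary function composition. A small Python sketch (mine; the sample functions are arbitrary choices):

```python
# Morphisms modeled as plain functions; composition is function composition.
compose = lambda g, f: (lambda x: g(f(x)))

f = lambda x: x + 1        # f: A -> B
g = lambda x: 2 * x        # g: B -> C
h = lambda x: x ** 2       # h: C -> D

lhs = compose(compose(h, g), f)    # (h o g) o f
rhs = compose(h, compose(g, f))    # h o (g o f)

# Associativity: both composites agree pointwise.
assert all(lhs(x) == rhs(x) for x in range(100))
```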

Wherein do groups figure? They figure in algebraic topology, a subject in which algebraic objects are assigned to topological structures in such a way that topological questions may be settled by algebraic methods.

Some definitions. A path or arc (or even a curve) in a topological space X is a continuous mapping ƒ of a closed interval I = [a, b] into X. In what follows, I is always the closed interval [0, 1]. The images of a and b are the endpoints of the arc. A space X is arcwise connected if any two points in X may be joined by an arc. Two paths in X are equivalent if one can be continuously transformed into the other.

Suppose that ƒ and g are paths in X such that g starts where ƒ ends. The product of ƒ and g is:

 $$ (f\cdot g)(t)=\begin{cases} f(2t) & 0\le t\le 1/2, \\ g\left(2t-1\right) & 1/2\le t\le 1. \end{cases} $$

The multiplication of paths is in general not associative; but associativity is recovered when paths are grouped into equivalence classes. It is easy now to define both an identity and an inverse. Assume this done.
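The case-split definition of the product translates directly into code. A Python sketch (mine, with two hypothetical paths in the plane) shows the reparametrization and the splice at t = 1/2:

```python
# Two hypothetical paths in the plane, each a map [0, 1] -> R^2;
# g starts where f ends, so the product f.g is defined.
f = lambda t: (t, 0.0)     # runs from (0, 0) to (1, 0)
g = lambda t: (1.0, t)     # runs from (1, 0) to (1, 1)

def path_product(f, g):
    """Run f at double speed on [0, 1/2], then g on [1/2, 1]."""
    return lambda t: f(2 * t) if t <= 0.5 else g(2 * t - 1)

fg = path_product(f, g)
assert fg(0.0) == (0.0, 0.0)    # starts where f starts
assert fg(0.5) == (1.0, 0.0)    # the splice point: f's end, g's start
assert fg(1.0) == (1.0, 1.0)    # ends where g ends
```

The non-associativity mentioned in the text is visible here too: (f.g).h and f.(g.h) trace the same points but at different speeds, which is why equivalence classes are needed.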

A path is closed (or loops) if its initial and terminal points are the same. Let x, now, be any point of X. The set of all paths that loop from x to x satisfies the group-theoretic axioms; satisfying them, it forms the fundamental group π(X, x) of X at the base point x.

A first connection between topology and algebra now emerges, like an image under darkroom developer:


2.4 If X is arcwise connected, then π(X, x) and π(X, y) are isomorphic for any two points x, y in X.

The theorem’s contrapositive is somewhat more revealing: no matter the pair of points in X, if π(X, x) and π(X, y) are not isomorphic, X is not arcwise connected. A topological condition has been determined by a group-theoretic property.

As in all magic acts, one good trick suggests another. Let S¹, for example, be the unit circle in the real or complex plane. And let ƒ: I → S¹ be the closed path that goes around the circle just once:

 $$ f(t) = (\cos 2\pi t,\ \sin 2\pi t),\kern1em 0\le t\le 1. $$

ξ(ƒ) is the equivalence class of ƒ. The obvious theorem follows:


2.5 The fundamental group π(S¹, (1, 0)) is an infinite cyclic group generated by ξ(ƒ).

The proof of 2.5, like that of 2.4, is a matter of applying diligently the definitions; but what follows is different, altogether dramatic. Let Eⁿ be the closed unit ball in Euclidean n-space; and let ƒ be a continuous map of Eⁿ into itself. Does ƒ have a point x such that ƒ(x) = x? It is a good question. The answer is provided by Brouwer’s fixed-point theorem:


Any continuous map ƒ of Eⁿ into itself has at least one fixed point.

Unlike 2.5, this theorem is an affirmation with a thousand faces, one of the protean declarations of mathematics.

The proof for n ≤ 2 suggests the whole. First, a definition. A subset A of a topological space X is a retract of X if there exists a continuous map γ: X → A such that γ(a) = a for every a in A. If ƒ: Eⁿ → Eⁿ has no fixed points, then Sⁿ⁻¹ is a retract of Eⁿ. Proceed by contraposition. Sⁿ⁻¹ is not a retract of Eⁿ when n = 1, because E¹ is connected, S⁰, disconnected. Go, then, to n = 2. π(S¹) is infinite cyclic; but π(E²) is a trivial group. It follows that S¹ is not a retract of E².

Categories were created with the aim of highlighting the mappings or morphisms between mathematical structures. The category Top of all topological spaces has already been fixed; ditto by definition the category Grp of all groups. A functor is a morphism between categories. If A and B are categories, a covariant functor F: A → B assigns to each object A in A an object F(A) in B; and assigns, moreover, to each morphism ƒ: A → A′ in A a morphism F(ƒ): F(A) → F(A′) in B.

The rules of the game are simple. For every a in A:


 $$ F(\mathrm{id}_A) = \mathrm{id}_{F(A)}, $$

where id is the identity morphism, and if ƒ: A → B and g: B → C are two morphisms of A, then


 $$ F(g\circ f) = F(g)\circ F(f). $$

Contravariant functors reverse the action of covariant functors; a pair, consisting of a covariant and a contravariant functor, makes up a representation functor.

Within algebraic topology, it is often useful (and sometimes necessary) to specify a topological space with respect to a distinguished point; such spaces constitute a category Top* in their own right. A new category does nothing to change the definition of the fundamental group, of course, but it does make for a satisfying illustration of the way in which the fundamental group may acquire a secondary identity as a functor, one acting to map a category onto an algebraic object:

 $$ \pi \left(X,x\right):\mathbf{Top}^{*}\to \mathbf{Grp}. $$

This way of looking at things indicates that the fundamental group serves not simply to mirror the properties of a given topological space, but to mirror as well the continuous maps between spaces, a circumstance not all that easy to discern from the definition itself.

These considerations were prompted by a concern with mathematics self-applied. Herewith a provisional definition, one suggested by the functorial explication of the fundamental group. If X and Y are mathematical objects, then


X applies to Y if and only if there are categories A and B, such that Y belongs to A and X to B, and there exists a functor F on A such that F(Y) = X.

The language of categories and functors provides a subtle and elegant descriptive apparatus; still, categories and functors are mathematical objects, and the applications so far scouted are internal to mathematics.

What of the Great Beyond? The scheme that I have been pursuing suggests that mathematics may be applied to mathematical objects; it makes no provisions for applications elsewhere.


A mathematical theory with empirical content, Charles Parsons has written [16 p. 382], “takes the form of supposing that there is a system of actual objects and relations that is an instance of a structure that can be characterized mathematically” (italics mine). These are not lapidary words. They raise the right question, but by means of the wrong distinction. They misleadingly suggest, those words, that mathematical objects without empirical content are somehow not actual. Not actual? But surely not potential either? And if neither actual nor potential, in what sense would mathematical theories without empirical content be about anything at all? The word that Parsons wants is physical; and the intended contrast is the familiar one, mathematical objects or structures on the one side, and physical objects or structures on the other.1

The question Parsons raises about mathematics, he does not answer explicitly, his discussion trailing off irresolutely. W.V.O. Quine [17 p. 398] is more forthright. “Take groups,” he writes:

In the redundant style of current model theory, a group would be said to be an ordered pair (K, ƒ) of a set K and a binary function ƒ over K fulfilling the familiar group axioms. More economically, let us identify the group simply with ƒ, since K is specifiable as the range of ƒ. Each group, then, is a function ƒ fulfilling three axioms. Each group is a class ƒ of ordered triples, perhaps a set or perhaps an ultimate class. … Furthermore, ƒ need not be a pure class, for some of its ordered triples may contain individuals or impure sets. This happens when the group axioms are said to be applied somewhere in natural science. Such application consists in specifying some particular function ƒ, in natural science, that fulfills the group axioms and consists of ordered triples of bodies or other objects of natural science.

Whatever else they may affirm, these elegant remarks convey the impression that mathematical concepts (or predicates) are polyvalent in applying indifferently to mathematical and physical objects. “[G]rouphood,” Quine writes (on the same page), “is a mathematical property of various mathematical and non-mathematical functions.” This is rather like saying that cowhood is a zoological property of various zoological and non-zoological herbivores. If there are no non-zoological cows, why assume that there are some non-mathematical groups?

Skepticism about the application of mathematics arises as the result of the suspicion that nothing short of a mathematical object will ever satisfy a mathematical predicate. It is a suspicion that admits of elaboration in terms of an old-fashioned argument.2 Let me bring the logician’s formal language L into the discussion; ditto his set K of mathematical structures. A structure’s theory T(K) is the set of sentences φ of L such that φ holds in every model M ∈ K. Let K constitute the finite groups and T(K) the set of sentences true in each and every finite group. Sentences in T(K), the logician might say, are distributed to the finite groups.

Nothing esoteric is at issue in this definitional circle. Distribution is the pattern when an ordinary predicate takes a divided reference. The logician’s art is not required in order to discern that whatever is true of elephants in general is necessarily true of Bruno here. It is a pattern that fractures in obvious geometric cases. Thus consider shape, one of the crucial concepts by which we organize sensuous experience, and the subject, at least in part, of classical Euclidean geometry.3 Is it possible to distribute the truths of Euclidean geometry to the shapes of ordinary life—desktops, basketballs, football fields, computer consoles, mirrors, and the like?

Not obviously.

In many cases, the predicates of Euclidean geometry just miss covering the objects, surfaces, and shapes that are familiar features of experience. Euclidean rectangles are, for example, bounded by the line segments joining their vertices. Rectangles in the real world may well be finite but unbounded, with no recognizable sides at all because beveled at their edges.4 The chalk marks indicating the length and width of a football field have a determinate thickness and so contain multiple boundaries if they contain any boundary at all. Euclidean rectangles are structurally unstable: small deformations destroy their geometrical character. Not so physical rectangles. Such regions of space are robust. They remain rectangular and not some other shape despite an assortment of nicks, chips, and scratches. The sum of the interior angles of a Euclidean rectangle is precisely 360 degrees; the interior angles on my desk sum to more or less 360 degrees.

More or less.

These particular cases may be enveloped by a general argument. It is a theorem that up to isomorphism there exist only two continuous metric geometries. The first is Euclidean; the second, hyperbolic. The categorical model for Euclidean geometry is the field of real numbers. It thus follows that if Euclidean geometry is distributed, physical space must locally be isomorphic to ℝⁿ.5 It is difficult to understand how the axioms of continuity could hold for physical points;6 difficult again to imagine a one-to-one correspondence between physical points and the real numbers.7 How would a correspondence between the real numbers and a variety of physical points be established?



If distribution lapses in the case of crucial mathematical and physical shapes, it is often not by much, a circumstance that should provoke more by way of puzzlement than it does. The sum of the interior angles of a Euclidean triangle is precisely one hundred and eighty degrees—π radians, to switch to the notation of the calculus, and then simply π, to keep the discussion focused on numbers and not units.8 This is a theorem of Euclidean geometry, a fact revealed by pure thought. Yet mensuration in the real world reveals much the same thing among shapes vaguely triangular: the sum of their interior angles appears to follow a regular distribution around the number π. The better the measurement, the closer to π the result. This would seem to suggest a way forward in the project of making sense of a mathematical application. Letting M(Δ) and P(Δ) variably denote mathematical and physical triangles, and letting niceties of notation for the moment drift, 4.1 follows as a provisional definition, one that casts a mathematical application as the inverse of an approximation.


4.1 M(Δ) applies to P(Δ) if and only if P(Δ) is an approximation of M(Δ).

Now approximation is a large, a general concept, and one that appears throughout the sciences.9 It is a concept that has a precise mathematical echo. Let E be a subset of the line. A point ξ is a limit point of E if every neighborhood of ξ contains a point q ≠ ξ such that q ∈ E. The definition immediately makes accessible a connection between the existence of a limit point and the presence of convergent sequences, thus tying together a number and a process:


4.2 If E ⊆ ℝ then ξ is a limit point of E if and only if there exists a set of distinct points S = {xₙ} in E itself such that  $$ \lim_{n\to \infty } x_n=\xi $$ .

Approximation as an activity suggests something getting close and then closer to something else, as when a police artist by a series of adroit adjustments refines a sketch of the perpetrator, each stroke bringing the finished picture closer to some remembered standard. 4.2 reproduces within point-set topology the connection between some fixed something and the numbers that are tending toward it, S and ξ acting as approximation and approximatee. The reproduction improves on the general concept inasmuch as convergence brings within the orbit of approximation oscillating processes—those governed by familiar functions such as ƒ(x) = x sin 1/x. So long as the discussion remains entirely within the charmed circle of purely and distinctively mathematical concepts, what results is both clear and useful. The Weierstrass approximation theorem serves as an illustration:


4.3 If ƒ is a complex-valued function, one continuous on [a, b], there exists a sequence of polynomials Pₙ such that  $$ \lim_{n\to \infty }{P}_n(x)=f(x) $$  uniformly on [a, b].

The proof is easy enough, considering the power and weight of the theorem. It is surprising that any continuous complex-valued function may be approximated by a sequence of polynomials on a closed and bounded interval. More to the point, 4.3 gives to approximation a precise, independent, and intuitively satisfying interpretation.
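The theorem can be made concrete with Bernstein polynomials, one standard constructive route to Weierstrass approximation (the text does not specify a proof method; the choice, and the sample function, are mine):

```python
import math

def bernstein(f, n):
    """The n-th Bernstein polynomial of f on [0, 1]."""
    def B(x):
        return sum(f(k / n) * math.comb(n, k) * x**k * (1 - x)**(n - k)
                   for k in range(n + 1))
    return B

f = lambda x: abs(x - 0.5)                 # continuous but not smooth
xs = [i / 50 for i in range(51)]
err = lambda n: max(abs(bernstein(f, n)(x) - f(x)) for x in xs)

# The uniform error shrinks as the degree grows.
assert err(400) < err(40) < err(4)
```

The convergence is uniform but slow near the kink at x = 1/2, which is typical of Bernstein approximation.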

Difficulties arise when this scheme is off-loaded from its purely mathematical setting. Mensuration, I have said, yields a set of numbers, but beyond any of the specific numbers, there is the larger space of possible points in which they are embedded—Ω, say. Specific measurements comprise a set of points S* within Ω. Relativized to the case at hand, the requisite relationship of approximation would seem now to follow:


4.4 P(Δ) is an approximation of M(Δ) if and only if π is a limit point of Ω. [15 p. 122]

4.4 is assumed even in the most elementary applications of the calculus. Talking of velocity, the authors of one text assert that “[j]ust as we approximate the slope of the tangent line by calculating the slope of the secant line, we can approximate the velocity [of a moving object] at t = 1 by calculating the average velocity over a small interval [1, 1 + Δt] …” Approximate? Approximate how? By allowing average speeds to pass to the limit, of course, the answer of analytic mechanics since the seventeenth century.

But the usefulness of 4.4 is entirely cautionary. 4.5 follows from 4.4 and 4.2:


4.5 P(Δ) is an approximation of M(Δ) if and only if there exists a set of distinct points S in Ω itself such that  $$ \lim_{n\to \infty } x_n=\pi $$ .

And yet 4.5 is plainly gibberish. The real world makes available only finitely many measurements, and these expressed as rational or computable real numbers. There exists no set of distinct points in Ω converging to π or to anything else. Ω is a subset of the rational numbers and the definitions of point-set topology are unavailing. Taken literally, 4.5, if true, implies that P(Δ) is not—it is never—an approximation of M(Δ), however close points in S* may actually come to π. 4.5 must be taken loosely, then, but if S and S* are distinct—and how else to construe the requisite looseness?—4.1 lapses into irrelevance.


Symmetry is a property with many incarnations, and so a question arises as to the relationship between its mathematical and physical instances. It is group theory, Hermann Weyl [22] affirms, that provides a language adequate to its definition. Weyl’s little book contains many examples and represents a significant attempt to demonstrate that certain algebraic objects have a direct, a natural, application to ordinary objects and their properties.

Let Γ be the set of all points of some figure in the plane. A permutation on Γ is a bijection Γ → Γ; a given permutation ƒ is a symmetry of Γ or an automorphism when ƒ acts to preserve distances. The set of all symmetries on Γ under functional composition (one function mounting another) constitutes the group of symmetries G(Γ) of Γ.

Let Γ, for example, be the set of points on the perimeter of an equilateral triangle. Three sides make for three symmetries by counterclockwise rotation through 120, 240, and 360 degrees. These symmetries may be denoted as R, R ∘ R, and R ∘ R ∘ R, which yields the identity and returns things to their starting position. There are, in addition, three symmetries D₁, D₂, and D₃ that arise by reflection through the three altitudes of the triangle. The transformations

 $$ {\boldsymbol{\Delta}}_3 = \left\{R,{R}^2,{R}^3,{D}_1,{D}_2,{D}_3\right\} $$

describe all possible permutations of the vertices of the given triangle. These being determined, so are, too, the relevant automorphisms.

So, too, the symmetric group Δ₃.
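The six symmetries can be realized as permutations of the three vertex labels. A Python sketch (my illustration) generates the whole group from one rotation and one reflection and confirms that it has exactly six elements:

```python
from itertools import product

# Symmetries as permutations of the vertex labels 0, 1, 2.
I  = (0, 1, 2)             # identity
R  = (1, 2, 0)             # rotation through 120 degrees
D1 = (0, 2, 1)             # one reflection

def compose(p, q):
    """p after q, acting on vertex labels."""
    return tuple(p[q[i]] for i in range(3))

# Close {I, R, D1} under composition to generate the whole group.
G = {I, R, D1}
while True:
    new = {compose(p, q) for p, q in product(G, repeat=2)} - G
    if not new:
        break
    G |= new

assert len(G) == 6                               # six symmetries in all
assert compose(R, compose(R, R)) == I            # R o R o R is the identity
assert all(any(compose(q, p) == I for q in G) for p in G)   # inverses exist
```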

A set S of symmetrically related objects is fashioned when a sequence of automorphisms is specified, as in 5.1:


 $$ \Gamma \xrightarrow{\ A_1\ } \Gamma \xrightarrow{\ A_2\ } \Gamma \to \cdots \xrightarrow{\ A_k\ } \Gamma. \tag{5.1} $$

The objects thus generated form a symmetrical sequence S. This suggests the obvious definition, the one in fact favored by Weyl:


5.2 G(Γ) applies to S if and only if S is symmetrical on Γ.

So far, let me observe skeptically, there has been no escape from a circle of mathematical objects. Whatever the invigoration group theory affords, the satisfaction is entirely a matter of internal combustion. G(Γ) is plainly a mathematical object; but in view of 5.1, so, too, is S.

Nonetheless, an extended sense of application might be contrived—by an appeal to the transitivity of application, if nothing else—if sequences such as S themselves apply to sequences of real objects; applying directly to S, G(Γ) would then apply indirectly to whatever S is itself applied to. Thus


5.3 G(Γ) applies to S* if and only if S applies to S* and G(Γ) applies to S.

It is to S* that one must look for physical applications.

And it is at this point that any definitional crispness that 5.3 affords begins to sag. The examples to which Weyl appeals are artistic rather than physical; but his case stands or falls on the plausibility of his analysis and owes little to the choice of examples. Symmetries occur in the graphic arts most obviously when a figure, or motif, occurs again and again, either in the plane or in a more complicated space. They exist, those figures or inscriptions or palmettes—the last, Weyl’s example—in space, each separate from the other, each vibrant and alive, or not, depending on the artist's skill. But in looking at a symmetrical sequence of this sort, 5.1 gives entirely the wrong impression. The problem is again one of distribution and confinement. 5.1 represents a symmetrical sequence generated by k operations on a single abstract object—Γ, as it happens. Those Persian bowmen or Greek palmettes or temple inscriptions are not generated by operations on a single figure. They are not generated at all. Each of n items is created independently and each is distinct.10 And none is quite identical to any other.

A far more natural representation of their relationship is afforded by mappings between spaces as in:


 $$ \mathrm{X}\xrightarrow{\ f\ } \mathrm{Y}\xrightarrow{\ g\ } \mathrm{Z}\to \cdots \xrightarrow{\ h\ } \mathrm{W}. \tag{5.4} $$

If a connection to geometry is required, X, Y, Z, and W may be imagined as point sets, similar each to Γ; ƒ, g, and h take one space to the next. Functional composition extends the range of the mappings: composing the maps in sequence yields j: X → W. The sense in which 5.4 represents a symmetrical sequence may be expressed quite without group theory at all. Thus let the various functions be bijections; let them, too, preserve distances, so that if x, y ∈ X

 $$ \boldsymbol{D}\left(x,y\right) = \boldsymbol{D}\left(f(x),f(y)\right). $$

Each function then expresses an isomorphism. Congruence comes to be the commanding concept, one indicating that adjacent figures share a precisely defined similarity in structure.
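Distance preservation of the kind just displayed is easy to exhibit numerically. A Python sketch (mine), with a plane rotation standing in for ƒ and a handful of arbitrary sample points:

```python
import math

def rotate(theta):
    """A plane rotation: a bijection of R^2 that preserves distances."""
    c, s = math.cos(theta), math.sin(theta)
    return lambda p: (c * p[0] - s * p[1], s * p[0] + c * p[1])

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

f = rotate(0.7)                       # theta = 0.7 is an arbitrary choice
pts = [(0.0, 0.0), (1.0, 0.0), (0.3, -2.0), (5.0, 4.0)]

# D(x, y) = D(f(x), f(y)) for every pair of sample points.
assert all(abs(dist(x, y) - dist(f(x), f(y))) < 1e-12
           for x in pts for y in pts)
```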

But if 5.1 informs 5.4, so that the sense of symmetry exhibited at 5.4 appears as group theoretical, it is necessary plainly that the following diagram must commute:



when Γ = X.

And obvious, just as plainly, that it never does, in virtue of the fact that X ≠ Y ≠ Z ≠ W.


If groups do not quite capture a suitable sense of symmetry, it is possible that weaker mathematical structures might. A semigroup is a set of objects on which an associative binary operation has been defined. No inverse exists; no identity element is required. The semigroups have considerably less structure than the groups.11 Functional composition is itself an associative operation. Say now that S[X, Y, …, Z] is any finite sequence of the point sets (or figures) X, Y, … , Z, and let C be a collection of isomorphic mappings over S. 5.1, 5.2, and 5.3 have their ascending analogs in 6.1, 6.2, and 6.3:


6.1 Isomorphisms over C form a semigroup SG under composition;


6.2 A sequence S[X, Y, …, Z] is symmetrical in X, Y, …, Z if X ≅ Y for every pair of elements X, Y in S[X, Y, …, Z]; and


6.3 SG applies to S if and only if S* is symmetrical in X, Y, …, Z.

Transitivity of application is again invoked to fashion a notion of indirect application. The application of SG at 6.3 makes for a weak form of algebraic animation; but it does little to dilute the overall discomfort prompting this discussion. Let me reconvey my argument. 6.1 is an abstract entity, a sequence of point sets or spaces. The symmetries seized upon by the senses obtain between palpable and concrete physical objects. Symmetries thus discerned are approximate; the discerning eye does what it does within a certain margin of error. To the extent that a refined judgment of symmetry hinges on a definition of congruence, the distances invoked by the definition are preserved only to a greater or lesser extent. Thus if x and y are points in X, and ƒ: X → Y, then

 $$ \boldsymbol{D}\left(x,y\right) = \boldsymbol{D}\left(f(x),f(y)\right)\pm \updelta . $$

At 6.1, distances are preserved precisely.12 If we had a convincing analysis of approximation, an analysis of applicability might well follow. One rather suspects that to pin one's hopes on approximation is in this case a maneuver destined simply to displace the fog backward.

Sections 4–6 were intended provisionally to answer the question whether a group takes physical instances. Asked provisionally, that question now gets a provisional answer:



The arguments given suggest that nothing short of a mathematical object is apt to satisfy a mathematical predicate; it is hardly surprising, then, that within physics, at least, nothing short of a mathematical object does satisfy a mathematical predicate.

The systems theorist John Casti [9 pp. 22–25] has argued that mathematical modeling is essentially a relationship between a natural system N and a formal system F. Passage between the two is by means of an encoding map ζ: N → F, which serves to associate observables in N with items in F. The idea of an encoding map is not without its uses. The encoding map, if it exists at all, conveys a natural into a mathematical world: ζ: N → M. For my purposes, the map's interest lies less with its ability to convey natural into mathematical objects than with the reverse. If an encoding map exists, its inverse ζ*: M → N should serve to demarcate at least one sense in which mathematical objects receive an application.

The argument now turns on choices for ζ*. Within particle physics (but in other areas as well), M is taken as a group, N as its representative, and ζ* understood as the action of a group homomorphism. Such is the broad outline of group representation theory. Does this scheme provide a satisfactory sense of application? Doubts arise for the simplest of reasons. If ζ* does represent the action of a group homomorphism, surely N is for that reason a mathematical object? If this is so, the scheme under consideration has in a certain sense overshot the mark, the application map, if it is given content, establishing that every target in its range is a mathematical object.

Consider a single particle—an electron, say—on a one-dimensional lattice; the lattice spacing is b [20]. The dynamics of this system are governed by the Hamiltonian


7.1 H = p²/2m + V(x),

where m measures the mass of the electron and p its momentum. V is the potential function and satisfies the condition that:


7.2 V(x + nb) = V(x)

for every integer n. The system that results is symmetrical in the sense that translations x → x′ = x + nb leave H unchanged. Insofar as they are governed by a Hamiltonian, any two systems thus related behave in the same way.
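The periodicity condition V(x + nb) = V(x) is simple to check numerically. A Python sketch (my illustration; the cosine potential and the spacing b = 1.5 are arbitrary choices, not the text's):

```python
import math

b = 1.5                                        # hypothetical lattice spacing
V = lambda x: math.cos(2 * math.pi * x / b)    # a periodic potential, period b

# V(x + nb) = V(x) for every integer n, at a spread of sample points.
assert all(abs(V(x + n * b) - V(x)) < 1e-9
           for n in (-2, -1, 1, 3)
           for x in [0.1 * i for i in range(40)])
```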

A few reminders. Within quantum theory, information is carried by state vectors. These are objects that provide an instantaneous perspective on a system, a kind of snapshot. Let Q be the set of such vectors, and let |y⟩ and |y′⟩ be state vectors related by a translation. The correspondence |y⟩ → |y′⟩ may itself be expressed by a linear operator T in Q:


7.3 |y⟩ → |y′⟩ = T|y⟩,

— this for every state vector |y⟩.

Not any linear operator suffices, of course; physical observables in quantum theory are expressed as scalar products ⟨F|y⟩ of the various state vectors. It is here that old-fashioned numbers make an appearance and so preserve a connection between quantum theory and a world of fact and measurement. Unitary linear operators preserve scalar products; they preserve, as well, the lengths of vectors and the angles between them. To the extent that 7.3 takes physically significant vectors to physically significant vectors, those operators must be unitary; so, too, the target of T.

Let us step backward for a moment. Here is the Big Picture. On the left there is a symmetrical something; on the right, another something. Symmetrical somethings of any sort suggest group theory, and, in fact, the set of symmetry operations on a lattice may be described by the discrete translation group G(T_D). Those other somethings comprise the set {T} of unitary linear operators. {T} constitutes a representation of the symmetry operations on H; it resembles an inner voice in harmony.

Now for a Close-Up. First, there is the induction of group-theoretic structure on the alien territory of a set L of linear operators in a vector space Q. L becomes a group G(L) under the definition of the product C of two operators A and B in L:


7.4 Cx = A(B(x)).

The identity is the unit operator. And every operator is presumed to have an inverse.

Next an arbitrary group G makes an appearance. The homomorphism


h: G → G(L)

acts to establish a representation of G in G(L), with G(L) its representative. In general, group theory in physics proceeds by means of the group representation.13 In the example given, G(T_D) corresponds to G; {T} to L; and given an h such that


h: G(T_D) → G{T},

G{T} corresponds to G(L).

7.5 has an ancillary purpose: It serves to specify the applications of group theory to physics in a large and general way. 7.6 makes the specification yet more specific. The results are philosophically discouraging (although not surprising). G and G(L) are mathematical objects; but then, so are G(T_D) and G{T}. The real (or natural) world intrudes into this system only via the scalars. If there is any point at which mathematics is applied directly to anything at all, it is here. But these applications involve only counting and measurement. This is by no means trivial, but it does suggest that the encoding map carries information of a severely limited kind. An application of group theory to physics, on the evidence of 7.5 and 7.6, is not yet an application of group theory to anything physical: so far as group representation goes, the target of every mathematical relation is itself mathematical; and as far as quantum theory goes, those objects marked as physical by the theory—the range of Casti's encoding map—do not appear as the targets of any sophisticated mathematical application.

This conclusion admits of ratification in the most familiar of physical applications. Consider the continuous rotation group SO(2). The generator J of this group, it is easy to demonstrate, satisfies R(ϕ) = e^{−iϕJ}, where R(ϕ) is, of course, a continuous measure of rotation through a given angle. SO(2) is a Lie group; its structure is determined by group operations on J near the identity. So too its representations. Thus consider a representation of SO(2) defined in a finite dimensional vector space V. R(ϕ) and J both have associated operators R(ϕ)* and J* in V. Under certain circumstances J* may be understood as an angular momentum operator in the sense of quantum mechanics. This lends to J a certain physical palpability. Nonetheless, J* is and remains an operator in V, purely a mathematical object in purely a mathematical space. The same point may be made about the actions of SU(2) and SU(3), groups that play a crucial role in particle physics. SU(2), for example, is represented in a two-dimensional abstract isospin space. The neutron and the proton are regarded as the isospin up and down components of a single nucleon. SU(2) defines the invariance of the strong interaction to rotations in this space. But SU(2) is a mathematical object; so, too, its representative. Wherever the escape from a circle of mathematical concepts is made, it is surely not here.
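The sense in which a set of matrices "represents" a group can itself be made concrete. A minimal sketch, with invented angles: the familiar 2×2 rotation matrices represent SO(2), and the defining homomorphism property, R(a)R(b) = R(a + b), can be checked numerically:

```python
# Sketch (invented angles): the 2x2 rotation matrices represent SO(2).
# The map phi -> R(phi) is a homomorphism: R(a) R(b) = R(a + b).
import math

def R(phi):
    return [[math.cos(phi), -math.sin(phi)],
            [math.sin(phi),  math.cos(phi)]]

def matmul(a, b):
    return [[sum(a[i][k]*b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

a, b = 0.6, 1.1
prod = matmul(R(a), R(b))      # group product, computed in the representation
direct = R(a + b)              # group product, computed in SO(2) itself
assert all(abs(prod[i][j] - direct[i][j]) < 1e-12
           for i in range(2) for j in range(2))
```

Note that everything in the check is mathematical through and through, which is the chapter's point: the representation relates one mathematical object to another.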


It is within mathematical physics that mathematics is most obviously applied and applied moreover with spectacular success, the very terms of description—mathematical physics—suggesting one discipline piggy-backed upon another. Still, to say that within quantum field theory or cosmology, mathematics has been a great success is hardly to pass beyond the obvious. A success in virtue of what? The temptation is strong to affirm that successes in mathematical physics arise owing to the application of mathematical objects or structures to physical ones, but plainly this is to begin an unattractive circle. It was this circumstance, no doubt, that prompted Eugene Wigner to remark that the successes of mathematical physics were an example of the ‘unreasonable effectiveness of mathematics.’

The canonical instruments of description within mathematical physics are ordinary or partial differential equations. In a well-known passage, Hilbert and Courant asked under what conditions “[a] mathematical problem … corresponds to physical reality.” By a problem they meant an equation or system of equations. Their answer was that a system of differential equations corresponds to the physical world if unique solutions to the equations exist, and that, moreover, those solutions depend continuously on variations in their initial conditions, [10 p. 227].14 Existence and uniqueness are self-evident requirements; the demand that solutions vary continuously with variations in their initial conditions is a concession to the vagaries of measurement:

The third requirement … is necessary if the mathematical formulation is to describe observable natural phenomena. Data in nature cannot possibly be conceived as rigidly fixed; the mere process of measuring them involves small errors. … Therefore, a mathematical problem cannot be considered as realistically corresponding to a physical phenomenon unless a variation of the given data in a sufficiently small range leads to an arbitrary small change in the solution.

Following Hadamard, Hilbert and Courant, call a system of equations satisfying these three constraints well-posed.

Well-posed problems in analysis answer to a precise set of mathematical conditions. Consider a system of ordinary first-order differential equations expressed in vector matrix form:


dx/dt = ƒ(x, t), x(a) = b.

Existence, uniqueness and continuity depend on constraints imposed on ƒ(x, t). Assume thus that R is a region in the <x, t> plane. ƒ(x, t) is Lipschitz continuous in R just in case there exists a constant k > 0 such that:

 $$ \left|f\left({x}_1,t\right)-f\left({x}_2,t\right)\right|\le k\left|{x}_1-{x}_2\right|. $$ 

Here (x 1, t) and (x 2, t) are points in R.

Assume, further, that ƒ is continuous and Lipschitz continuous in R; and let δ be a number such that 0 < δ < 1/k. And assume, finally, that u and v are solutions of 8.1. Uniqueness now follows, but only for a sufficiently small interval (the interval, in fact, determined by δ ):


If u and v are defined on the interval |t − t 0| ≤ δ, and if u(t 0) = v(t 0), then u = v.

What of existence? Let u 1, u 2, …, u n be successive approximations to 8.1 in the sense that

 $$ {u}_0(t) = {x}_0 $$

 $$ {u}_{k+1}(t) = {x}_0+{\displaystyle {\int}_{\!\!t_0}^t f\big({u}_k(s),s\big)\,ds,}\kern1em k=0,1,2,\dots $$ 

Suppose now that ƒ is continuous in R: |t − t 0| ≤ a, |x − x 0| ≤ b, a, b > 0; and suppose, too, that M is a constant such that |ƒ(x, t)| ≤ M for all (x, t) in R. Let I be the interval |t − t 0| ≤ h, where h is the minimum of {a, b/M}. The Cauchy–Peano theorem affirms that:


The approximations u 1, …, u n converge on I to a solution u of 8.1.
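The successive approximations can be carried out concretely. A minimal sketch, run on the invented test case dx/dt = x, x(0) = 1 (so ƒ(x, t) = x, t 0 = 0, x 0 = 1), whose solution is the exponential. Each iterate is a polynomial in t, stored by its coefficients; integrating t^j gives t^(j+1)/(j+1):

```python
# Sketch: the successive approximations, run on the invented test case
# dx/dt = x, x(0) = 1. Each iterate u_k is a polynomial in t, stored by
# coefficients; integrating t^j contributes t^(j+1)/(j+1).
import math

u = [1.0]                                  # u_0(t) = x_0 = 1
for _ in range(12):
    integral = [0.0] + [c / (j + 1) for j, c in enumerate(u)]
    u = [1.0] + integral[1:]               # u_{k+1}(t) = 1 + integral of u_k

def eval_poly(coeffs, t):
    return sum(c * t**j for j, c in enumerate(coeffs))

# The iterates converge to the solution x(t) = e^t.
assert abs(eval_poly(u, 1.0) - math.e) < 1e-8
```

After k steps the iterate is the kth Taylor polynomial of e^t, which is one way of seeing what convergence "on I" amounts to in a tractable case.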

8.3 is purely a local theorem: it says that solutions exist near a given point; it says nothing whatsoever about wherever else they may exist. The theorem is carefully hedged. And for good reason. There are simple differential equations for which an embarrassing number of solutions exist, or none at all. The equation y² + x²y′ = 0 is an example. Infinitely many solutions satisfy the initial condition y(0) = 0. No solution satisfies the initial condition y(0) = b, if b ≠ 0. At <0, 1>, this equation fails of continuity.

The Cauchy–Peano theorem does not apply.
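The failure is easy to exhibit. For this equation one may check that a one-parameter family of distinct solutions passes through the origin; the closed form y = −x/(1 + Cx) is supplied here for illustration (it is not given in the text), and the residual is verified numerically:

```python
# Sketch: infinitely many solutions of y^2 + x^2 y' = 0 pass through (0, 0).
# The closed form y_C(x) = -x/(1 + C x) is supplied here for illustration;
# the residual of the equation is checked with a numerical derivative.
def y(C, x):
    return -x / (1 + C*x)

def residual(C, x, h=1e-6):
    yprime = (y(C, x + h) - y(C, x - h)) / (2*h)   # central difference
    return y(C, x)**2 + x**2 * yprime

for C in (-1.0, 0.0, 1.0, 3.5):
    assert y(C, 0.0) == 0.0                # every member passes through (0, 0)
    assert abs(residual(C, 0.5)) < 1e-6    # every member satisfies the equation
```

One equation, one initial condition, a continuum of solutions: uniqueness fails precisely where the Lipschitz hypothesis does.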

8.3 may be supplanted by a global existence theorem, but only if ƒ is Lipschitz continuous for every t in an interval I. It follows then that successive solutions are defined over I itself.

There remains the matter of continuity. Let u be a solution of 8.1 passing through the point (t 0, x 0); and let u* be a solution passing through (t 0', x 0'). Both u and u* pass through those points in R. The requisite conclusion follows, preceded by a double condition:


If for every ε > 0 there is a δ > 0 such that u and u* exist on a common interval a < t < b; and if a < t′ < b, |t − t′| < δ, |t 0 − t 0′| < δ, |x 0 − x 0′| < δ, then |u(t) − u*(t′)| < ε.

To the extent that 8.1 satisfies the hypotheses of 8.2, 8.3, and 8.4, to that extent 8.1 is well-posed.

The concept of a well-posed problem in analysis is interesting insofar as it specifies conditions that are necessary for applicability; but however necessary, they are, those conditions, hardly sufficient. How could they be? Like any other equation, a differential equation expresses an affirmation: some unknown function answers to certain conditions. The Cauchy–Peano theorem establishes that for a certain class Φ of differential equations, a suitable function exists. The elements of the theory T(Φ) satisfied in models of T(Φ) are true simply in virtue of being elements of T(Φ).

But true of what? Surely not physical objects? This would provide an access to the real world too easy to be of any use.

Like so many other mathematical objects, a differential equation is dedicated to the double life. If the first is a matter of the solutions specified by the equation, the second involves the induction of form over space. The simple differential equation:


 $$ {df}/{dt}=Af(t) $$

provides a familiar example. The association established between t and ƒ(t) creates a coordinate system. The set of points evoked—t, on the one hand, ƒ(t), on the other—is the phase (or state) space of the equation. Ax evidently plays A against each of its phase points. This implies that Ax represents the rate of change of x at t. Rates of change evoke tangent lines and slopes. To each point in the <t, ƒ(t)> plane, a differential equation—the differential equation—assigns a short line segment of fixed slope. A phase space upon which such lines have been impressed is a direction or lineal field.

Imagine now that the plane has been filled with curves tangent at each point to the lines determining a lineal field. The set of such curves fills out the plane. And each curve defines a solution u(t) = x to the differential equation, for plainly du(t)/dt = Au(t) in virtue of the way in which those curves have been defined.

It is thus that a differential equation elegantly enters the geometrical scene. Nothing much changes in the interpretation of 8.1 itself. The lineal field passes to a vector field, but the induction of geometrical structure on an underlying space proceeds apace. The system of equations


 $$ \begin{array}{l}{dx}/{dt}=y\\ {dy}/{dt}=-x\end{array} $$ 

assigns to every point in the <x, y> plane a vector <y, −x>. The trajectory or flow of a differential equation corresponds to the graph of those points in the plane whose velocity at <x, y> is simply <y, −x>. Trajectories have a strange and dreamy mathematical life of their own. The Poincaré–Bendixson theorem establishes, for example, that a bounded semi-trajectory tends either to a singular point or spirals onto a simple closed curve.
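The geometry of this particular system is easy to confirm. A small sketch, with invented sample points: the assigned vector <y, −x> is orthogonal to the radius vector <x, y> at every point, so the flow conserves x² + y² and the trajectories are circles about the origin:

```python
# Sketch (invented sample points): for dx/dt = y, dy/dt = -x, the assigned
# vector <y, -x> is orthogonal to the radius vector <x, y>, so x^2 + y^2 is
# conserved and trajectories are circles about the origin.
import math

def field(x, y):
    return (y, -x)

for x, y in [(1.0, 0.0), (0.3, -2.0), (-1.5, 0.7)]:
    vx, vy = field(x, y)
    assert vx*x + vy*y == 0.0              # tangent to the circle through <x, y>

def flow(x0, y0, t):
    # exact solution: clockwise rotation of the initial point
    return (x0*math.cos(t) + y0*math.sin(t),
            -x0*math.sin(t) + y0*math.cos(t))

x1, y1 = flow(0.3, -2.0, 1.7)
assert abs((x1*x1 + y1*y1) - (0.3*0.3 + 2.0*2.0)) < 1e-12
```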

An autonomous system of n differential equations is one in which time has dwindled and disappeared. Points in space are n-dimensional, with R n itself the collection of such points. On R n things are seen everywhere as Euclid saw them: R n is an n-dimensional Euclidean vector space. On a differential manifold, things are seen locally as Euclid saw them; globally, functions and mappings may be overtaken by weirdness. Thus the modern definition of a dynamical system as a pair <M, V> consisting of a differential manifold M and its associated vector field V. To every differential equation there corresponds a dynamical system.

On the other hand, suppose that one starts at a point x of M. Let g t (x) denote the state of the system at t. For every t, g defines a mapping g t : M → M of M onto itself. Suppose that g t+s  = g t g s . Suppose, too, that g 0 is an identity element. That family of mappings emerges as a one parameter group of transformations. A set M together with an appropriate group of transformations now acquires a substantial identity as a flow or dynamical system. In the obvious case, M is a differential manifold, g a differential mapping. Every dynamical system, so defined, corresponds to a differential equation. Since g t is differentiable, there is—there must be—a function v such that dg t /dt = v(g t ). A differential equation may thus be understood as the action of a one parameter group of diffeomorphisms on a differential manifold.

The attempt to assess the applicability of differential equations by looking toward their geometric interpretation runs into a familiar difficulty. Neither direction fields nor manifolds are anything other than mathematical structures. And the association between groups and differential equations suggests that insofar as the applicability of differential equations must be defended in terms of the applicability of groups, the result is apt to be nugatory. There is yet no clear and compelling sense in which groups are applicable to anything at all.

This may well suggest that the applicability of differential equations turns not on the equations, but on their solutions instead. An icy but invigorating jet now follows. The majority of differential equations cannot be solved by analytic methods. Nor by any other methods. This is a conclusion that physicists will find reassuring. Most differential equations cannot be solved; and they cannot solve most differential equations.


The exception to this depressing diagnosis arises in the case of linear differential equations; such systems, V. I. Arnold remarks [2 p. 97], constitute “the only large class of differential equations for which there exists a definitive theory.”

These are encouraging words.

Linear algebra is the province of linear mappings, relations, combinations, and spaces. Instead of numbers, linear algebra trades often in vectors, curiously rootless objects, occurring in physics as arrows taking off from the origins of a coordinate system, and in the calculus as directed line segments. In fact, vectors are polyvalent, described now in one formulation, now in another, the various definitions all equivalent in the end.

If x 1, x 2, … , x k  ∈ R n , with {c i } a set of scalars, dying to be attached, the vector c 1 x 1 + … + c k x k is a linear combination of the vectors x 1, x 2, … , x k —linear because only the operations of scalar multiplication and vector addition are involved, a combination because something is being heaped together, as in a Caesar salad. A set of vectors { x 1, x 2, … , x k } is independent if c 1 x 1 + … + c k x k  = 0 implies that c 1 = c 2 = … = c k  = 0; otherwise, dependent, the language itself suggesting that among the dependent vectors, some are superfluous and may be expressed as a linear combination of those that remain.

This makes for an important theorem, one serving to endow vectors with an exoskeleton. Suppose that S ⊂ R n and let E be the set of all linear combinations of vectors in S. Then E is said to be spanned by S. This is a definition. It follows that E is the intersection of all subspaces in R n containing S. Theorem and definition may, of course, be reversed. A basis B of a vector space X is an independent subset of X spanning X. This, too, is a definition.

Herewith that important theorem:


If B is a basis of X, then every x in X may be uniquely expressed as a linear combination of base vectors.

The representation, note, exists in virtue of the fact that B spans X. It is unique in virtue of the fact that B is independent. Thus in R 2, every vector x = <x 1, x 2> exists first in its own right, and second as a linear combination of basis vectors x = x 1e1 + x 2e2. The unit vectors e1 = <1, 0> and e2 = <0, 1> constitute the standard basis for R 2.
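The unique-representation theorem is concrete enough to compute with. A minimal sketch, with an invented vector and bases: coordinates relative to a basis of R² come from a 2×2 linear system, and uniqueness is visible in the non-vanishing determinant:

```python
# Sketch (invented vector and bases): coordinates relative to a basis of R^2
# come from a 2x2 linear system; uniqueness is the non-vanishing determinant.
def coords(b1, b2, x):
    # solve c1*b1 + c2*b2 = x by Cramer's rule (independence => det != 0)
    det = b1[0]*b2[1] - b2[0]*b1[1]
    c1 = (x[0]*b2[1] - b2[0]*x[1]) / det
    c2 = (b1[0]*x[1] - x[0]*b1[1]) / det
    return c1, c2

x = (3.0, -1.0)
# standard basis: the coefficients are the components themselves
assert coords((1.0, 0.0), (0.0, 1.0), x) == (3.0, -1.0)
# another basis: a different, but still unique, pair of coefficients
c1, c2 = coords((1.0, 1.0), (1.0, -1.0), x)
assert abs(c1 - 1.0) < 1e-12 and abs(c2 - 2.0) < 1e-12
```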

The theory in which Arnold has expressed his confidence constitutes a meditation on two differential equations. The first is an inhomogeneous second-order linear differential equation


d 2 x/dt 2 + a 1(t) dx/dt + a 2(t)x = b(t);

the second, a reduced version of the first


d 2 x/dt 2 + a 1(t)dx/dt + a 2(t)x = 0.

It is the play between these equations that induces order throughout their solution space. Let L(x) denote x″ + a 1(t)x′ + a 2(t)x so that 9.2 is L(x) = b(t). That order is in evidence in the following theorems; and these comprise the theory to which Arnold refers.

For any twice differential functions u k and constants c k


L(c 1 u 1(t) + c 2 u 2(t) + … + c m u m (t)) = c 1 L(u 1(t)) + c 2 L(u 2(t)) + … + c m L(u m (t)).

9.4 follows directly from the fact that 9.2 is linear.


If u and w are solutions to 9.2, then u – w is a solution to 9.3.

L(u(t)) = b(t) = L(w(t)). But then L(u(t) – w(t)) = L(u(t)) – L(w(t)) = b(t) – b(t) = 0.


If u is a solution of 9.2, and w a solution of 9.3, then u + w is a solution of 9.2.

Say that L(u(t)) = b(t); say too that L(w(t)) = 0. L(u(t) + w(t)) = L(u(t)) + L(w(t)) = b(t) + 0 = b(t).

Now let u be any solution of 9.2:


Every solution of 9.2 is of the form u + w, where w is a solution of 9.3.

Let v be a solution to 9.2. By 9.5 it follows that v − u = w, where w is a solution of 9.3.

The general solution of 9.2 may thus be determined by first fixing a single solution v to 9.2, and then allowing w to range over all solutions to 9.3, an interesting example in mathematics of a tail wagging a dog.
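The tail and the dog can be watched at work on a concrete case. A sketch, using the invented constant-coefficient example x″ + x = 1 (not from the text): v(t) = 1 is a particular solution, cos and sin solve the reduced equation, any sum of the two kinds solves the full equation, and the difference of two such solutions solves the reduced one:

```python
# Sketch on the invented constant-coefficient case x'' + x = 1 (not from
# the text): v(t) = 1 is a particular solution; cos and sin solve the
# reduced equation x'' + x = 0. Derivatives are taken numerically.
import math

def second_deriv(f, t, h=1e-4):
    return (f(t + h) - 2*f(t) + f(t - h)) / (h*h)

def sol(c1, c2):
    # v + c1*cos + c2*sin, a candidate solution of x'' + x = 1
    return lambda t: 1.0 + c1*math.cos(t) + c2*math.sin(t)

f, g = sol(2.0, -1.5), sol(0.5, 3.0)
for t in (0.3, 1.0, 2.5):
    assert abs(second_deriv(f, t) + f(t) - 1.0) < 1e-5   # f solves x'' + x = 1
    assert abs(second_deriv(g, t) + g(t) - 1.0) < 1e-5   # so does g
    d = lambda s: f(s) - g(s)
    assert abs(second_deriv(d, t) + d(t)) < 1e-5         # f - g solves x'' + x = 0
```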

Suppose now that u is a solution of 9.3:


If u(t 0) = 0 for some t 0 and u'(t 0) = 0 as well, then u(t) = 0.

But u(t) = 0 is already a solution of 9.3; by uniqueness, it is the only solution taking these initial values.


If u 1, …, u m are solutions of 9.3, then so is any linear combination of u 1, …, u m .

This follows at once from 9.4.


If u 1 and u 2 are linearly independent solutions of 9.3, then every solution is a linear combination of u 1 and u 2.

Note that 9.9 affirms only that linear combinations of solutions to 9.3 are solutions to 9.3; but 9.10, that all solutions of 9.3 are linear combinations of two linearly independent solutions to 9.3.

Let u be any solution of 9.4. The system E of simultaneous equations

 $$ \begin{array}{l}{u}_1(0){x}_1 + {u}_2(0){x}_2 = u(0) \\ {u}_1'(0){x}_1 + {u}_2'(0){x}_2 = u'(0) \end{array} $$ 

has a non-vanishing determinant. E thus has a unique solution x 1 = c 1 and x 2 = c 2. If v(t) is c 1 u 1(t) + c 2 u 2(t) it follows that v(t) is a solution of 9.3 because it is a linear combination of solutions to 9.3. But v(0) = u(0) and v'(0) = u'(0); thus v = u by the existence and uniqueness theorem for 9.3.

It is 9.10 that provides the algebraic shell for the theory of linear differential equations, all solutions of 9.3 emerging as linear combinations of u 1 and u 2, which thus form a basis for S, the solution space, now revealed to have the structure of a finite dimensional vector space. This in turn implies the correlative conclusion that solutions of 9.2 are all of the form v + c 1 u 1(t) + c 2 u 2(t), where v is a particular solution of 9.2 and u 1 and u 2 are linearly independent solutions of 9.3.

A retrospective is now in order. The foregoing was prompted by the desire to see or sense the spot at which a differential equation or system of equations applies to anything beyond a mathematical structure. Well-posed differential equations are useful in the sense that if their solutions did not exist or were not unique, they would not be useful at all. Continuity is less obvious a condition, whatever the justification offered by Hilbert and Courant. Still, there is nothing in the idea of a well-posed problem in analysis, or a well-posed system of equations, that does more than indicate what physically relevant systems must have. What they do have, and how they have it, this remains unstated, unexamined, and unexplored. The qualitative theory of ordinary differential equations has the welcome effect of turning the mathematician’s attention from their solutions taken one at a time to all of them at once. The imagination is thus enlarged by a new prospect, but the rich and intriguing geometrical structures so revealed do little, and, in fact, they do nothing, to explain the coordination between equations and the facts to which they are so often applied. So far as the linear differential equations go, V.I. Arnold is correct. There is a theory, and a theory, moreover, that has a stabilizing effect across the complete range of linear differential equations. This is no little thing. But while the theory draws a connection between linear equations and linear algebra, so far as their applications go, the connection is internal to mathematics, falling well within the categorical definitions of Section 2.

The spot at which a differential equation or system of equations applies to anything beyond a mathematical structure?

I’m just asking.


Consider a physical system P. A continuously varying physical parameter ξ is given, one subject intermittently to measurement, so that g(t) = ξ is a record of how much, or how little, there is of ξ at t. Or in the case of bacteria, how many of them there are. If the pair <P, ξ> makes for a physical system, there is by analogy the pair <D, f>, where D is a differential equation, and f its solution.

Let us suppose that for some finite spectrum of values, ƒ(t k ) ≅ g(t k ).

The example that follows is the very stuff of textbooks. Having made an appearance at 8.7, the equation


 $$ {df}/{dt}=Af(t) $$

is simple enough to suggest that its applications must be transparent if any applications are transparent. Solutions are exponential: x = ƒ(t) = Ke At . The pair <D, f>, where D just is 10.1 and f its solution, makes for a differential model.

Can we not say over an obvious range of cases, such as birth rates or the growth of compound interest, that there is a very accessible sense of applicability to be had in the play between differential equations and the physical processes to which they apply? Let me just resolve both g(t) = ξ and ƒ(t) = Ke At to t = 0, so that for the mathematician, 10.1 appears as an initial value problem, and for the biologist or the bacteria, as the beginning of the experiment. The voice of common sense now chips in to claim that for t = 0, and for some finite spectrum of values thereafter,


<D, f> applies to <P, ξ> if and only if ƒ(t k ) ≅ g(t k ).

I do not see how 10.2 could be faulted if only because the relationship in which <D, f> applies to <P, ξ> is loose enough to encompass indefinitely many variants. One might as well say that the two structures are coordinated, connected, or otherwise companionably joined. If one might as well say any of that, one might as well say that differential equations are very often useful and leave matters without saying why.
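The looseness of 10.2 is itself easy to dramatize. In the sketch below, the record g is invented outright, manufactured by rounding ƒ to whole organisms, so that ƒ(t k ) ≅ g(t k ) holds by construction and nothing deeper is claimed:

```python
# Sketch: the loose sense of 10.2. f solves df/dt = A f; the record g is
# invented (rounded values of f, standing in for integer-valued counts),
# so f(t_k) ~ g(t_k) holds by construction and nothing deeper is claimed.
import math

K, A = 100.0, 0.5
def f(t):
    return K * math.exp(A * t)          # solution of df/dt = A f, f(0) = K

samples = [0.0, 1.0, 2.0, 3.0]
g = {t: round(f(t)) for t in samples}   # integer-valued "measurements"

for t in samples:
    assert abs(f(t) - g[t]) <= 0.5      # f(t_k) close to g(t_k), and no more
```

The circularity is the moral: agreement at sample points says nothing about how the continuous ƒ and the discrete g are aligned.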

But 10.2 raises the same reservations about the applicability of mathematics as considerably more complicated cases. It is a one man multitude. If the inner structure of <D, f> and <P, ξ> were better aligned, one could replace an unclear sense of applicability by a mathematical mapping or morphism between them. Far from being well-aligned, these objects are not aligned at all. The function g(t) = ξ is neither differentiable nor continuous; barely literate, in what respect does it have anything to do with <D, f>? The function ƒ, on the other hand, is differentiable and thus continuous. Continuous functions take intermediate values; in what sense does it have anything to do with <P, ξ>? There is no warm point of connectivity between them. Differential and physical structures are radically unalike.

In that case, why should ƒ(t k ) ≅ g(t k )?

We are by now traveling in all the old familiar circles.


“To specify a physical theory,” Michael Atiyah writes in the course of a discussion of quantum field theory, “the usual procedure is to define a Lagrangian or action L.” A Lagrangian L(φ) having been given, where φ is a scalar field, the partition function P of the theory is described by a Feynman functional integral. “These Feynman integrals,” Atiyah writes with some understatement, “are not very well defined mathematically … .” [3 p. 3].

The parts of the theory that are mathematically well defined are described by the axioms for topological quantum field theory. A topological QFT in dimension d is identified with a functor Z such that Z assigns i) a finite dimensional complex vector space Z(Σ) to every compact oriented smooth d-dimensional manifold Σ; and ii) a vector Z(Y) ∈ Z(Σ) to each compact oriented (d + 1) dimensional manifold Y whose boundary is Σ. The action of Z satisfies involutory, multiplicative, and associative axioms. In addition, Z(∅) = C for the empty d-manifold.

“The physical interpretation of [these] axioms,” Atiyah goes on to write, is this: … “[F]or a closed (d + 1) manifold Y, the invariant Z(Y) is the partition function given by some Feynman integral … .” It is clear that Z(Y) is an invariant assigning a complex number to any closed (d + 1) dimensional manifold Y (in virtue of the fact that the boundary is empty) and clear thus that Z and P coincide.

This definition has two virtues. It draws a relatively clear distinction between parts of a complex theory; and it provides for an interpretation of the mathematical applications along the lines suggested by Section 2. It is less clear, however, in what sense P is a physical interpretation of Z, the distinction between Z and P appearing to an outsider (this one, at any rate) to have nothing whatsoever to do with any relevant sense of the physical, however loose. The distinction between the mathematical and the physical would seem no longer to reflect any intrinsic features either of mathematical or physical objects, things, or processes, with physical a name given simply to the portions of a theory that are confused, poorly developed, largely intuitive, or simply a conceptual mess.

In quite another sense, the distinction between the mathematical and the physical is sometimes taken as a reflection of the fact that mathematical objects are quite typically general, and physical objects, specific, or particular. The theory of differential equations provides an example. The study of specific systems of equations may conveniently be denoted a part of theoretical physics; the study of generic differential equations, a part of mathematics. But plainly there is no ontological difference between these subjects, only a difference in the character of certain mathematical structures. And this, too, is a distinction internal to mathematics.

The project of determining a clear sense in which mathematics has an application beyond itself remains where it was, which is to say, unsatisfied.


"To present a theory is to specify a family of structures," Bas van Fraassen has written, [21 p. 64]15

its models; and secondly, to specify certain parts of those models (the empirical substructures) as candidates for the direct representation of observable phenomena. The structures which can be described in experimental and measurement reports we can call appearances: the theory is empirically adequate if it has some model such that all appearances are isomorphic to empirical substructures of that model.

Some definitions. A language L is a structure whose syntax has been suitably regimented and articulated—variables, constants, predicate symbols, quantifiers, marks of punctuation. A language in standard formulation has the right kind of regimentation. A model M = <D, f> is an ordered pair consisting of a non-empty domain D and a function f. It is f that makes for an interpretation of the symbols of L in M. Predicate symbols are mapped to subsets of D; relation symbols to relations of corresponding rank on D. The general relationship of language and the world, model theory expresses by means of the concept of satisfaction. This is a relationship that is purely abstract, perfectly austere. Formulas in L are satisfied in M, or not; sentences of L are true or false in M. Languages neither represent nor resemble their models. The scheme is simple.

There are no surprises.

Let T be a theory as logicians conceive things, a consistent set of sentences; and a theory furthermore that expresses some standard (purely) mathematical theory—the theory of linear differential operators, say. If T has empirical content, it must have empirical consequences—φ, for example:


T |– φ.

But equally, if T has empirical content, some set of sentences T(E) ⊆ T, must express its empirical assumptions. Otherwise, 12.1 would be inscrutable. Subtract T(E) from T. The sentences T(M) ⊆ T that remain are purely mathematical.



T = T(M) ∪ T(E).

And plainly again


T(M) ∪ T(E) |– φ,

whence by the deduction theorem,


T(M) |– T(E) → φ.

The set of sentences Θ = T(E) → φ constitutes the empirical hull of T.

If model theory is the framework, the concept of a mathematical application resolves itself into a relationship between theory and model and so involves a special case of satisfaction. T(M) as a whole is satisfied in a set theoretic structure M; Θ, presumably, in another structure N, its domain consisting of physical objects or bodies. But given 12.4, Θ is also satisfied in any model N satisfying T(M). N is a model with a kind of hidden hum of real life arising from the elements in its domain. An application of mathematics, if it takes place at all, must take place in the connection between M and N. An explanation of this connection must involve two separate clauses, as in 12.5, which serves to give creditable sense to the notion of a mathematical application in the context of model theory:


T(M) applies to N if i) T(M) is satisfied in N; and, ii) N is a sub-model of M.

This leaves one relationship undefined. Sub-models, like sub-groups, are not simply substructures of a given structure. If N is a sub-model of M, the domain D' of N must be included in the domain D of M; but in addition:


Every relation R' on D' must be the restriction to D' of the corresponding R on D; and


ditto for functions; and moreover,


every constant in D' must be the corresponding constant in D.

This definition has an undeniable appeal. The mathematical applications find their place within the antecedently understood relationship between theories and their models. This does not put mathematics directly in touch with the world, but with its proxies instead. The parts of the definition cohere, one with the other. It is obviously necessary that T(M) has empirical consequences. Otherwise there would be no reason to talk of applications whatsoever. It is necessary, too, that N be a sub-model of M; otherwise the connection between what a theory implies and the structures in which it holds would be broken. Finally, it is necessary that T(M) be satisfied in N as well as M; otherwise what sense might be given to the notion that T(M) applies to any empirical substructure of M at all? Those conditions having been met, a clear sense has been given to the concept of a mathematical application.

This is somewhat too optimistic. M, recall, is a mathematical model, and N a model that is not mathematical: the elements in its domain are physical objects. The assumption throughout is that in knowing what it is for a mathematical theory to be satisfied in M, the logician knows what it is for that same theory to be satisfied in N. In a purely formal sense, it must, that assumption, be true; the definition of satisfaction remains constant in both cases. What remains troubling is the question whether the conditions of the definition are ever met. The definition of satisfaction, recall, proceeds by accretion. A sentence S is satisfied in N under a given interpretation of its predicate symbols S[F,G,…,H]. The interpretation comes first. In the case of a pure first-order language, it is the predicate symbols that carry all of the mathematical content.

Were it antecedently clear that S[F,G, … ,H] admits of physical interpretations, why did we ever argue?


Under one circumstance, the question whether a mathematical theory is satisfied in a physical model may be settled in one full sweep. A theory T satisfied in every one of the sub-models of a model M of T is satisfied in particular in the empirical sub-models of M as well. It must be. Demonstrate that T is satisfied in every one of its sub-models and what remains, if 12.5 is to be justified, is the correlative demonstration that T is satisfied in a model containing empirical structures as sub-models. What gives pause is preservation itself.

The relevant definition:


A theory T is preserved under sub-models if and only if T is satisfied in any sub-model of a model of T.

In any sub-model, note. Preservation under sub-models is by no means a trivial property.

Given 13.1, it is obvious that preservation hinges on a sentence's quantifiers. A sentence is in prenex form if its quantifiers are in front of the matrix of the sentence; and universal if it is in prenex form and those quantifiers are universal. It is evident that every sentence may be put into prenex form.
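A standard example, not in the text: over a nonempty domain, a quantifier buried in the antecedent of a conditional flips as it is pulled to the front.

```latex
% Classical equivalence: the universal quantifier in the antecedent
% becomes existential in prenex position.
\left(\forall x\, P(x)\right) \rightarrow Q \;\equiv\; \exists x\,\left(P(x) \rightarrow Q\right)
```

The result is in prenex form but it is not universal, and so it falls outside the preservation theorems that follow.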

Herewith a first theorem on preservation:


13.2 If φ is universal and N is a sub-model of M, then if φ is satisfied in M it is satisfied as well in N.

The proof is trivial.
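The theorem can be checked mechanically on a toy case (a sketch of my own; the model and predicates are invented): a universal sentence true in a finite model M remains true in a sub-model N, while an existential sentence need not.

```python
# Model M: domain {0, 1, 2, 3}, unary predicates P = {0} and Q = {0, 1}.
M = {0, 1, 2, 3}
P = {0}
Q = {0, 1}

def holds_universal(domain):
    # forall x (P(x) -> Q(x)) : a universal sentence
    return all((x not in P) or (x in Q) for x in domain)

def holds_existential(domain):
    # exists x P(x) : an existential sentence
    return any(x in P for x in domain)

# N: a sub-model of M, with the same predicates restricted to a smaller domain.
N = {1, 2, 3}

assert holds_universal(M) and holds_universal(N)          # preserved downward
assert holds_existential(M) and not holds_existential(N)  # not preserved
```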

13.2 takes a more general form in 13.3:


13.3 A sentence φ is preserved under sub-models if and only if φ is logically equivalent to a universal sentence.

Again, the proof is trivial.

13.3 is, in fact, a corollary to a still stronger preservation theorem, the only one of consequence. A theory T has a set of axioms A just in case A and T have the same consequences; those axioms are universal if each axiom is in prenex normal form, with only universal quantifiers figuring. A theory derivable from universal axioms is itself universal. Let T, as before, be a theory. What follows is the Łoś–Tarski theorem:


13.4 T is preserved under sub-models if and only if T is universal.

13.4, together with 12.1, marks a set of natural boundaries of sorts. 12.1 indicates what is involved in the concept of an application; 13.4 specifies which theories are apt to have any applications at all. The yield is discouraging. Group theory is not preserved under sub-models; neither is the theory of commutative rings, nor Peano arithmetic, nor Zermelo–Fraenkel set theory, nor the theory of algebraically closed fields, nor almost anything else of much interest.
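Group theory's failure is easy to exhibit in code (a sketch of my own, with the infinite domains truncated to finite samples): the nonnegative integers are closed under addition, and so form a substructure of (Z, +), yet the existential inverse axiom fails in them.

```python
# The group-theoretic inverse axiom: for every x there exists y with x + y = 0.
def has_inverse(x, domain):
    return any(x + y == 0 for y in domain)

Z_sample = range(-100, 101)  # finite sample standing in for (Z, +)
N_sample = range(0, 101)     # finite sample of the naturals: closed under +,
                             # hence a substructure, but not a group

assert has_inverse(1, Z_sample)      # 1 has an inverse among the integers
assert not has_inverse(1, N_sample)  # and none among the naturals
```

Since the axiom fails in a substructure of a group, group theory cannot be equivalent to any universal theory, just as 13.4 requires.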


When left to his own devices, the mathematician, no less than anyone else, is apt to describe things in terms of a natural primitive vocabulary. Things are here or there, light or dark, good or bad. The application of mathematics to the world beyond involves a professional assumption. And one that is often frustrated. Sheep, it is worthwhile to recall, may be collected and then counted; not so plasma. Set theory, I suppose, marks the point at which a superstitious belief in the palpability of things gives way. Thereafter, the dominoes fall rapidly.

If some areas of experience seem at first to be resistant to mathematics, there is yet a doubled sense in which mathematics is inexpungeable, a feature of every intellectual task. The idea that there is some arena in which things and their properties may be directly apprehended is incoherent. Any specification of the relevant arena must be by means of some theory or other; there is no describing without descriptions. But to specify a theory is to specify its models. And so mathematics buoyantly enters into areas from which it might have been excluded, if only for purposes of organization.

Mathematics makes its appearance in another more straightforward sense. Every intellectual activity involves a certain set of basic and ineliminable operations of which counting, sorting, and classification are the most obvious. These operations may have little by way of rich mathematical content, but at first cut they appear to be amenable to formal description. It is here that the empirical substructures that van Fraassen evokes come into play. Talk of empirical substructures is a mouthful; let me call them primitive models instead, with primitive serving to emphasize their position at the bottom of the scheme of things, and models reestablishing a connection to model theory itself. The primitive models are thus a mathematical presence in virtually every discipline, both in virtue of their content—they are models, after all; and in virtue of their form—they deal with basic mathematical operations.

Patrick Suppes envisages the primitive models as doubly finite: Their domain is finite; so are all model-definable relations. Those relations are, moreover, qualitative in the sense that they answer to a series of yes or no questions asked of each object in the domain of definition [19]. This definition reflects the fact that in the end every chain of assertion, judgment, and justification ends in a qualitative declaration. There it is: The blotting paper is red, or it is not; the balance beam is to the right, or it is not; the rabbit is alive, or it is dead. But now a second step. A physical object is any object of experience; and objects of experience are those describable by primitive theories. Primitive theories are satisfied in primitive models.

The salient feature of a primitive model is a twofold renunciation: Only finitely many objects are considered; and each object is considered only in the light of a predicate that answers to a simple yes or no. Such are the primitive properties F1, …, Fn. A primitive model may also be described as any collection C of primitive properties, closed under union, intersection, and complement.
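A primitive model in this sense is, in effect, a finite algebra of sets. A sketch under invented names: three objects, two qualitative properties, and the closure of those properties under union, intersection, and complement.

```python
from itertools import product

# Three objects and two qualitative (yes-or-no) properties; names invented.
domain = {"rabbit", "beam", "paper"}
alive = frozenset({"rabbit"})
red = frozenset({"paper"})

def closure(domain, properties):
    # Close a collection of properties under union, intersection, and complement.
    sets = set(properties)
    changed = True
    while changed:
        changed = False
        for a, b in product(list(sets), repeat=2):
            for c in (a | b, a & b, frozenset(domain - a)):
                if c not in sets:
                    sets.add(c)
                    changed = True
    return sets

C = closure(domain, {alive, red})
assert len(C) == 2 ** len(domain)  # these two properties generate the full power set
```

With a coarser choice of properties the closure would be a proper subalgebra of the power set; here the two singletons happen to generate everything.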

There is yet another way of characterizing primitive theories and their models. A Boolean-valued function f is one whose domain is the collection of all n-tuples of 0 and 1, and whose range is {0, 1}. Such functions are of use in switching and automata theory. Their structure makes them valuable as instruments by which qualitative judgments are made and then recorded. The range of a Boolean-valued function corresponds to the simple yes or no (true or false) judgments that are features of the primitive models; but equally, the domain corresponds either to qualitative properties or to collocations of such properties. A primitive theory, on this view, is identified with a series of Boolean equations; a primitive structure, with a Boolean algebra.
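A minimal sketch of such a function, the qualitative judgments invented for illustration:

```python
# A Boolean-valued function f: {0,1}^2 -> {0,1}. The two inputs record
# qualitative judgments (alive? red?); the output is a further yes-or-no verdict.
def f(alive, red):
    return alive & red

# Tabulating f over all 2-tuples of 0 and 1 gives the Boolean equation in extension.
table = {(a, r): f(a, r) for a in (0, 1) for r in (0, 1)}
assert table == {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
```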

Set theory provides still another characterization of the primitive models, this time via the concept of a generic set. The generic sets are those that have only the members they are forced to have and no others. Forcing is atomistic and finite, the thing being done piecemeal. Thus suppose that L is a first-order language, with finitely many predicates but indefinitely many constants. By the extension L*(S1, …, Sn) of L, I mean the language obtained by adjoining the predicate symbols S1, …, Sn to L. A basic sentence of L* has the form k ∈ Sn or k ∉ Sn. A finite and consistent set of basic sentences ξ constitutes a condition. A sequence of conditions ξ1, …, ξn is complete if and only if its union is consistent, and, moreover, for any k and n, there exists an m such that k ∈ Sn or k ∉ Sn belongs to ξm. A complete sequence of conditions determines an associated sequence of sets S1, …, Sn:

 $$ k\in {S}_n\iff \left(\exists m\right)\left(k\in {S}_n\;\mathrm{belongs}\;\mathrm{to}\;{\xi}_m\right). $$

The path from conditions to sets runs backwards as well as forwards.

Sets have been specified, and sequences of sets; conditions, and sequences of conditions. The model structure of L* is just the model M* = <D, f>, where f maps each predicate symbol Si onto its associated set Si. There is a straightforward interpretation of the symbolic apparatus. The conditions thus correspond to those yes or no decisions that Suppes cites; that they are specified entirely in terms of some individual or other belonging to a set is evidence of the primacy of set formation in the scheme of things.

The specification of sets by means of their associated conditions is a matter akin to enumeration. A given set S is generic if, in addition to the objects it has as a result of enumeration, it has only those objects as members it is forced to have. Forcing is thus a relationship between finite conditions and arbitrary sentences of L, the sentences in turn determining what truly belongs to various sets. The definition proceeds by induction on the complexity of sentences. What it says is less important than what it implies. Every sentence about a generic set is decidable by finitely many sentences of the form k ∈ Sn or k ∉ Sn. Finitely many, note, and of an atomic form.
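The bookkeeping admits a toy rendering (mine, and nothing like the full apparatus of forcing): a condition is a finite, consistent set of signed membership atoms, and a complete sequence of conditions settles, for every k and n, whether k belongs to Sn.

```python
# A basic sentence is a signed atom (k, n, sign): sign True means k ∈ Sn.
# A condition is a finite, consistent set of such atoms.
def consistent(condition):
    return not any((k, n, not s) in condition for (k, n, s) in condition)

# A toy "complete" sequence of conditions over k in {0, 1} and the single set S1.
xi = [
    {(0, 1, True)},
    {(0, 1, True), (1, 1, False)},
]

union = set().union(*xi)
assert consistent(union)  # the union of a complete sequence must be consistent
# Completeness: every k is settled one way or the other.
assert all((k, 1, True) in union or (k, 1, False) in union for k in (0, 1))

# The associated set: k ∈ S1 iff some condition puts it there.
S1 = {k for (k, n, s) in union if n == 1 and s}
assert S1 == {0}
```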

Whatever the definition of primitivity, theories satisfied in primitive models admit of essential application. This is a definition:


T applies essentially to M if and only if M is primitive and T is satisfied in M.

Such theories apply directly to the world in the sense that no other theories apply more directly. Counting prevails, but only up to a finite point; the same for measurement. The operation of assigning things to sets is suitably represented; and in this sense one has an explanation of sorts for the universal feeling that numbers may be directly applied to things in a way that is not possible for groups or Witten functors. It is possible that the operation of assigning things to sets is the quintessential application of mathematics, the point that dizzyingly spins off all other points. But even if assigning things to sets is somehow primitive, the models that result are themselves abstract and mathematical.

The concept of a primitive model does not itself belong to model theory. The primitive models have been specified with purposes other than mathematics in mind. Nor is it, that concept, precisely defined, if only because so many slightly different structures present themselves as primitive. Nonetheless, the primitive models share at least one precisely defined model-theoretic property. A number of definitions must now be introduced:

Let L be a countable language and consider a theory T:


A formula φ(x1, x2, …, xn) is complete in T if and only if for every other formula χ(x1, x2, …, xn), either φ ⊃ χ or φ ⊃ ¬χ holds in T; and


A formula θ(x1, x2, …, xn) is completable in T if and only if there is a complete formula φ(x1, x2, …, xn) such that φ ⊃ θ holds in T.

The definitions of complete and completable formulas give rise, in turn, to the definitions of atomic theories and their models:


A theory T is atomic if and only if every formula of L consistent with T is completable in T; and


A model M is atomic if and only if every n-tuple of elements of its domain satisfies a complete formula in the theory T of M.

It follows from these definitions that every finite model is atomic; it follows also that every model whose individuals are all named by constants is again atomic. Thus a first connection between empirical substructures and model theory emerges as a trivial affirmation:


14.6 Every primitive model is atomic.

The proof is a matter of checking the various definitions of the primitive models, wherever they are clear enough to be checked.

The real interest of atomic models, however, lies elsewhere. The relationship of a model to its sub-models is fairly loose; not so the relationship of a model to its elementary sub-models. Consider two models, N and M, with domains D' and D. A first-order language L is presumed throughout:


14.7 The mapping f: D' → D is an elementary embedding of N into M if and only if for any formula φ(x1, x2, …, xn) of L and n-tuples a1, …, an in D', φ[a1, …, an] holds in N if and only if φ[f(a1), …, f(an)] holds in M.

Given 14.7, it follows that the image of f in M is a sub-model of M; the elementary sub-models are simply those that arise as the result of elementary embeddings. From the perspective of first-order logic, elementary sub-models and the models in which they are embedded are indiscernible: No first-order property distinguishes between them. The models are elementarily equivalent.

Another definition, the last. Let N be a model and T(N) its theory:


14.8 N is a prime model if and only if N is elementarily embedded in every model of T(N).

14.6 establishes trivially that every primitive model is atomic. It is also trivially true that primitive models are countable. But what now follows is a theorem of model theory:


14.9 If N is a countable atomic model then N is a prime model.

Assume that N is a countable atomic model; T(N) is its theory. Say that A = {a0, a1, …} constitutes a well-ordering of the elements in the domain of N. Assume that M is any model of T(N). Suppose that F is a complete formula satisfied by a0. It follows that (∃x)F follows from T(N); it follows again that there is a b0 among the well-ordered elements B of M that satisfies F. Continue in this manner, exhausting the elements in A. By 14.7, going from A to B defines an elementary embedding of N into M. The conclusion follows from 14.8.

14.6 establishes that the primitive models are among the atomic models; but given the very notion of a primitive model, it is obvious that any primitive model must be countable. It thus follows from 14.9 that


14.10 Every primitive model is prime.

In specifying a relationship between the primitive and the prime models, 14.10 draws a connection between concepts arising in the philosophy of science and concepts that are model-theoretic. There is a doubled sense in which 14.10 is especially welcome. It establishes the fact that the primitive models are somehow at the bottom of things, in virtue of 14.8 the smallest models available. And it provides a necessary condition for a theory to have empirical content. Recall van Fraassen's definition: “[A] theory is empirically adequate if it has some model such that all appearances are isomorphic to empirical substructures of that model.” A mathematical theory T has empirical content just in case T has a prime model.

Such is the good news. The bad news follows. 14.10 does little—it does nothing—to explain the relationship, if any, between those mathematical theories that are not primitive and those that are. As their name suggests, the primitive models are pretty primitive. The renunciations that go into their definition are considerable. There is thus no expectation that any mathematical theory beyond the most meager will be definable in terms of a primitive theory. Let us go, then, to the next best thing. Assume that most mathematical theories are satisfied in models with primitive sub-models. 14.6 might then suggest that such theories apply to elements in a primitive sub-model if they are satisfied in a primitive sub-model. But those mathematical theories that are not preserved under sub-models will generally not be preserved under primitive sub-models either. The primitive theories and their models are simply too primitive.

There is no next best thing.

15   Conclusion

The argument that mathematics has no application beyond itself satisfies an esthetic need: It reveals mathematics to be like the other sciences and so preserves a sense of the unity of intellectual inquiry. Like any argument propelled by the desire to keep the loose ends out of sight, this one is vulnerable to what analysts grimly call the return of the repressed. Mathematics may well be akin to zoology: Yet the laws of physics, it is necessary to acknowledge, mention groups, and not elephants. And mathematical theories in physics are strikingly successful. Alone among the sciences, they permit an uncanny epistemological coordination of the past, the present, and the future. If this is not evidence that in some large, some irrefragable sense, mathematical theories apply to the real world, it is difficult to know what better evidence there could be.

That mathematical objects exist is hardly in doubt. What else could be meant by saying that there exists a natural number between three and five? Where they exist is another matter. The mathematical Platonist is often said to assert that mathematical objects exist in a realm beyond space and time, but since this assertion involves a relationship that is itself both spatial and temporal, it is very hard to see how it could be made coherently. The idea that mathematical objects are the free creations of the human mind, as Einstein put it, is hardly an improvement. If the numbers are creations of the human mind, then it follows that without human minds, there are no numbers. In that case, what of the assertion that there is a natural number between three and five? It is true now; but at some time before the appearance of human beings on the earth, it must have been false. The proposition that there exists a natural number between three and five cannot be both true and false, and so it must be essentially indexical, its truth value changing over time. That Napoleon is alive is accordingly true during his life and false before and afterwards. But if the proposition that there exists a natural number between three and five is false at some time in the past, the laws of physics must have been false as well, since the laws of physics appeal directly to the properties of the natural numbers. If the laws of physics were once false, of what use is any physical retrodiction—any claim at all about the distant past? Perhaps then mathematical assertions are such that once true, they are always true? This is a strong claim. On the usual interpretation of modal logics, it means that if P is true, then it is true in every possible world. Possible worlds would seem no less Platonic than the least Platonic of mathematical objects, so the improvement that they confer is not very obvious.

Various accounts of mathematical truth and mathematical knowledge are in conflict. The truths of mathematics make reference to a domain of abstract objects; they are not within space and they are timeless. Contemporary theories of knowledge affirm that human agents can come to know what they know only as the result of a causal flick from the real world. It is empirical knowledge that is causally evoked. Objects that are beyond space and time can have no causal powers.

To the extent that mathematical physics is mathematical, it represents a form of knowledge that is not causally evoked. To the extent that mathematical physics is not causally evoked, it represents a form of knowledge that is not empirical. To the extent that mathematical physics represents a form of knowledge that is not empirical, it follows that the ultimate objects of experience are not physical either.

What, then, are they? As a physical subject matures, its ontology becomes progressively more mathematical, with the real world fading to an insubstantial point, a colored speck, and ultimately disappearing altogether. The objects that provoke a theory are replaced by the enduring objects that sustain the theory. Pedagogy recapitulates ontology. The objects treated in classical mechanics, to take a well-known example, are created by classical mechanics. Unlike the objects studied in biology, they have no antecedent conceptual existence. In V.I. Arnold's elegant tract, for example, a mechanical system of n points moving in three-dimensional Euclidean space is defined as a collection of n world lines. The world lines constitute a collection of differentiable mappings. Newton's law of motion is expressed as the differential equation x'' = F(x, x', t) [1].

Nothing more.
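Arnold's definition is, in fact, all that a computation requires. A minimal sketch, with the force law and step size chosen arbitrarily for illustration: integrating x'' = F(x, x', t) for a unit harmonic oscillator.

```python
# Explicit Euler integration of Newton's x'' = F(x, x', t),
# with F = -x: a unit-mass harmonic oscillator (choices mine, not Arnold's).
def F(x, v, t):
    return -x

x, v, t, dt = 1.0, 0.0, 0.0, 0.001
steps = int(3.14159 / dt)          # integrate over roughly half a period
for _ in range(steps):
    x, v, t = x + v * dt, v + F(x, v, t) * dt, t + dt

# The exact solution is x = cos(t); after half a period, x sits near -1.
assert abs(x + 1.0) < 0.01
```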

Mathematics is not applied to the physical world because it is not applied to anything beyond itself. This must mean that as it is studied, the physical world becomes mathematical.



1. V.I. Arnold, Mathematical Methods of Classical Mechanics. New York: Springer-Verlag, 1980, ch. 1.

2. V.I. Arnold, Ordinary Differential Equations. Cambridge, Massachusetts: The MIT Press, 1978.

3. M. Atiyah, The Geometry and Physics of Knots. New York: Cambridge University Press, 1990.

4. W. Balzer, C. Moulines, and J. Sneed, An Architectonic for Science. Dordrecht: D. Reidel, 1987.

5. H. Behnke, F. Bachmann, and H. Kunle, eds., Fundamentals of Mathematics. Cambridge, Massachusetts: The MIT Press, 1983.

6. H. Boerner, Representation of Groups. New York: American Elsevier Publishing Company, 1970.

7. T. Brody, The Philosophy behind Physics. New York: Springer-Verlag, 1993.

8. C. Casacuberta and M. Castellet, eds., Mathematical Research Today and Tomorrow. New York: Springer-Verlag, 1991.

9. J. Casti, Alternate Realities. New York: John Wiley, 1988.

10. R. Courant and D. Hilbert, Methods of Mathematical Physics, Volume II. New York: John Wiley, 1972.

11. S. Eilenberg, Automata, Languages, and Machines, Volume A. New York: Academic Press, 1974.

12. A. Einstein, Geometrie und Erfahrung. Berlin: Springer, 1921.

13. M. Feldman, Le Modèle Géométrique de la Physique. Paris: Masson, 1992.

14. V. Guillemin and S. Sternberg, Variations on a Theme by Kepler. Providence, Rhode Island: American Mathematical Society, 1990.

15. R. Larson, R. Hostetler, and B. Edwards, Calculus. Lexington, Massachusetts: D.C. Heath & Company, 1990.

16. C. Parsons, 'Quine on the Philosophy of Mathematics,' in E. Hahn and P.A. Schilpp, eds., The Philosophy of W.V. Quine. LaSalle, Illinois: Open Court Press, 1986.

17. W.V.O. Quine, 'Reply to C. Parsons,' in E. Hahn and P.A. Schilpp, eds., The Philosophy of W.V. Quine. LaSalle, Illinois: Open Court Press, 1986.

18. J.T. Schwartz, 'The Pernicious Influence of Mathematics on Science,' in M. Kac, G.-C. Rota, and J.T. Schwartz, eds., Discrete Thoughts. Boston: Birkhäuser, 1992, pp. 19–25.

19. D. Scott and P. Suppes, 'Foundational Aspects of Theories of Measurement,' Journal of Symbolic Logic (1958): 113–128.

20. W. Tung, Group Theory in Physics. Philadelphia: World Scientific, 1985.

21. B.C. van Fraassen, The Scientific Image. Oxford: The Clarendon Press, 1980.

22. H. Weyl, The Classical Groups. Princeton: Princeton University Press, 1946.

23. H. Weyl, Symmetry. Princeton: Princeton University Press, 1952.



Parsons's phrase, the “instance of a structure,” is not entirely happy. Predicates have instances; properties are exemplified; structures just sit there.


See [13, pp. 198–204] for interesting remarks.


By Euclidean geometry, I mean any axiomatic version of geometry essentially equivalent to Hilbert's original system—the one offered in chapter 6 of [5] for example.


Indeed, it is not clear at all that the surface of my desk is either a two- or a three-dimensional surface. If the curved sides of the top are counted as a part of the top of the desk, the surface is a three-dimensional manifold. What then of its rectangular shape? If the edges are excluded, where are the desk’s boundaries?


In a well-known passage [12] Albert Einstein remarked that to the extent that the laws of mathematics are certain, they do not refer to reality; and to the extent that they refer to reality, they are not certain. I do not think Einstein right, but I wonder whether he appreciated the devastating consequences of his own argument.


Quantum considerations, I would think, make it impossible to affirm any version of an Archimedean axiom for points on a physical line.


For very interesting if inconclusive remarks, see the round-table discussion by a collection of Fields medalists in [8, pp. 88–108], especially the comments of Alain Connes on p. 95.


Defined as the ratio of two lengths, radians are in any case dimensionless units.


See, for example, [7, pp. 56–58].


Curiously enough, this is a point that Weyl himself appreciates [23, pp. 15–17].


See Eilenberg [11]. From a philosophical point of view, interest in semigroups is considerable. A finite state automaton constitutes the simplest model of a physical process. Associated to any finite state automaton is its transition semigroup. Semigroups thus appear as the most basic algebraic objects by which change may abstractly be represented. Any process over a finite interval can, of course, be modeled by a finite state automaton; but physical laws require differential equations. Associated to differential equations are groups, not semigroups. This is a fact of some importance, and one that is largely mysterious.


This familiar argument has more content than might be supposed. It is, of course, a fact that quantitative measurements are approximate; physical predicates are thus inexact. For reasons that are anything but clear, quantitative measurements do not figure in mathematics; mathematical predicates are thus exact. It follows that mathematical theories typically are unstable. If a figure D just misses being a triangle, no truth strictly about triangles applies to D. Mathematical theories are sensitive to their initial descriptions. This is not typically true of physical theories. To complicate matters still further, I might observe that no mathematical theory is capable fully of expressing the conditions governing the application of its predicates. It is thus not a theorem of Euclidean geometry that the sum of the angles of a triangle is precisely 180 degrees; ‘precisely’ is not a geometric term. For interesting remarks, see [18].


For a more general account, see [6]. For a (somewhat confusing) discussion of the role of groups in physics, see [14].


The notion of a solution to a differential equation is by no means free of difficulties. Consider a function f(x) = Ax, and consider, too, a tap of the sort that sends f to g(x) = Ax + μx, where μ is small. Do f and g represent two functions or only one?


See also [4] for a very detailed treatment of similar themes.