
Mathematics and the Real World: The Remarkable Role of Evolution in the Making of Mathematics (2014)

CHAPTER V. THE MATHEMATICS OF RANDOMNESS

41. THE FORMALISM OF PROBABILITY

The mathematical developments and increasing uses of the concepts of probability theory and statistical methods resulted in the accumulation of great expertise in the practice of the mathematical theory at the beginning of the twentieth century. This development, however, was accompanied by much unease, the roots of which were mentioned previously. First, there was the duality in the subject matter. The same terms and considerations were used both in the analysis of repeated events, in which the probability can be interpreted as the proportion of total occurrences in which the event takes place, and in the assessment of the probability of a non-repeated event. Second, no understanding or agreement had been reached regarding the source of the probabilities. Even in coin-flipping experiments, the only reason for thinking that both sides of the coin had equal chances of falling uppermost was that there was no reason for thinking that the chances were not equal. Is that argument strong enough to convince us that the calculations represent nature? In addition, there was no general logical mathematical framework for dealing with the mathematics of randomness. For example, no one had proposed a precise general definition of the concept of independence. The reader will no doubt have noticed that we have used the term independent several times, and the intuitive feeling is that even without a formal definition, we know when events are independent. That feeling, however, is not enough for mathematical analysis, and a definition that met the strict criteria of mathematics did not exist.

George Boole (1815–1864), a British mathematician and philosopher, tried to present a general mathematical framework. He claimed that mathematical logic, and in particular the union and intersection of sets to present information, is appropriate for the analysis of events involving probabilities. To this end Boole constructed the basis of logic through the use of sets and defined what is today known as Boolean algebra. These efforts did not result in much success, however, among other reasons because Boole's works contained discrepancies that resulted from a lack of consistency in the model he used. For example, Boole related in different and conflicting ways to the concept of independence. In one case independence meant the inability to imply a conclusion from one event to another, and in another case it meant that events do not overlap. Thus, at the beginning of the twentieth century, the mathematics of randomness did not provide a satisfactory answer regarding how to analyze events involving probabilities and the sources from which those probabilities originated.

It was Andrey Kolmogorov (1903–1987) who proposed the complete logical framework. Kolmogorov was a preeminent mathematician of the twentieth century. In addition to his contribution to mathematical research, he was interested in the teaching of mathematics in schools and held various administrative positions in universities and Russian academia. Kolmogorov made important contributions in a wide range of mathematical subjects: Fourier series, set theory, logic, fluid mechanics, turbulence, analysis of complexity, and probability theory, which we will turn to shortly. He was granted many awards and honors, including the Stalin Prize, the Lenin Prize, and, in 1980, the prestigious Wolf Prize, the award ceremony of which he did not attend. This led to a change in the rules of the prize, so that a recipient must now attend the ceremony in order to be awarded it.

Kolmogorov adopted the Greeks’ approach. He drew up a list of axioms with which the concepts that had previously been used only intuitively could be explained. We will discuss the connection between the axioms and nature after we have presented them. Kolmogorov's general approach adopted George Boole's proposal from several decades earlier, that is, the use of logic operators on sets to describe probabilities. The axioms that Kolmogorov wrote in his book in 1933 are quite simple and are set out below (they can be followed even without previous mathematical knowledge, but even if they are skipped, the text that follows can still be understood).

1.  We choose a sample space, which we will call Ω. This is an arbitrary set whose members are called trials or samples.

2.  We select a collection of sets, all of which are subsets of the sample space Ω. We will denote this collection of sets Σ, and the sets within it we will call events. The family of sets Σ has several properties: the set Ω is in it (i.e., Ω itself is an event); if a sequence of sets (i.e., events) is in it, then the union of those events is also in it; and if an event is in the collection, then its complement, that is, Ω minus the event, is also an event.

3.  For the collection of events we will define a probability function, which we will denote P. This assigns to each event a number between 0 and 1 (called the probability of the event). This function has the property that the probability of the union of a sequence of events that are pairwise disjoint is the sum of the probabilities of the individual events. Also, the probability of the event Ω is 1.

For those unfamiliar with the jargon or terminology of mathematics, we will state that two events (two sets) are disjoint if there is no trial (member in Ω) that is in both events. The union of two sets is the set that includes the members of both sets. Thus, the second axiom says, among other things, that the set that contains the trials in both events is itself an event.
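The three axioms can be checked concretely on a small finite example. The sketch below is ours, not from the text: the sample space is one roll of a fair die, Σ is taken to be every subset, and P is the uniform probability function.

```python
from itertools import chain, combinations
from fractions import Fraction

# Sample space: one roll of a fair die (an illustrative choice).
omega = frozenset({1, 2, 3, 4, 5, 6})

def all_events(space):
    """Sigma: on a finite space, every subset may serve as an event."""
    items = sorted(space)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def P(event):
    """Uniform probability: each sample point has probability 1/6."""
    return Fraction(len(event), len(omega))

sigma = all_events(omega)

# Axiom checks: P(omega) = 1; complements of events are events;
# additivity holds for disjoint events.
assert P(omega) == 1
assert all((omega - e) in sigma for e in sigma)
even, low = frozenset({2, 4, 6}), frozenset({1})   # disjoint events
assert even & low == frozenset()
assert P(even | low) == P(even) + P(low)           # 1/2 + 1/6 = 2/3
```

On an infinite sample space, as the next paragraph notes, Σ generally cannot contain every subset; the finite case hides that subtlety.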

There is a reason for the statement that the collection Σ of events does not necessarily contain all the subsets of the sample space Ω. The reason is essentially technical, and there is no need to understand it in order to follow the rest of the explanation. (The reason is that when Σ consists of all the subsets, it may be impossible, when the sample set is infinite, to find a probability function that fulfills the requirement of the third axiom.)

One of the innovative features of the axioms is that they ignore the question of how the probabilities are created. The axioms assume that the probabilities exist and merely require that they have certain properties that common sense indicates. Following the Greek method, when you try to analyze a certain situation, you must identify the sample space that satisfies the axioms and describes the situation. If your identification is accurate, you can continue, and with the help of mathematics you can arrive at correct conclusions. Kolmogorov went further than the Greeks, however. They claimed that the “right” axioms were determined according to the state of nature. Kolmogorov allows completely different spaces to be constructed for the same probability scenario. An example follows.

The framework defined by the system of axioms enables a proper mathematical analysis to be performed. For instance, suppose we wish to calculate the probability of an event B when we restrict attention to the event A, which has probability P(A). This new probability of B will be equal to the probability of that part of B that is in common with A (we are concerned only with that part of B) divided by the probability that A occurs. This can be written as the following formula. Denote the part that is common to A and B by A ∩ B, called A intersect B. Then the probability of the partial event of B that is in A is P(B | A) = P(A ∩ B)/P(A). This is called the conditional probability. The two events are independent if it is impossible to draw any conclusions regarding the occurrence of the second event from the occurrence of one of them, even a probabilistic conclusion. The mathematical formulation of independence states that the updated probability of B equals its original probability, that is, P(B | A) = P(B), or equivalently P(A ∩ B) = P(A)P(B). We have obtained a mathematical definition of independence. The same can be done with respect to other concepts used in probability theory.
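The definitions of conditional probability and independence can be verified by direct counting. Here is a minimal sketch, assuming a sample space of two rolls of a fair die (the example and all names are ours):

```python
from fractions import Fraction
from itertools import product

# Sample space: two rolls of a fair die, all 36 outcomes equally likely.
omega = list(product(range(1, 7), repeat=2))

def P(event):
    """Probability of an event, given as a predicate on sample points."""
    return Fraction(len([w for w in omega if event(w)]), len(omega))

def P_and(e1, e2):
    """Probability of the intersection of two events."""
    return P(lambda w: e1(w) and e2(w))

def cond(B, A):
    """Conditional probability P(B | A) = P(A and B) / P(A)."""
    return P_and(A, B) / P(A)

first_even = lambda w: w[0] % 2 == 0
second_even = lambda w: w[1] % 2 == 0

# Independent events: conditioning does not change the probability,
# and the probability of the intersection is the product.
assert cond(second_even, first_even) == P(second_even)       # 1/2
assert P_and(first_even, second_even) == P(first_even) * P(second_even)

# Dependent events: whether the sum exceeds 10 depends on the first roll.
big_sum = lambda w: w[0] + w[1] > 10
assert cond(big_sum, first_even) != P(big_sum)   # 1/9 versus 1/12
```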

This is an appropriate place for a warning: Many texts refer to the expression P(B | A) = P(A ∩ B)/P(A) for conditional probability as the probability of B given A. This in turn leads to the interpretation of the conditional probability as the updated probability of B when one is informed that A has occurred. Such an interpretation may lead, as we shall see later, to errors when applying the formulae. While in plain language the two expressions, given and informed, are not that different, in applications, when we are informed of an event, the circumstances in which the information is revealed should be taken into account. When we are informed that A has occurred, we can by no means automatically conclude that the conditional probability of B given A depicts the updated probability of B.

And now we present, as promised, the formula for Bayes's theorem (this can be skipped without rendering the text that follows it less understandable). Assume that we know that event A has occurred, and we wish to learn from that the chances that event B will occur. For the sake of the example we assume that the conditional probability, which we denote P(B | A), describes the desired probability of B when we know that A has occurred. Bayes's formula as we described it verbally in the previous section is

P(B | A) = P(A | B) · P(B) / P(A)

Moreover, as we explained above, P(A | B) is P(A ∩ B) divided by P(B). If we wish to conform with the wording of the principle as displayed in the previous section, we should write the denominator as P(A | B)P(B) + P(A | ~B)P(~B), where ~B indicates that the event B does not occur. (This is the way most texts write it.) Does that sound complicated? Perhaps so, but the framework provides a proper mathematical basis for the analysis of randomness.
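A small numerical sketch may make the formula with the expanded denominator less forbidding. The numbers below are invented purely for illustration:

```python
from fractions import Fraction

# Bayes's formula with the expanded denominator:
# P(B | A) = P(A | B) P(B) / (P(A | B) P(B) + P(A | ~B) P(~B)).
def bayes(p_A_given_B, p_B, p_A_given_notB):
    numerator = p_A_given_B * p_B
    denominator = numerator + p_A_given_notB * (1 - p_B)
    return numerator / denominator

# Illustrative (assumed) numbers: B = "has the condition", with prior
# probability 1/100; A = "test is positive", occurring 9 times out of
# 10 when B holds and 1 time in 20 otherwise.
p = bayes(Fraction(9, 10), Fraction(1, 100), Fraction(1, 20))
assert p == Fraction(2, 13)   # about 0.15
```

Note how the updated probability stays low, roughly 2 in 13, even though the test detects the condition 9 times out of 10: the large P(~B) term in the denominator dominates.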

Notice the assumption we made: the circumstances are such that P(B | A) is the correct updated probability. Otherwise we should resort to the original Bayes's scheme as described in the previous section, namely, we should calculate the ratio of the probability that A has occurred when we are informed that B has occurred to the entire probability that we are informed that A has occurred. In many applications the assumption does not hold, that is, the probability that we are informed that A has occurred is not P(A).

The above framework provides an outline for the construction of probabilities, but the events that appear in the axioms do not necessarily have significance in reality, significance that we can identify or calculate. Take as an example one toss of a coin. The sample space may be made up of two symbols, say a and b, with equal probabilities. If we declare that if a occurs this means (note that this is our explanation!) that the coin falls with heads showing, and if b comes up in the sample it means that the coin fell with tails uppermost, we have a model for one flip of the coin. We cannot analyze two consecutive tosses of the coin in the framework of this sample space because there are four possible outcomes of two flips of the coin. For that case we have to construct another sample space. To arrive at a model that permits multiple flips of the coin, the sample space has to increase. To enable any number of tosses of the coin, we will require an infinite sample space. The technical details will interest those who deal with the mathematics (and students), and we will not present them here. We will just say that in a sample space in which an infinite series of flips of the coin can take place, events occur whose probability is zero.
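The point that each experiment needs its own sample space can be made concrete. Below is a sketch (the labels H and T are our own convention) showing that the space for a single flip cannot even express an event about two flips:

```python
from itertools import product
from fractions import Fraction

# The sample space must grow with the experiment: n tosses of a fair
# coin need 2**n sample points.
def coin_space(n):
    return list(product('HT', repeat=n))

def P(event, space):
    """Uniform probability of an event (a predicate) in a given space."""
    return Fraction(len([w for w in space if event(w)]), len(space))

one = coin_space(1)    # 2 outcomes: a model for a single flip
two = coin_space(2)    # 4 outcomes: needed for two consecutive flips
assert len(one) == 2 and len(two) == 4

# "At least one head in two flips" has probability 3/4 in the
# two-flip space; the one-flip space has no sample point for it.
assert P(lambda w: 'H' in w, two) == Fraction(3, 4)
```

For an unbounded number of tosses this construction no longer suffices, and, as the text says, an infinite sample space is required.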

This is certainly intuitive. A ball spinning around a continuous circle stops at some point. The chances of its stopping at a predetermined point are zero, but the chance of its stopping within a collection of points, for example, within a complete segment, is not zero. With this end in view, Kolmogorov used mathematics that had been developed for other purposes and that explained how a segment can have positive length while consisting of points, each of which has length zero; that explanation had not been available to the Greeks when they encountered a similar problem. Furthermore, Kolmogorov's model can be used to explain and prove Bernoulli's weak law of large numbers (see the previous section), and even to formulate and prove a stronger law, as follows. We perform a series of flips of a coin. These can produce many series of results. We examine those series of outcomes in which the proportion of heads to the total number of throws does not approach 50 percent as the number of throws increases. This set of series, says the strong law of large numbers, has a probability of zero. (The careful reader whose mathematical education included Kolmogorov's theory will have noticed that although that event has zero probability, it can nevertheless occur. Indeed, there could be samples in which the proportion does not approach a half, but these are negligible.)
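A simulation in the spirit of the strong law is easy to run (a sketch with our own parameters; a simulation illustrates the law, it does not prove it): along a single long sequence of fair-coin flips, the running proportion of heads settles near one half.

```python
import random

# One long sequence of simulated fair-coin flips; record the running
# proportion of heads at a few checkpoints.
random.seed(0)   # fixed seed so the run is reproducible

heads, checkpoints = 0, {}
for n in range(1, 100_001):
    heads += random.randint(0, 1)     # 1 counts as "heads"
    if n in (100, 10_000, 100_000):
        checkpoints[n] = heads / n

# By 100,000 flips the proportion is very close to 1/2.
assert abs(checkpoints[100_000] - 0.5) < 0.02
```

The strong law says that the set of infinite sequences for which this proportion fails to approach one half has probability zero; any finite simulation can only hint at that statement.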

Another aspect of Kolmogorov's axioms is that they give a seal of approval to the use of the same mathematics for both types of probability, that is, probability in the sense of the frequency of the outcomes of many repeats, and probability in the sense of assessing the likelihood of a non-repeated event. Both aspects of this duality are described by the same axioms. Indeed, another look at the three axioms above will show that common sense will accept both interpretations of probability. As the mathematics is based solely on the axioms, the same mathematics serves for both cases.

What then does assessing the probability of a non-repeated event mean? The mathematical answer is given in the axioms and their derivatives. The day-to-day implications are a matter of interpretation, which is likely to be subjective. It is interesting that late in life Kolmogorov himself expressed doubts about the interpretation of the probability theory related to non-repeated events, but he did not manage to propose another mathematical theory to be used to analyze this aspect of probability.

Kolmogorov's book changed the way mathematics dealt with randomness. Concepts that had been considered just intuition became subject to clear mathematical definition and analysis, and theorems whose proofs had also relied on intuition were now proved rigorously. Within a short time Kolmogorov's model became the accepted model for the whole mathematical community. Nevertheless, as the reader who has not previously come across this mathematics can guess, the method Kolmogorov suggested was not easy to use. Moreover, the formalism failed to overcome the many difficulties and errors in the intuitive approach to randomness, because it is a logical formalism that the human brain is not set up to accept intuitively.