Mathematics and the Real World: The Remarkable Role of Evolution in the Making of Mathematics (2014)

CHAPTER V. THE MATHEMATICS OF RANDOMNESS

40. THE MATHEMATICS OF LEARNING FROM EXPERIENCE

Let us go back to a development that started in the eighteenth century, one that had an aura of logic about it and is therefore “responsible” for serious errors in the application of the theory. The statistical methods described in the previous section help us when the relevant probabilities are known and we need only calculate the chance of a specific event occurring or, where possible, estimate the statistical parameters. The technique does not teach us how to improve our assessments when new information reaches us. It was de Moivre, in his book on methods of dealing with randomness, who asked how one should act when new information is added. It was Thomas Bayes who answered the question, and the system whose fundamentals he put forward is known as the Bayesian method.

Thomas Bayes (1702–1761) was born in England but studied mathematics and theology at the University of Edinburgh, Scotland. He was more interested in theology and, following in the footsteps of his father, who was a Presbyterian minister in London, served in a similar capacity in the Mount Zion Chapel in Tunbridge Wells, Kent, England. In his lifetime he published only two works. One was on religious matters. The other was an attempt to defend Newton's approach to infinitesimal calculus against the severe attack claiming that fluxions had no logical basis. The criticism was published by the famous Irish philosopher Bishop George Berkeley (after whom the University of California, Berkeley, is named). Bayes did not see fit to publish his formula in his lifetime, and it was published only after his death by his friend Richard Price, who was bequeathed Bayes's writings and who realized the importance of that work.

Bayes's formula is very easy to understand technically but very difficult to absorb and apply intuitively. We will discuss the reasons for this, and its sometimes serious consequences, in the coming sections. Here we will just present and explain the Bayesian formula itself (the calculations can be skipped without disturbing the overall picture).

We will start with an example based on a question that was asked in the school matriculation examination in probability in Israel in 2010. Of three boxes, the first contains two silver coins, the second has one silver coin and one gold coin, and the third holds two gold coins. One of the boxes is chosen at random, and one of the coins in it is chosen at random. The simple question is: what is the chance that the coin left in the box is silver? For reasons of symmetry one could conclude that the chance is 50 percent, because the question does not differentiate between the roles of the two types of coin. That is the answer that would have been given before the revolution of Fermat and Pascal (without using the concept of “chance,” which did not exist then). One can also carry out the following calculation. Each box has a probability of one-third of being selected. If the first is chosen, the probability that the coin left in the box is silver is 1 (i.e., a certainty). If the second box is chosen and one coin is drawn at random, the chance that the one left is silver is one-half. If the third box is chosen, the coin remaining after the draw is not silver. Now calculate $\tfrac{1}{3}\cdot 1+\tfrac{1}{3}\cdot\tfrac{1}{2}+\tfrac{1}{3}\cdot 0=\tfrac{1}{2}$, and we obtain that the chance is 50 percent.
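
Since choosing a box and then one of its two coins makes all six (drawn coin, remaining coin) outcomes equally likely, the answer can also be checked by direct enumeration. A minimal sketch in Python, with the boxes encoded as pairs of strings:

```python
from fractions import Fraction

# The three boxes of the example; choosing a box at random and then one
# of its two coins makes all six (drawn, left) outcomes equally likely.
boxes = [("silver", "silver"), ("silver", "gold"), ("gold", "gold")]

outcomes = [(box[i], box[1 - i]) for box in boxes for i in (0, 1)]

p_left_silver = Fraction(sum(left == "silver" for _, left in outcomes),
                         len(outcomes))
print(p_left_silver)  # prints 1/2
```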

Now we ask a more complex question. A coin is removed from a box that was chosen at random, and it turns out to be a gold coin. What is the chance that the coin left in the box is silver? The question is simple to formulate, but try giving an intuitive answer (without resorting to formulae you may have learned in lessons on probability). A simple analysis shows that from the information that a gold coin was taken from the box we can conclude that the box chosen was not the first (which held two silver coins). The other two boxes have an equal chance of having been selected, that is, 50 percent each. If the second box was chosen, the remaining coin is the silver one (as the gold coin was taken out). If the third box was chosen, the remaining coin is the second gold one in that box. Thus, in the situation as described, with a gold coin removed from a randomly selected box, the probability of the remaining coin being silver is one-half. Although this analysis is simple, it is incorrect (there is a reason that de Moivre did not arrive at a satisfactory answer to the question of how to solve such problems and left it in his book as an open question). The error is similar to the one committed by Pascal in his first letter to Fermat, described in a previous section. In other words, the “solution” ignores the probabilities of a gold coin being drawn in the different scenarios and therefore errs in the precise implication it draws from the information. The correct analysis is this: the gold coin drawn from the selected box came either from the second box (with a gold and a silver coin), an event with probability $\tfrac{1}{3}\cdot\tfrac{1}{2}=\tfrac{1}{6}$, or from the third box (which held two gold coins), an event with probability $\tfrac{1}{3}\cdot 1=\tfrac{1}{3}$. Only in the first of these two possibilities will the remaining coin be silver. The weight of that possibility among all the occurrences in which a gold coin is drawn first is $\tfrac{1}{6}$ divided by $\tfrac{1}{6}+\tfrac{1}{3}=\tfrac{1}{2}$, that is, one-third.
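
Readers who want to verify the one-third answer empirically can simulate the draw and keep only the runs in which the first coin is gold. A Monte Carlo sketch in Python (the sample size and seed are arbitrary choices):

```python
import random

boxes = [("silver", "silver"), ("silver", "gold"), ("gold", "gold")]
random.seed(1)

gold_drawn = silver_left = 0
for _ in range(100_000):
    box = list(random.choice(boxes))
    random.shuffle(box)                 # draw one coin at random
    drawn, left = box
    if drawn == "gold":                 # condition on the observed gold coin
        gold_drawn += 1
        silver_left += (left == "silver")

print(silver_left / gold_drawn)         # close to 1/3, not 1/2
```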

The principle underlying the above calculation is simple. If you wish to draw a conclusion based on new information, you must take into account all the factors that are likely to result in that information reaching you, and weight all those factors according to their probabilities. Specifically, relating to our example above, assume that you want to find the probability that event B occurred, given that you are told that A has occurred. First, find the chance that you would be told that A occurred if B occurs. Then find the chance that you would be told that A occurred if B does not occur. Then calculate the weight of being told that A occurred with B occurring, relative to the total chance of being told that A occurred. This scheme can be written in the form of a formula, which we will set out in the next section. The principle underlying the weighting is the essence of Bayes's scheme. We will present several other examples that will make the situation even clearer.
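
As a preview of that formula, here is the standard symbolic form of the weighting just described, followed by its application to the coin example (with B standing for “the second box was chosen,” the only case in which the remaining coin is silver, and A for “the drawn coin is gold”):

```latex
\[
P(B \mid A) \;=\;
\frac{P(A \mid B)\,P(B)}
     {P(A \mid B)\,P(B) \;+\; P(A \mid \text{not } B)\,P(\text{not } B)} .
\]
% Coin example: summing the weights over the three equally likely boxes,
\[
P(\text{box 2} \mid \text{gold}) \;=\;
\frac{\tfrac{1}{2}\cdot\tfrac{1}{3}}
     {0\cdot\tfrac{1}{3} \;+\; \tfrac{1}{2}\cdot\tfrac{1}{3} \;+\; 1\cdot\tfrac{1}{3}}
\;=\; \frac{1/6}{1/2} \;=\; \frac{1}{3}.
\]
```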

The principle presented by Bayes enables the probabilities to be updated whenever new information is received. Theoretically the probabilities could be updated continuously until exact assessments are obtained. That refinement of Bayes's scheme was developed by Laplace. Laplace apparently arrived independently at a formula similar to Bayes's and went on to develop the complete formulae of updates that become more and more exact, but when he heard of Bayes's earlier work, he gave Bayes's name to the method. That name, Bayesian inference or Bayesian statistics, has prevailed to this day.
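
Laplace's refinement amounts to a loop: each newly computed a posteriori probability serves as the a priori probability for the next piece of evidence. A minimal Python sketch of that loop, with two invented hypotheses about a coin and a hypothetical run of tosses:

```python
# Two competing hypotheses about a coin, with invented likelihoods:
#   "fair":   P(heads) = 0.5
#   "biased": P(heads) = 0.8
p_heads = {"fair": 0.5, "biased": 0.8}
prior = {"fair": 0.5, "biased": 0.5}   # equal-chances starting point

observations = ["H", "H", "T", "H", "H", "H"]  # a hypothetical run of tosses

for toss in observations:
    # Likelihood of this toss under each hypothesis
    likelihood = {h: (p if toss == "H" else 1 - p) for h, p in p_heads.items()}
    # Weight each hypothesis by its likelihood and renormalize; the
    # a posteriori probabilities become the a priori ones for the next toss.
    weighted = {h: likelihood[h] * prior[h] for h in prior}
    total = sum(weighted.values())
    prior = {h: w / total for h, w in weighted.items()}
    print(toss, {h: round(p, 3) for h, p in prior.items()})
```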

But the approach has a fundamental drawback. In order to apply Bayes's formula we need to know the probabilities that the events we are referring to will take place, and generally, in our daily lives, those probabilities are not known. How then can we learn from experience? Bayes had a controversial answer: if you have no idea whether A is likely to occur or not, assume that the chances are equal. Once you assume the initial probabilities, also called the a priori probabilities, you can calculate the new probability, called the a posteriori probability, with a high degree of accuracy. The question arises: Can we allow the use of an arbitrary assumption about the value of the a priori probability?
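
A classical illustration of this equal-chances assumption, worked out by Laplace as one of his update formulae, is the rule of succession: starting from a uniform a priori distribution over an unknown probability of success p, observing s successes in n independent trials yields an a posteriori expected chance of success of

```latex
\[
P(\text{success on trial } n+1)
\;=\; \frac{\displaystyle\int_0^1 p \cdot p^{s}(1-p)^{n-s}\,dp}
           {\displaystyle\int_0^1 p^{s}(1-p)^{n-s}\,dp}
\;=\; \frac{s+1}{n+2}.
\]
```

With no observations at all ($n = s = 0$) this gives one-half, the equal-chances starting point; as observations accumulate, the ratio approaches the observed frequency, so the arbitrary prior matters less and less, which is precisely the supporters' argument in the dispute described next.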

The dispute between the supporters and the opponents of the system was confined to no one place or period. The statistics of frequencies and samples had a firm theoretical basis, but using it required very many repetitions of the same random occurrence, so it could not be applied to statistical assessments of non-repeated events. Bayesian statistics is a tool for analyzing isolated events, but without reliable information on a priori probabilities the results depend on subjective assessments, and these do not constitute a reliable basis for scientific findings, its opponents claimed. Better to rely on subjective assessments than to forgo the advantages the method offers, replied its supporters. Moreover, they added, the more information that accumulates, the smaller the effect of the arbitrary assumptions becomes, until it is minimal, a fact that gives the Bayesian approach scientific validity. The dispute spilled over onto a personal level, and for many years the two methods developed side by side. Even today statisticians are divided into Bayesians and non-Bayesians, but it now seems that the borders and limitations of the two methods have been drawn more clearly, and each occupies its proper place.