Mathematics of Life (2011)

Chapter 1. Mathematics and Biology

Biology used to be about plants, animals and insects, but five great revolutions have changed the way scientists think about life.

A sixth is on its way.

The first five revolutions were the invention of the microscope, the systematic classification of the planet’s living creatures, the theory of evolution, the discovery of the gene, and the discovery of the structure of DNA. Let’s look at them in turn, before moving on to my sixth, more contentious, revolution.

The Microscope

The first biological revolution happened 300 years ago, when the invention of the microscope opened our eyes to the astonishing complexity of life on the smallest scales. More precisely, it opened up the complexity of life to observation by our eyes, by providing a new instrument to augment our unaided senses.

The invention of the microscope led to the discovery that individual organisms have an amazing internal complexity. One of the first big surprises was that living creatures are made from cells – tiny bags of chemicals enclosed in a membrane that lets some of the chemicals pass in or out. Some organisms consist of a single cell, but even those are surprisingly complicated, because a cell is an entire chemical system, not something simple and straightforward. Many organisms are made from a gigantic number of cells: your body contains roughly 75 trillion of them. Each cell is a tiny biological machine with its own genetic machinery which can cause it to reproduce, or die. Cells come in more than 200 types – muscle cells, nerve cells, blood cells, and so on.

Cells were discovered very soon after microscopes were invented: once you can look at an organism under high magnification, you can’t miss them.

Classification

The second revolution was started by Carl Linnaeus, a Swedish botanist, doctor and zoologist. In 1735, his epic work Systema Naturae appeared. Its full title in English is ‘The system of nature through the three kingdoms of nature, according to classes, orders, genera and species, with characters, differences, synonyms, places’. Linnaeus was so interested in the natural world that he decided it needed to be catalogued. All of it. The first edition of his catalogue was just 11 pages long; the 13th and last ran to 3,000 pages. Linnaeus made it clear that he was not trying to uncover some kind of hidden natural order; he was just trying to organise what was there, in a systematic and structured manner. His chosen structure was to classify natural objects in a five-stage subdivision: kingdom, class, order, genus, species. His three kingdoms were animals, plants and minerals. He founded the science of taxonomy: the classification of living creatures into related groups.

Minerals are no longer classified along Linnaean lines, and the details of his system have been modified for plants and animals. Recently several alternative systems have been advocated, but none has yet been widely adopted. Linnaeus appreciated that a systematic classification of living things is vital to science, and he put that idea into practice. He made the occasional mistake: initially he classified whales as fish. But by the 10th edition of Systema Naturae, published in 1758, an ichthyologist friend had put him right, and whales were mammals.

The best-known and most useful feature of the Linnaean system is the use of double-barrelled names such as Homo sapiens, Felis catus, Turdus merula and Quercus robur – species of human, cat, blackbird and an oak tree.1 The importance of classification is not just to make a list, or to introduce fancy Latinised names to show how clever you are, but to make logical, clear-cut distinctions among the many creatures that exist. Common names, such as ‘blackbird’, don’t do the trick: do you mean the common blackbird, the grey-winged blackbird, the Indian blackbird, the Tibetan blackbird, the white-collared blackbird, or one of the 26 species of New World blackbird? But the Linnaean Turdus merula refers uniquely to the common blackbird, and there’s no chance of confusion.

Evolution

The third revolution had been simmering for some time, but it boiled over in 1859 when Darwin published The Origin of Species. The book eventually ran to six editions, and it ranks as one of the truly great scientific works of all time, bearing comparison with the works of Galileo, Copernicus, Newton and Einstein in the physical sciences. In the Origin, Darwin proposed a new vision of the source of life’s diversity.

The prevailing belief in his day, among scientists as much as lay folk, was that each separate species had been created individually by God as part of the overall act of creating the universe. In this view, species could not change over time: a sheep was, is and always will be a sheep; a dog was, is and always will be a dog. But as Darwin contemplated the scientific evidence, much of which he had amassed on his own travels, he found this comfortable picture becoming less and less tenable.

Pigeon fanciers knew that deliberate breeding could produce wildly different types of pigeon. The same went for cows, dogs and indeed all domesticated animals. Now, that mechanism for change required human intervention. The animals didn’t change ‘of their own accord’: they had to be chosen – selected – with great care, by someone following a plan. But Darwin realised that unaided nature could, in principle, produce similar changes through competition for resources. When times were hard, those animals that were better able to survive would be the ones that lived long enough to produce the next generation, and this new generation would be slightly better adapted to the environment.

Such changes, Darwin felt, would be much more gradual than those imposed by human breeders, but a changing environment could, over a long period of time, cause some of the organisms in a species to develop markedly different forms and habits. He saw this process as the slow accumulation of myriad tiny changes. His background in geology made him acutely aware that the planet had been around for vast aeons of time, so lack of time was not a problem. Even extraordinarily slow changes could eventually become very significant.

He called this process ‘natural selection’. Today we call it ‘evolution’, a word that Darwin didn’t use – although the final word in The Origin of Species is ‘evolved’. The evidence in favour of evolution is so extensive, and comes from so many independent sources, that biology now makes no sense without it. Today, almost all biologists (and most scientists, whatever their field of research) find the evidence that evolution has been the dominant mechanism behind the diversity of today’s species to be overwhelming. But how evolution works is another matter entirely, and much remains to be understood.

Genetics

The fourth revolution was Gregor Mendel’s discovery of genes, which was published in 1865 but not appreciated for another fifty years.

Observable features of organisms, such as colour, size, texture and shape, are known as characters (or characteristics or traits). Darwin had no idea how characters were transmitted from parent to offspring, though several distinct lines of reasoning led him to infer that this must happen. In fact, the transmission mechanism was already under investigation when he wrote the Origin, but he didn’t know that. It would have had a major impact on his thinking.

For seven years around 1860, the Austrian priest Gregor Mendel bred pea plants – 29,000 of them – and counted how many displayed particular characters in each generation. Did they produce yellow or green peas? Were the peas smooth or wrinkly? Mendel’s observations turned up some curious mathematical patterns, and he became convinced that inside every living organism there are ‘factors’, now called genes, that somehow determine many features of the organism itself. These factors are inherited from previous generations, and in sexual species they arise in pairs: one from the ‘father’ (the male organ of the plant) and one from the ‘mother’ (the female organ). Each factor can occur in several distinct forms. The random mixing of these ‘alleles’ – genetic alternatives – creates the patterns in the numbers.

Initially, the physical form of Mendel’s factors was a complete mystery; their existence was inferred indirectly from the mathematical patterns – the proportions of plants in successive generations that possessed particular combinations of features.
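To see how random mixing of paired factors can produce a numerical pattern, here is a minimal sketch in Python. It is an illustration, not Mendel's own procedure. The labels are assumptions made for the example: 'Y' stands for a yellow-pea allele and 'y' for a green-pea allele, and the code assumes that one copy of 'Y' is enough for the pea to look yellow (dominance, an idea not spelled out in the text above). Instead of breeding thousands of plants, it simply lists every equally likely pairing of one factor from each parent.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Count offspring genotypes when each parent passes on one of its two factors."""
    # Every pairing of (factor from parent1, factor from parent2) is equally likely.
    # Sorting the pair treats 'yY' and 'Yy' as the same genotype.
    return Counter("".join(sorted(pair)) for pair in product(parent1, parent2))


def phenotype_ratio(genotypes, dominant="Y"):
    """Return (number showing the dominant character, number showing the recessive one)."""
    shows_dominant = sum(n for g, n in genotypes.items() if dominant in g)
    shows_recessive = sum(n for g, n in genotypes.items() if dominant not in g)
    return shows_dominant, shows_recessive


# Hypothetical example: both parents carry one 'Y' and one 'y'.
offspring = cross("Yy", "Yy")
print(offspring)                   # one YY, two Yy, one yy
print(phenotype_ratio(offspring))  # → (3, 1)
```

Under these assumptions, three-quarters of the offspring show the yellow character and one-quarter the green. Proportions of this kind in the counts are what allowed the existence of the factors to be inferred without anyone ever seeing one.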

The structure of DNA

Revolution number five was more straightforward, and like the first, it was triggered by the invention of a new experimental technique. This time the technique was X-ray diffraction, which allows biochemists to work out the structure of complex, biologically important molecules. In effect, it provides a ‘microscope’ that can reveal the positions of individual atoms in a molecule.

In the 1950s Francis Crick and James Watson began to think about the structure of a complex molecule found almost universally in living creatures: deoxyribose nucleic acid, known universally by its initials, DNA. Crick, who was British, had trained as a physicist, but became terminally bored while writing a PhD on how to measure the viscosity of water at high temperatures, and in 1947 he moved into biochemistry. Watson was an American whose first degree was in zoology; he became interested in a type of virus that infects bacteria, known as a bacteriophage (‘bacterium-eater’). His big project was to understand the physical nature of the gene – its molecular structure.

At that time, it was known that genes resided in regions of the cell called chromosomes, and that the main constituents of genes were proteins and DNA. The conventional wisdom among biologists was that organisms could reproduce because the genes were proteins, capable of copying themselves. DNA, in contrast, was widely considered to be a ‘stupid tetranucleotide’ whose sole function was to act as scaffolding, so that the proteins could be held together.

However, there was already some evidence that DNA is the molecule from which genes are formed, which immediately raised a crucial question: what does the DNA molecule look like? How are its component atoms arranged?

Watson ended up working with Crick. They based their analysis of DNA on some crucial X-ray diffraction experiments carried out by others (notably Maurice Wilkins and Rosalind Franklin), homed in on a few key facts, and started building models in the literal sense, by fitting together pieces of card or metal shaped like simple molecules that were known to be part of DNA. This exercise led them to propose the now-famous double helix structure: DNA is two-stranded, like two intertwined spiral staircases. Each strand (staircase) carries a series of bases, which are four different molecules: adenine (A), cytosine (C), guanine (G) and thymine (T). These come in linked pairs: an A on one strand is always joined to a T on the other; a C on one strand is always joined to a G on the other.

Crick and Watson published their proposal in the scientific journal Nature in 1953. It begins: ‘We wish to suggest a structure for the salt of deoxyribose nucleic acid (D.N.A.). This structure has novel features which are of considerable biological interest.’ Near the end, they write: ‘It has not escaped our notice that the specific pairing we have postulated [A with T, C with G] immediately suggests a possible copying mechanism for the genetic material.’2

The basic idea here is simple: the sequence of bases on just one of the two strands determines the entire structure. On the other strand, the sequence is given by the complementary bases to those on the first strand – swap A and T, and swap C and G. If you could pull DNA apart into its two strands, each of them would contain the necessary ‘information’ to reconstruct the other. So all you have to do is make two complementary strands, and fit the pairs back together to get two perfect copies of the original.
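The swapping rule is easy to express as a short Python sketch. This is a toy model of the idea only: it treats a strand as a plain string of the letters A, C, G and T, ignores the direction in which real strands are read, and says nothing about the chemistry. The function names and the example sequence are invented for illustration.

```python
# Pairing rule from the text: A pairs with T, and C pairs with G.
PAIRING = str.maketrans("ATCG", "TAGC")


def complement(strand):
    """Return the partner strand: swap A with T, and swap C with G."""
    return strand.translate(PAIRING)


def copy_double_strand(strand):
    """Separate the two strands, then rebuild a partner for each one.

    Returns two (strand, partner) pairs; under this toy model both
    are identical copies of the original double strand.
    """
    partner = complement(strand)
    first_copy = (strand, complement(strand))
    second_copy = (complement(partner), partner)
    return first_copy, second_copy


original = "ATGCCGTA"  # made-up example sequence
print(complement(original))              # → TACGGCAT
print(complement(complement(original)))  # → ATGCCGTA
```

Applying the swap twice returns the original sequence, which is why each strand on its own carries enough ‘information’ to reconstruct the other.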

Crick and Watson’s suggestion for the structure of DNA, based on little more than some crucial hints from experiment and a lot of fiddling with models, turned out to be correct. So did the copying mechanism, which was so speculative that they did not spell it out in the Nature paper in case it turned out to be wrong. However, you can’t just pull two intertwined helices apart, so some quite complicated mechanisms are needed to achieve this duplication. What they were lay far in the future.

At a stroke, attention in biology turned to the molecular structure of key substances: DNA, proteins and associated molecules. University biology departments fired or retired botanists, zoologists and taxonomists – anyone who actually worked with entire animals was completely out of date. Molecules were the coming thing. And they were, and they did. And biology has never been the same since. Crick and Watson had found ‘the secret of life’, as Crick bragged in the Eagle (a pub in Benet Street, Cambridge) a few days before they found the correct structure.

Many major new developments have followed from Crick and Watson’s breakthrough. The science behind them is often highly innovative, but the point of view has changed only incrementally from what it was in Crick and Watson’s day, so these more recent scientific advances, dramatic though they may be, do not constitute genuine revolutions. For example, in 2006 the Human Genome Project succeeded in listing the entire genetic sequence of a human being – three billion units of genetic information.3 This has revolutionary implications: for one thing, it opens up entirely new advances in medicine. Biology has become the most exciting scientific frontier of the twenty-first century, promising huge advances in medicine and agriculture, as well as a deep understanding of the nature of life itself. But there is a clear path linking all of this to the original discovery of the structure of DNA.

These, then, are my five revolutions.

The gaps between them, allowing (in Mendel’s case) for the time it took before anyone noticed, are roughly 50, 100, 50 and 50 years. The fifth happened just over 50 years ago. The pace of change in the world is accelerating, so a sixth revolution in biology seems overdue. I believe that it has already arrived. The nature of life is not just a question for biochemistry – many other areas of science have major roles in explaining what makes living creatures live. What unites them all, opening up entirely new vistas, is my sixth biological revolution: mathematics.

Mathematics has been with us for thousands of years; the ancient Babylonians could solve quadratic equations 4,000 years ago. Biologists have been using mathematical techniques, especially statistics, for more than a century. So it might seem unreasonable to refer to a ‘revolution’. But what I have in mind – what is happening as I write – goes much further. The mathematical way of thinking is becoming a standard piece of kit in the biological toolbox: not just a way to analyse data about living creatures, but a method for understanding them.

What mathematics is, and how useful it is, are widely misunderstood. It is not solely about numbers, ‘doing sums’ as we were taught in school – that’s arithmetic. Even when you add in algebra, trigonometry, geometry and various more modern topics such as matrices, what we learn at school is a tiny, limited part of a vast enterprise. To call it one-tenth of one per cent would be generous. And the mathematics we learn at school is in many ways unrepresentative of the whole, just as playing scales on a piano falls short of being real music, and woefully short of composing music. People often think that mathematics was all invented (or discovered) long ago, but new mathematics is coming into being at an impressive rate. A million pages a year is a conservative estimate, and that’s a million pages of new ideas, not just variations on routine calculations.

Numbers are basic to mathematics, just as scales are basic to music, but the subject matter of mathematics is much broader: shapes, logic, processes . . . anything that has structure or pattern. We can also include uncertainty, which might seem to be the absence of pattern, but the early statisticians discovered that even random events have their own patterns, on average and in the long run. One of the remarkable features of the mathematics now being used in biology is its variety; another is its novelty. Much of it is less than 50 years old and some of it was invented last week. It ranges from knot theory to game theory, from differential equations to symmetry groups. A lot of it uses ideas that most of us have never encountered, and probably wouldn’t recognise as mathematics if we did. It is changing how we think about biology, not just the results we obtain.

This approach is old hat in the physical sciences, which rely heavily on mathematics; in fact, the development of those two areas has gone hand in hand for thousands of years. Until recently, biology was – or seemed – different. Traditionally, biology was the branch of science recommended to students who preferred to avoid mathematics if at all possible. You can study the life cycle of a butterfly without doing any sums. There are still no fundamental mathematical equations for biology, equivalents of Newton’s law of gravitation. We don’t calculate the evolutionary trajectory of a fish by applying Darwin’s equations. But there is mathematics aplenty in today’s biology, and it is becoming ever harder to avoid it. It just doesn’t mimic the way mathematics is used in physics. It’s different: it has its own special quality. And increasingly, much of it is motivated by the needs of biologists, which are no longer as cosy as watching butterflies.

The application of mathematics to biology depends on new apparatus, most obviously the computer. It also depends on new mental apparatus: mathematical techniques, some specially tailored to the needs of biology, others that arose for different reasons but turn out to have important biological implications. Mathematics provides a new point of view, addressing not just the ingredients for life, but the processes that use those ingredients.

I believe that the sixth revolution in biology is already under way, and it is to apply mathematical insight to biological processes. My aim here is to show how the techniques and viewpoints of mathematics are helping us to understand not just what life is made from, but how it works, on every scale from molecules to the entire planet – and possibly beyond.

Until recently, most biologists doubted that mathematics would ever have much to tell us about life. Living creatures seemed too versatile, too flexible, to conform to any rigid mathematical formalism (hence the Harvard law of animal behaviour: ‘experimental animals, under carefully controlled laboratory conditions, do what they damned well please’). Mathematical tools such as statistics had their place, of course, but mathematics was purely a servant, unlikely to have a significant effect on mainstream biological thinking. Mavericks such as D’Arcy Wentworth Thompson, whose book On Growth and Form catalogued numerous mathematical patterns – or alleged patterns – in living creatures, were ignored or dismissed. They were at best a sideshow, at worst, nonsense. After all, Thompson’s book was first published in 1917, forty years before the structure of DNA became known, and he said very little about evolution, except to criticise what he saw as a tendency to fit the story to whatever facts happened to be available. More recent critics of a narrow molecular view of biology, such as the American evolutionary biologist Richard Lewontin, also got short shrift from mainstream biology. The genome was considered to be ‘the information needed to specify an organism’, and it was pretty obvious that once we knew that, then in principle we would know everything.

However, as biologists overcame the huge difficulties involved in deriving genetic sequences, and in working out the functions of genes and proteins – what they actually did in the organism – the true depths of the problem of life became ever more apparent. Listing the proteins that make up a cat does not tell us everything we want to know about cats. It doesn’t tell us everything even for more lowly creatures such as bacteria.

There is no question that a creature’s genome is fundamental to its form and behaviour, but the ‘information’ in the genome no more tells us everything about the creature than a list of components tells us how to build furniture from a flat-pack. In fact, the gulf between a living creature and its genome is far wider than that between furniture and a list of boards, screws and washers. For example, over the past few years it has also become clear that ‘epigenetic’ information, not written in DNA, and possibly not ‘coded’ in any obvious symbolic fashion, is also vital to life on Earth. Most of us who have assembled flat-packs have also required knowledge that is not included in the instructions.

Lists of ingredients are not enough to understand biology, because what really matters is how those ingredients are used – the processes that they undergo in a living creature. And the best tool we possess for finding out what processes do is mathematics. Over the past half-century or so, new mathematical discoveries have opened up a realm of rich and surprising behaviour, revealing that apparently simple processes can do astonishingly complex things. As a result, the belief that mathematics is too simple and too well behaved to provide insights into the complexity of living creatures has become very difficult to defend. Instead, attention has been focused on finding ways to exploit the power of mathematics to provide genuine insights into biology.

Some of these developments use mathematics as a tool to help with the scientific techniques that biologists use. Such applications have been around ever since physicists developed the science of optics and manufacturers used it to improve the design of microscopes. An example today is ‘bioinformatics’, the methods involved in the storage and manipulation of gigantic data sets in computers. Listing a genome is not enough: you have to be able to find what you’re looking for in the list, compare it with other items of information on other lists, and so on. When the list contains three billion items of information (and that’s just the code, let alone everything we know about what it does), that’s a non-trivial issue. Most computer technology relies on a heavy dose of hidden mathematics, and bioinformatics is no exception.
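As a small taste of what ‘finding what you’re looking for in the list’ means, here is a deliberately naive Python sketch. It assumes the sequence fits comfortably in memory as a string, and the toy sequences and function names are invented for the example. Practical bioinformatics software has to cope with billions of letters and relies on far more sophisticated indexing and comparison methods than this.

```python
def find_all(sequence, motif):
    """Return every start position of motif in sequence, overlaps included."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Resume one step further on, so overlapping matches are not missed.
        start = sequence.find(motif, start + 1)
    return positions


def count_differences(first, second):
    """Count the positions at which two equal-length sequences disagree."""
    if len(first) != len(second):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(first, second))


toy_genome = "ATGCGATGATG"  # invented sequence, eleven letters long
print(find_all(toy_genome, "ATG"))            # → [0, 5, 8]
print(count_differences("ATGCGA", "ATGAGA"))  # → 1
```

A scan like this inspects every position in turn, which is tolerable for eleven letters and much less so for three billion. That gap is where the hidden mathematics earns its keep.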

That’s worthy, useful, necessary . . . but not, in the present context, inspiring. The role of mathematics ought to be more creative. And so it is. Mathematics is being used not just to help biologists manage their data, or improve their instruments, but on a deeper level: to provide significant insights into the science itself, to help explain how life works. Over the past ten years there has been a massive growth in ‘biomathematics’ – mathematical biology. All around the globe new research institutes and centres devoted to this subject have sprung into existence, to such an extent that the people setting them up are having difficulty in finding enough qualified staff. Though still not a part of the biological mainstream, biomathematics is claiming its rightful place among the host of techniques and points of view that are necessary if we are to understand how life evolved, how it works and how organisms relate to their environment.

Ten or twenty years ago, the claim that mathematics could play a significant role in biology largely fell on deaf ears. Today, that particular battle is mostly won – as the rapid growth of specialist research centres demonstrates. It is no longer necessary to try to persuade biologists that mathematics might be useful to them. Many of them still have no wish to use it themselves, except when it has been neatly packaged into computer software, but they do not object if others do. A mathematician can be a useful addition to the research team. A few biologists still resist the importation of mathematics into their subject and would robustly deny most of what I’ve just written, but that’s fast becoming an outmoded reflex, and their influence is dwindling.

By the same token, mathematicians have learned that the only effective way to apply their subject to biology is to find out what biologists want to know, and to adapt their techniques accordingly. Biomathematics is not merely a new application for existing mathematical methods. You can’t just pull an established mathematical technique off the shelf and put it to use: it has to be tailored to fit the question. Biology requires – indeed demands – entirely new mathematical concepts and techniques, and it raises new and fascinating problems for mathematical research.

If the main driving force behind new mathematics in the twentieth century was the physical sciences, in the twenty-first century it will be the life sciences. As a mathematician, I find this prospect exciting and enticing. Mathematicians like nothing better than a rich source of new questions. Biologists, rightly, will be impressed only by the answers.