Inventing Cannibal Calculus - The Infinite Beauty of the Infinite Wilderness - Burn Math Class: And Reinvent Mathematics for Yourself (2016)

Act III

The Infinite Beauty of the Infinite Wilderness

3. Inventing Cannibal Calculus

.3.1 Building New from Old, as Always

It is not the essence of our objects but their behavior that concerns us most. This is true in all of mathematics, and we’ve seen this principle since the beginning. There may indeed be many possible definitions of area or slope that behave differently than the ones we have used. Such definitions may be infinitely more complicated than ours, and thus much more difficult to work with. However, our definitions behave exactly like we want them to, because we forced them to, and thus we can confine our attention to them. The Void contains an infinite swarm of machines, most of which are much more complicated than the four species we examined in Chapter 5, but we chose to examine those four particular kinds of machines because we knew exactly what we could say about them. And we knew what we could say about them precisely because we defined them by how they behaved.

Here we face the same principle, though in a different guise. Ignoring the baggage of existing textbooks, how can we define the derivative of machines that eat entire machines and spit out numbers in a way that allows us to perform the same operations on them that we always have? Rather than abiding by the practice customary in textbooks, in which one first defines the set of functions that one is talking about (as if that were the primary concern driving mathematical creation) and only then proceeds to derive results, we prefer the opposite path. We will instead pre-mathematically derive the definitions of infinite-dimensional calculus by remaining agnostic about the exact space of functions we’re talking about, and forge ahead by forcing these new objects — whatever they might be — to behave as similarly to the calculus we’re familiar with as we require them to in order to move forward. Perhaps most importantly, whatever “derivative” means in this new world, we will demand that derivatives can still be thought of as one infinitely small number divided by another. Only in that way can we carry over the calculus expertise we’ve developed thus far in the book into this untamed wilderness. Investigation of exactly which objects obey such operations and how to code such objects into the language of set theory is an optional afterthought, to be carried out by anyone who is interested in the task. When the dust settles, if it turns out that we were studying a different class of objects than we thought we were studying originally, so be it. We are unconcerned with essences.

So! Suppose we’ve got a cannibalistic machine, by which we mean a machine that eats an entire machine f(x) and spits out a number F[f(x)]. In multivariable calculus, we defined partial derivatives by starting with a machine F(x), then making an infinitely small change to one of x’s slots, while leaving all the rest the same. Then we looked at the difference in output before and after, and divided this by the difference in input before and after. In the spirit of building new from old, let’s do exactly the same thing to define the derivative in the infinite wilderness.

Given a cannibalistic machine F, we have infinitely many “slots” in which we can make changes. For machines that eat vectors, these slots were labeled x1, x2, . . ., xn. Now, for machines that eat entire machines, these slots are labeled f(0), f(0.001), f(3), f(796.5) and so on. We can’t actually list every slot, but there is one for each number x. While x3 labeled the number in the third slot of a vector x, the symbol f(x) now labels the number in the xth slot of the machine f. As such, let’s start by defining the “partial derivative” of a cannibal machine.
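To make the “one slot per number x” picture concrete, here is a minimal numerical sketch (my own illustration, assuming a finite grid of spacing dx as a stand-in for the continuum): a machine f on a grid becomes a long vector, one slot per grid point.

```python
import numpy as np

# Chop an interval into a fine grid: the machine f becomes a long vector,
# with one "slot" f(x) for each grid point x (a finite stand-in for the
# infinitely many slots of a genuine machine).
dx = 0.001
x = np.arange(0.0, 1.0, dx)           # the labels of the slots
f = x**2                              # a particular machine: f(x) = x^2

# Reading "the number in the x-th slot" is just indexing, exactly like
# reading the third slot x3 of an ordinary vector.
slot = int(round(0.5 / dx))           # the slot labeled x = 0.5
print(f[slot])                        # ≈ 0.25, i.e., f(0.5)
```

The only difference from a vector in multivariable calculus is that the slots are labeled by every number x rather than by the indices 1 through n.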

First, let’s abbreviate:

δF[f(x)] ≡ F[f(x) + δf(x)] − F[f(x)]          (.3)

Before explaining what this means, it’s important to stress that we’re using the weird notation δ instead of d not in order to confuse you, but so that you can see how simple and similar to single-variable calculus are the ideas expressed by the intimidating notation of the standard textbooks. If the δs are scaring you, then please, for the love of mathematics, cross out my equations and rewrite them with d’s instead. The equations would convey exactly the same content either way.

Okay. In the above equation, the symbol δf(x) refers to an infinitely small machine, just as dx in single-variable calculus referred to an infinitely small number. The sense in which this machine is “infinitely small” is provided by our definition (earlier in this chapter) of a machine’s “length” or “size,” which was itself inspired by the formula for shortcut distances. If a machine’s length by that definition is an infinitely small number, then it is an “infinitely small machine.” Crucially, as in single-variable calculus, the f in δf(x) is not the same as the f in f(x).

This point deserves clarification: In single-variable calculus, the notation x + dx refers to x plus an unrelated infinitely small number. When we write “x + dx” we write four symbols: (i) x, (ii) +, (iii) d, (iv) x. As we already know, the x in item (i) is in no way related to the x in item (iv), which forms the second letter of “dx.” The notation dx in single-variable calculus is not something we do to the number x; it is simply an (unrelated) infinitely small number. Even though we already know this, it is important to keep in mind when interpreting the abbreviation in equation .3. Confusingly, however, in many areas of mathematics, the d in d(thing) does refer to an action performed on the (thing) that follows it, and calculus of variations textbooks often use the δ in that way as well, to refer to something they call a “variation.” Don’t worry about that for now. We’ll encounter the idea again soon.

One point about notation: what we’ve written as F[f(x)] could more properly be written as F[f], because it does not depend on some particular value of x, but rather on the entire machine f. However, in my experience, writing F[f(x)] will tend to be less confusing in the long run. Because of this, when we need to specify which particular slot we’re differentiating with respect to, we need another letter besides x. Rather than use something like y, which might connote “verticalness,” I’ll just use x̃. The squiggle above the x just says “this is a different symbol from x, and it may or may not refer to a different point.” That’s all. Alright, now that we’ve done this, our dictionary suggests that we define the derivative of F with respect to the particular slot f(x̃) to be

δF[f(x)] / δf(x̃)

where δF[f(x)] is defined as in equation .3, and δf(x̃) is simply the output of some unrelated infinitely small machine δf (whose particular form we’re remaining agnostic about, as we have done with variables from the beginning) when fed the input x̃. Let’s see how this all works in practice. Suppose we’re looking at the particular cannibalistic machine

F[f(x)] ≡ ∫ f(x)² dx

Then, using the definitions above,

δF[f(x)] = ∫ [f(x) + δf(x)]² dx − ∫ f(x)² dx = ∫ [ 2 f(x) δf(x) + (δf(x))² ] dx

So simply dividing by δf(x̃), we obtain

δF[f(x)] / δf(x̃) = ∫ [ 2 f(x) + δf(x) ] (δf(x) / δf(x̃)) dx
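Taking F[f(x)] ≡ ∫ f(x)² dx as a concrete cannibal machine, here is an illustrative numerical sketch (a grid of spacing dx stands in for the continuum; the particular machine and numbers are arbitrary choices of mine): we bump a single slot of a discretized f and watch the difference quotient.

```python
import numpy as np

# Discretize: F[f] = ∫ f(x)^2 dx becomes a sum over slots.
dx = 0.001
x = np.arange(0.0, 1.0, dx)
f = np.sin(x)                         # any machine will do

def F(g):
    return np.sum(g**2) * dx          # eats a whole machine, spits a number

# Change one slot by a tiny amount eps, leaving every other slot alone.
j = 300                               # the slot labeled x-tilde = x[j]
eps = 1e-4
f_bumped = f.copy()
f_bumped[j] += eps

# The "partial derivative with respect to slot j."
quotient = (F(f_bumped) - F(f)) / eps

# It comes out as (approximately) 2 f(x-tilde) * dx: a normal number
# times one leftover factor of the infinitely small dx.
print(quotient, 2 * f[j] * dx)
```

Notice that the quotient shrinks as the grid spacing dx shrinks, which is the discrete shadow of the “infinitely small derivative” we are about to meet.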
.3.2 Infinite Pre-mathematics, Part 1: A Possibility They Never Discuss

At this point, it might appear that the discussion is stuck, because we haven’t yet specified what the symbol

δf(x) / δf(x̃)

refers to. But remember, we’re inventing this stuff ourselves, so rather than asking, “What should we do next?” we should instead ask, “What do we want the derivative of F[f(x)] to be?” If that sounds like a backward way of reasoning, think again! Remember what we’ve been doing throughout the book. The process of generalizing an old, familiar concept to wilder and weirder contexts always involves a choice. The choice is: which aspects of the old concept do we want to build into our new, more general version of it? Here, as we’ll see in a moment, the choice is essentially whether we want this derivative

δF[f(x)] / δf(x̃)

to be equal to 2f(x̃)dx or to 2f(x̃). We’ll see why in a few lines. Back to the point: we stopped in the above calculation because we had not yet said what we mean by

δf(x) / δf(x̃)

So the question of how we want to define the “functional derivative” — the derivative of a cannibalistic machine F[f(x)] — cannot proceed unless we say how we want to define the functional derivatives of the different “vector slots” f(x) and f(x̃) with respect to each other. Maybe our dictionary will help. Recall that in multivariable calculus, we had

∂xi / ∂xj = { 1 if i = j ;  0 if i ≠ j }          (.4)

which just says that since the different variables x1, x2, . . ., xn are thought of as “perpendicular directions,” we can of course change our position along one without changing our position along another, for the same reason we can walk east or west without changing our position along the north-south axis. Since we’re free to generalize in whatever way we want, we could choose to define

δf(x) / δf(x̃) = { 1 if x = x̃ ;  0 if x ≠ x̃ }          (.5)

If we were to make that choice, then we could pick up where we left off, and the above functional derivative would become

δF[f(x)] / δf(x̃) = 2 f(x̃) dx + δf(x̃) dx

Notice that each piece has a dx attached, so each piece is infinitely small. However, the second piece is two infinitely small pieces multiplied together, and it is therefore infinitely smaller than the first piece. As such, with this definition, we could simply say

δF[f(x)] / δf(x̃) = 2 f(x̃) dx

This says that the choice expressed by equation .5 leads us to a situation in which all functional derivatives are infinitely small. Intuitively, why is this? In multivariable calculus, making the choice in equation .4, namely

∂xi / ∂xj = { 1 if i = j ;  0 if i ≠ j }

leads partial derivatives to typically be normal run-of-the-mill numbers, not infinitely small or infinitely large numbers. For example, in multivariable calculus,

∂/∂xi [ x1² + x2² + · · · + xn² ] = 2xi          (.6)

Notice that there are no infinitely small numbers like dx attached to this after we’re done taking the derivative. Why, then, should we have obtained the result

δ/δf(x̃) [ ∫ f(x)² dx ] = 2 f(x̃) dx          (.7)
when we made the choice expressed by equation .5? The two equations are so clearly analogous that it might not be clear at first why one ended up being infinitely small while the other ended up being a normal number. Well, again ignoring the cautions of the standard textbooks, we can find a straightforward answer. The sum in equation .6 was a sum of finitely large things, and thus it is not surprising that we obtained a finite number for the derivative. However, the integral in equation .7 is a sum of infinitely small things. That is, it is a sum of the areas of infinitely thin rectangles, each of which looks like f(number)dx, where f(number) is a normal number like 3 or 7 or 52, and where dx is an “infinitely small” number. As such, defining the “partial functional derivatives” as we did in equation .5 — having done so because we wanted them to be defined as similarly to equation .4 as possible — is ultimately what led our functional derivatives to be infinitely small numbers.

Intuitively, this makes sense. How much does the area under a machine’s graph change if we change the height at a single point x by an infinitely small amount? Whatever the answer, it should have two infinitely small numbers attached: the infinitely small width of the original rectangle (from dx), and the infinitely small change to its height (from δf(x)). As such, if we chose to define the “partial functional derivatives” as we did in equation .5, it makes sense that the rate of change of the entire area should have one infinitely small number attached, since one of the two infinitely small pieces will get canceled when we divide by δf(x̃) in computing the derivative.
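This intuition can be checked on a grid (an illustrative sketch of mine; the area machine F[f] ≡ ∫ f(x) dx and all particular numbers are assumptions for the demonstration):

```python
import numpy as np

# The area machine: F[f] = ∫ f(x) dx, discretized as a sum of thin rectangles.
dx = 0.001
x = np.arange(0.0, 1.0, dx)
f = np.cos(x)

def area(g):
    return np.sum(g) * dx

# Nudge the height at a single point by eps: the area gains one sliver
# of width dx and height eps -- two small factors multiplied together.
eps = 1e-4
f_bumped = f.copy()
f_bumped[250] += eps

change = area(f_bumped) - area(f)
print(change, eps * dx)               # the change is eps * dx
```

The change in area carries both small factors; dividing by the height change eps cancels only one of them, leaving one factor of dx behind, exactly as the paragraph above says.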

.3.3 Infinite Pre-mathematics, Part 2: A Sexier Definition

What other definition might we use as a replacement for equation .5 if we want our functional derivatives to be normal, finitely large numbers? Well, to answer this, we have to take the discussion back to before equation .5, where we defined δf(x)/δf(x̃). Suppose we want to define functional derivatives in whatever way we have to in order to ensure that our simple cannibalistic machine ends up having its derivative look like 2f(x̃), rather than 2f(x̃)dx. What would we have to do? Well, our previous choice left us with an unwanted dx, so the dumbest (or if you prefer, simplest) possible thing we could do to get finite numbers for functional derivatives would be to use this definition instead:

δf(x) / δf(x̃) = { 1/dx if x = x̃ ;  0 if x ≠ x̃ }          (.8)

Let’s see where this choice gets us, if anywhere. Picking up where we left off, we have

δF[f(x)] / δf(x̃) = ∫ [ 2 f(x) + δf(x) ] (δf(x) / δf(x̃)) dx = 2 f(x̃) + δf(x̃)

Just as before, the δf(x̃) term is infinitely smaller than the 2f(x̃) term, so we can simply write

δF[f(x)] / δf(x̃) = 2 f(x̃)

Perfect! This is exactly the nice, finite answer we wanted. Perhaps not unexpectedly, defining the quantities δf(x)/δf(x̃) to be infinitely large ended up canceling out the effects of the fact that the integral itself was a sum of a bunch of infinitely small numbers.
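Numerically (same illustrative setup as before, with F[f(x)] ≡ ∫ f(x)² dx assumed as the example machine), the 1/dx in this definition amounts to dividing the difference quotient by one more factor of dx, and the result no longer cares how fine the grid is:

```python
import numpy as np

def functional_derivative(dx):
    # Discretize F[f] = ∫ f(x)^2 dx on a grid of spacing dx, bump the
    # slot nearest x-tilde = 0.5, and divide by eps AND by dx.
    # The extra 1/dx is the Dirac-style choice of equation .8.
    x = np.arange(0.0, 1.0, dx)
    f = np.sin(x)
    j = int(round(0.5 / dx))
    eps = 1e-4
    f_bumped = f.copy()
    f_bumped[j] += eps
    F = lambda g: np.sum(g**2) * dx
    return (F(f_bumped) - F(f)) / (eps * dx)

# The answer is (approximately) 2 f(x-tilde) = 2 sin(0.5): finite,
# and the same no matter how fine we make the grid.
for dx in [0.01, 0.001]:
    print(functional_derivative(dx), 2 * np.sin(0.5))
```

Grid-independence is the numerical face of “finite”: with the choice in equation .5 the same experiment would shrink toward zero as dx shrinks.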

.3.4 Adding Two More δs to the d → ∂ → δ Travesty

When all the dust settled, we had figured out which definition of the functional derivative would give us a nice finite answer. Having considered the consequences of different possible definitions, we now have a much clearer understanding of the relationship between single-variable, multivariable, and infinite-dimensional calculus, and why the last of these looks the way it does. For example, if we choose the definition in equation .8, then it isn’t hard to see the analogy between the following equations, even with the strange notation change from d to ∂ to δ, and from nothing to Σ to ∫:

d/dx [ x² ] = 2x          (.9)

∂/∂xi [ x1² + x2² + · · · + xn² ] = 2xi          (.10)

δ/δf(x̃) [ ∫ f(x)² dx ] = 2 f(x̃)          (.11)

The same similarity holds if we don’t sum all the variables but just differentiate the square of an unspecified one. We’ll see this in a moment, but first it helps to discuss two interlocking pieces of the conventional notation. Earlier, we defined ∂xi/∂xj to be 1 if the indices were the same, and 0 if they were different. Textbooks call this the “Kronecker delta,” which sounds very fancy, and they write

δij ≡ { 1 if i = j ;  0 if i ≠ j }

Although the phrase “Kronecker delta” might sound like some sort of island prison where only the most dangerous criminals are kept, it’s a really simple idea. Notice that, in a wonderful fit of confusing notation, the symbol δij has nothing to do with the δ that has for some reason replaced our d and ∂ in the world of infinite-dimensional calculus.

Similarly, recall that we defined δf(x)/δf(x̃) to be 1/dx when x and x̃ were the same, and 0 otherwise. Textbooks, you won’t be surprised to hear, have a goofy name for this as well. They call it the “Dirac delta function,” which isn’t the best terminology, but we’ll tolerate it anyway because it’s named after an extremely strange and brilliant guy. Although the textbooks virtually never write it like this, the Dirac delta function is defined by:

δ(x) ≡ { 1/dx if x = 0 ;  0 otherwise }

We can therefore think of this function as being zero almost everywhere, except at x = 0, where it can be thought of as an “infinitely tall spike.” We can write the above definition in a form that may look slightly more complicated at first, but which makes the analogy with the Kronecker delta more clear, like this:

δ(x − x̃) ≡ { 1/dx if x = x̃ ;  0 if x ≠ x̃ }
Now for the payoff. Notice that this is exactly analogous to the Kronecker delta, in that (i) both δ symbols have two “variables” in their description, (ii) when these two variables are not equal, both δ symbols equal zero, and (iii) when the two variables are equal, the delta symbol equals whatever it has to in order to make multivariable calculus and the calculus of variations behave in the same way. In case that last sentence was unclear, let’s illustrate it with an example.
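Before that example, here is a small numerical sketch of points (i) through (iii), assuming a grid of spacing dx as a stand-in for the continuum: a grid version of the Dirac delta is just a Kronecker delta scaled by 1/dx, and that scaling is exactly what lets it survive an integral (a sum of dx-sized pieces) the way the Kronecker delta survives an ordinary sum.

```python
import numpy as np

dx = 0.001
x = np.arange(0.0, 1.0, dx)
g = np.exp(x)                         # any machine to test against
j = 700                               # the slot labeled x-tilde = x[j]

# Kronecker delta: a spike of height 1 that picks one term out of a sum.
kron = np.zeros_like(x)
kron[j] = 1.0
print(np.sum(kron * g), g[j])         # sum_i (delta_ij * g_i) = g_j

# Dirac delta on a grid: the same spike scaled by 1/dx, so that it
# picks one value out of an integral (a sum of dx-sized pieces).
dirac = kron / dx
print(np.sum(dirac * g) * dx, g[j])   # ∫ δ(x − x̃) g(x) dx = g(x̃)
```

In both cases the delta “equals whatever it has to” so that the sum or integral spits back the single value sitting in the chosen slot.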

In equations .9, .10, and .11, we demonstrated the similarity of the three forms of calculus we’ve examined: single-variable, multivariable, and cannibal (or “variational” or “functional” or whatever you want to call it). Using these new δ symbols, we can demonstrate this similarity in another way, without summing all of the entries. Making an argument almost exactly like the one we used to invent our hammer for reabbreviation (the “chain rule”) in Chapter 3, we obtain

∂(xi²)/∂xj = 2xi (∂xi/∂xj) = 2xi δij

and

δ(f(x)²)/δf(x̃) = 2 f(x) (δf(x)/δf(x̃)) = 2 f(x) δ(x − x̃)

Therefore, using these two new versions of the δ symbol (Kronecker and Dirac, on the right of the above two equations), which have nothing to do with the δ symbol in functional derivatives (on the left of the above equations) . . . see why I always complain about the standard notation? It makes me write sentences like this! . . . we can demonstrate the similarity between our three kinds of calculus in yet another way, as follows:

d(x²)/dx = 2x

∂(xi²)/∂xj = 2xi δij

δ(f(x)²)/δf(x̃) = 2 f(x) δ(x − x̃)
See how similar? Of course, for a complicated set of historical and cultural reasons, mathematicians (and the mathematics books they write) virtually never teach calculus of variations this way. We’ll briefly discuss some of the more irksome aspects of the way the subject is formalized and taught in the next section.