Burn Math Class: And Reinvent Mathematics for Yourself (2016)

Act III

The Infinite Beauty of the Infinite Wilderness

4. The Pedagogical Mutilation of Infinite-Dimensional Calculus

4.1 The Unexplained Obsession with Integral Functionals

In the conventional treatment of this subject, it is not often made clear how similar cannibal calculus is to multivariable calculus, and thus to regular old single-variable calculus. For example, in calculus of variations, specific examples of how to compute functional derivatives virtually always focus on so-called “integral functionals” — that is, on cannibalistic machines that look like this:
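A sketch of the general shape being described, assuming the functional is named F, the integrand machine is named M, and the limits of integration are a and b (the names F and M match the later discussion of the textbook "dance"; the limits are an assumption):

```latex
F[f(x)] \equiv \int_a^b M[f(x)] \, dx
```

Here M stands for any ordinary machine that eats the number f(x) and spits out a number. Some integral functionals, such as arclength, let the integrand depend on f'(x) as well.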

Some examples of “integral functionals” that we’ve already seen include the integral itself,

the arclength functional, or the length of the graph of f,

and the “norm,” or length of f when interpreted as a vector in an infinite-dimensional space. Recall from earlier that this interpretation of “length” has nothing to do with the arclength interpretation above, but rather comes from generalizing the notion of “length” for vectors to an infinite-dimensional context. Recall that this gave:
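The three examples just listed can be sketched as follows. The name Length comes from the text; the names Int and Arc and the limits a and b are assumptions made for illustration:

```latex
\begin{align*}
\mathrm{Int}[f(x)]    &\equiv \int_a^b f(x) \, dx \\
\mathrm{Arc}[f(x)]    &\equiv \int_a^b \sqrt{1 + f'(x)^2} \, dx \\
\mathrm{Length}[f(x)] &\equiv \sqrt{\int_a^b f(x)^2 \, dx}
\end{align*}
```

The last line is the infinite-dimensional version of the familiar vector length, the square root of the sum of the squared components, with the sum over components replaced by an integral over x.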

Textbooks usually write this as ||f(x)|| or ||f|| instead of Length[f(x)], but they all refer to the same thing. Now, against this background, there is a question that (I would wager) arises in the minds of most individuals on their first exposure to this subject, but which I’ve never seen addressed in a single textbook: why is cannibal calculus almost universally focused on “integral functionals,” rather than on more general functionals that aren’t necessarily written in integral form? Students are right to be confused, because the reason is actually quite subtle.

Now, it is true that most of the specific, nontrivial examples of functionals that arise in practical applications are integral functionals, but this is a different issue from why cannibal calculus pedagogy does not tend to use a broader set of examples in order to illustrate to the newcomer the extent of the parallels between cannibal calculus and multivariable calculus. First of all, notice that in all integral functionals, whatever their form, the x in f(x) appears as a “bound variable.” A few examples will serve to illustrate what I mean, and why this is relevant. Consider the integral functional

Our dictionary and the discussion above show that the analogue of this in multivariable calculus is

We’ve already seen that by making an appropriate choice of how to define the functional derivative, we can generalize expressions from multivariable calculus, like

to analogous expressions in cannibal calculus — in this case,
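One way the parallel might look, written out side by side. This is a sketch under assumptions, not necessarily the example the text has in mind: the cubic integrand is chosen to match the f(x)³ example used later, the point of differentiation is called x̃ and is assumed to lie between a and b, and the dictionary is taken to be "sum over i becomes integral over x, Kronecker delta becomes Dirac delta":

```latex
\begin{align*}
F(\mathbf{x}) &\equiv \sum_{i=1}^{n} x_i^3,
&
\frac{\partial F}{\partial x_k}
  &= \sum_{i=1}^{n} 3 x_i^2 \, \delta_{ik} = 3 x_k^2, \\
F[f(x)] &\equiv \int_a^b f(x)^3 \, dx,
&
\frac{\delta F}{\delta f(\tilde{x})}
  &= \int_a^b 3 f(x)^2 \, \delta(x - \tilde{x}) \, dx = 3 f(\tilde{x})^2 .
\end{align*}
```

In both rows the delta symbol collapses the sum or integral onto the single term where the two labels agree, which is why the two calculations have the same shape.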

Now, in the textbooks, the straightforward relationship between both calculations is rarely made clear by way of simple, concrete examples, but that’s not the point of this discussion. The point is to ask: why is it so often integral functionals? That is, why do textbooks on this subject so rarely present, in their worked examples, expressions of the form

which involve functional derivatives but no integrals? However unimportant such examples might be in applications, they are quite important for pedagogy, and it is worth asking why they are so rare in textbooks on the calculus of variations. Using our dictionary, we see that the analogue of the above expression in multivariable calculus is

The index i is “free,” or unspecified, not appearing in (for example) a sum, which would make its specific value irrelevant by adding up all possible values. Because i is unspecified, our calculation of the above partial derivative has to take into account two possibilities: maybe i is the same as k, and maybe it isn’t. The “Kronecker delta” bookkeeping symbol introduced in the previous section lets us symbolically consider both possibilities at once, by writing

Expressions like equation .15 show up quite often in introductory expositions of multivariable calculus. However, the type of expression we get from translating everything in the above example into cannibal calculus language is virtually never seen in the standard textbooks. Performing the translation, we obtain
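The pair of expressions being compared can be sketched as follows, assuming the example is the cube (as in the f(x)³ discussed in the next paragraph) and writing x̃ for the point at which the functional derivative is taken (a symbol chosen here for illustration):

```latex
\begin{align*}
\frac{\partial}{\partial x_k} \, x_i^3
  &= 3 x_i^2 \, \delta_{ik}, \\
\frac{\delta}{\delta f(\tilde{x})} \, f(x)^3
  &= 3 f(x)^2 \, \delta(x - \tilde{x}).
\end{align*}
```

In the first line the free index i plays the role that the free variable x plays in the second, and the Kronecker delta is replaced by the Dirac delta.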

Although it may seem that there’s no real point in leaving x unspecified, it is impossible to see the above example without noticing a direct analogy between the calculus of variations and the familiar operations of single- and multivariable calculus. As such, “useless” examples like this have enormous pedagogical value that is largely lacking in the standard presentations of the subject. So, why are such examples so rarely presented in mathematics books? I would suggest that one reason stems from the following. When we compute the functional derivative of something like f(x)³ with respect to f(x̃), the value of f at some point x̃, we obtain the illuminating expression 3f(x)² δ(x − x̃). Then either x isn’t the same as x̃, in which case the entire expression is zero, or else x = x̃, in which case the expression equals 3f(x̃)² δ(0).

This, I would wager, is the reason why such simple examples are usually not presented in mathematics textbooks. Whereas the corresponding example in the language of multivariable calculus gives us a clean, finite expression, the example in the calculus of variations has the “infinite” number δ(0) attached to it, and by the conventions of many mathematics textbooks, this is not considered to be a meaningful expression. While the common practice of shackling the Dirac delta function to the inside of integrals is understandable if one’s goal is to develop the most elegant, “rigorous” formalization of these concepts in the real number system, it does violence to the conceptual understanding of cannibal calculus. I’ve found in my own experience that physics graduate students tend, on average, to be much less intimidated by concrete calculations in the calculus of variations than most mathematics graduate students. Perhaps understandably, most mathematicians don’t want to allow expressions involving the Dirac delta function to show up outside of integrals, although the analogous expression involving “integral functionals” is considered kosher. To see why, remember that we can think of δ(0) as 1/dx. Because of this, if we simply throw the above expression inside an integral, then all the “infinities” disappear. Tossing the expression above inside an integral (and assuming the number x̃, the point at which we differentiate, is somewhere between a and b), we can write

where in the last two steps I did something completely taboo. However, imagine we simply delete the step of the argument involving δ(0), and write the same final result at the bottom. If we do that, then the resulting calculation is something that the average mathematician is much more likely to be comfortable with, as compared to the original calculation, or as compared to equation .16, which involved δ(0) because we took a functional derivative outside the safety of an integral. This, I would argue, is why you generally find books and courses on the calculus of variations so focused on “integral functionals.” As long as we focus our attention on integral functionals, all of our functional derivatives will give rise to expressions that let us avoid thinking about things like δ(0). I should stress that mathematicians are not doing anything logically incorrect in the way they present cannibal calculus in their courses and textbooks. However, in sacrificing the helpful “aha!” moment provided by simple examples like the above, I believe that the standard presentations are pedagogically incorrect in the highest degree.
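The claim that δ(0) behaves like 1/dx, and that the infinity disappears inside an integral, can be checked on a grid. This is a minimal sketch, not the author's own calculation: the helper names, the sample function f(x) = 1 + x, the interval [0, 1], and the convention of modelling δ(x_i − x_k) as a Kronecker delta divided by dx are all assumptions made for illustration.

```python
# Hypothetical helpers for illustration only. The grid, the sample function,
# and the model "Dirac delta = Kronecker delta / dx" are assumptions.

def grid(a, b, n):
    """Return n sample points x_0..x_{n-1} on [a, b) and the spacing dx."""
    dx = (b - a) / n
    return [a + i * dx for i in range(n)], dx


def discrete_delta(i, k, dx):
    """Discrete stand-in for delta(x_i - x_k): Kronecker delta divided by dx."""
    return (1.0 / dx) if i == k else 0.0


def functional_derivative_of_cube(f_vals, i, k, dx):
    """Discrete analogue of 3 f(x_i)^2 delta(x_i - x_k)."""
    return 3.0 * f_vals[i] ** 2 * discrete_delta(i, k, dx)


def integrated(f_vals, k, dx):
    """Sum the expression above over i, times dx: the 'inside an integral' version."""
    return sum(
        functional_derivative_of_cube(f_vals, i, k, dx) * dx
        for i in range(len(f_vals))
    )


def demo(n):
    """Return (bare value at x_i = x_k, integrated value) for an n-point grid."""
    xs, dx = grid(0.0, 1.0, n)
    f_vals = [1.0 + x for x in xs]  # sample function f(x) = 1 + x
    k = n // 2                      # grid index of the point x = 0.5
    bare = functional_derivative_of_cube(f_vals, k, k, dx)  # scales like 1/dx
    inside = integrated(f_vals, k, dx)                      # stays finite
    return bare, inside


if __name__ == "__main__":
    for n in (10, 100, 1000):
        bare, inside = demo(n)
        print(n, bare, inside)
```

As n grows, the bare value at the matching point grows in proportion to 1/dx, while the summed version stays at 3f(0.5)² = 6.75 up to rounding. The factor of 1/dx from the delta and the factor of dx from the integral cancel, which is the "taboo" step in discrete form.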

Having said that, I should spend a moment arguing against myself and in favor of the standard textbooks. In many ways, the preference of our hypothetical mathematician makes perfect sense. Things like δ(0) can’t easily be defined within the comfort of the real number system, so mathematicians wanting to formalize the ideas in this chapter face a genuinely difficult choice. Either:

1. Stick to the real number system and formalize the δ function (and related objects) by saying that it’s actually a “measure” or a “distribution” or a “linear functional on a space of test functions” or some other way of not having to talk about δ(0), or. . .

2. Move past the comfort of the real number system into something like the hyperreals, in which infinitely large and infinitely small quantities are taken seriously.

If one’s goal is to develop a formal theory of the concepts in this chapter that is rigorous by the standards of mathematical culture, then the first option above is arguably the better approach. In that sense, the approach I’ve been criticizing in this chapter deserves no blame at all. If we happen to share the same goal, then the standard approach is a perfectly sensible way of achieving it.

Okay, in the above discussion, we encountered what has thus far been the central pre-mathematical theme of this book: the theme of focusing somewhat more on the thought processes by which mathematical concepts are created, as opposed to the myriad downstream consequences that such definitions may have. Different possible definitions exist for every mathematical concept, and the functional derivative is no exception. Even though any discussion must ultimately end up choosing a single definition before proposing theorems and constructing proofs, it is only by discussing the relative merits of different candidate definitions that we can finally see behind the formality of polished mathematical concepts, and understand the informal and anarchic styles of reasoning that motivated their discovery in the first place.

4.2 A Bizarre Syntactic Convention

Despite the similarity of the operations performed in equations .9–.11, most textbooks on the calculus of variations do quite a different-looking dance to compute functional derivatives. This is true even of textbooks in applied mathematics and theoretical physics, in which the standards of rigor are sufficiently different from those in pure mathematics that this odd dance may appear rather unjustified. Try staring at the following example for a minute or so, but don’t worry if it’s confusing. Here’s the dance. They’ll say something like: Consider an integral functional of the form

Then they define

At this point they’ll often say, “expanding M[f(x)+δf(x)] in powers of δf(x),” and end up with something like

where O(δf(x)²) stands for “stuff that depends on powers of δf(x) that are 2 or bigger.” Then they substitute the above expansion into equation .18 and ignore the so-called “higher-order terms” hiding inside the O(δf(x)²) piece to obtain

And the functional derivative is simply defined to be the quantity inside the integral that multiplies δf(x), namely the derivative of M with respect to its argument, evaluated at f(x). Notice that the answer is exactly the same as the one we obtained above, but there are several things in the above discussion that make it appear to be quite different from the single- and multivariable calculus we’re familiar with. First of all, the fictional textbook we were imitating used something akin to the Nostalgia Device in order to expand the term M[f(x) + δf(x)]. This led to the piece M[f(x)] in the expansion canceling against the term −M[f(x)]. Then the higher-order terms were mysteriously dropped. The rationale for this is that if we’re thinking of δf(x) as an infinitely small function, analogous to the dx in single-variable calculus, then (δf(x))² should be infinitely smaller than δf(x), thus justifying ignoring it, along with all higher powers. Also, notice that the discussion in which the functional derivative was defined began not by looking at anything that could reasonably be called a derivative of the functional F itself, but rather by looking at the top half of the derivative, which is to say just the δF piece. Then some stuff that happened to show up inside the integral in the course of computing δF was simply defined to be the functional derivative without giving any reason why this was done, or why this mysterious piece inside the integral deserves to be called a derivative in the first place. This weird argument did in fact arrive at the same answer as we did in the discussion above, but in a rather roundabout and confusing way.
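The steps of the dance described above can be sketched as follows. The limits a and b and the notation M' for the derivative of M with respect to its argument are assumptions made here; the names F, M, and δf(x) come from the text:

```latex
\begin{align*}
F[f(x)] &\equiv \int_a^b M[f(x)] \, dx \\
\delta F &\equiv F[f(x) + \delta f(x)] - F[f(x)]
  = \int_a^b \Bigl( M[f(x) + \delta f(x)] - M[f(x)] \Bigr) \, dx \\
M[f(x) + \delta f(x)] &= M[f(x)] + M'[f(x)] \, \delta f(x)
  + O\bigl(\delta f(x)^2\bigr) \\
\delta F &= \int_a^b M'[f(x)] \, \delta f(x) \, dx
  \quad \text{(higher-order terms dropped)} \\
\frac{\delta F}{\delta f(x)} &\equiv M'[f(x)]
\end{align*}
```

Read this way, the second and fourth lines play the role of the familiar statement dF = F'(x) dx, and the last line simply names the coefficient of the small change.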

In my own experience, I spent quite a while looking at calculus of variations from the outside, thinking “Wow! That’s complicated,” each time I saw it written in a textbook or on a chalkboard, when in reality anyone who understands basic calculus already knows 90% of what’s needed to understand the calculus of variations. It’s just that (i) differences in notation, and (ii) the different-looking dances by which functional derivatives are commonly computed in textbooks make it look like it’s a completely different topic built from wildly unfamiliar ideas.

To be sure, multiple logically equivalent formalizations of the above ideas abound in textbooks, but as we’ve discussed many times before, logical equivalence is far different from pedagogical equivalence. Much confusion could be eliminated simply by stressing ad nauseam how similar all of the scary-looking “new” stuff is to the “old” stuff with which the student is already familiar, even before the “new” stuff has officially been taught to them. At least that’s how I always felt. If you’re sick of hearing me say the same things over and over. . . good! Now try to remember this repetitive yammering when you read other textbooks, and they just might make a bit more sense.