Page images

believed the Center for Nonlinear Studies at Los Alamos could play a significant role.

Stan was associated, either directly or through inspiration, with the three research problems described in Part III of this article. Each is an example of how a probabilistic approach and computer simulation can be combined to illuminate features of nonlinear systems. Since some background in modern probability theory is needed to follow the solutions to the problems, Part II provides a tutorial on that subject, which starts with a bit of history and concludes with several profound and useful theorems. Fortunately Mark Kac and Stan Ulam gave a very insightful summary of the development of probability theory in their book Mathematics and Logic: Retrospect and Prospects. I have adapted and extended their discussion to meet the needs of this presentation but

retained their broad perspective on the history of mathematics and, in some cases, their actual words.

Other interests that Ulam maintained throughout his life were logic and set theory. I remember a conference on large cardinal numbers in New York a few years ago. Stan was the honored participant. More than fifty years earlier he had shown that if a nontrivial probability measure can be defined on all subsets of the real numbers, then the cardinal number, or "size," of the set of all the subsets exceeded the wildest dreams of the time. (See “Learning from Ulam: Measurable Cardinals, Ergodicity, and Biomathematics.") But that large cardinal of his is minuscule compared with the cardinals of today. After listening to some of the conference talks, Stan said that he felt like Woody Allen in Sleeper when he woke up after a nap of many years and was confronted with an unbelievably large number on a McDonald's hamburger sign.

There is a serious aspect to that remark. Stan felt that a split between mathematics and physics had developed during this century. One factor was the trauma that shook the foundations of mathematics when Cantor's set theory was found to lead to paradoxes. That caused mathematics to enter a very introspective phase, which continues to this day. A tremendous effort was devoted to axiomatizing mathematics and raising the level of rigor. Physics, on the other hand, experienced an outward expansion and development. (The situation is somewhat reversed today, as internal issues concerning the foundations of physics receive attention.) As a result, university instruction of mathematicians has become so rigorous and demanding that the mathematical training of scientists has been taken over by other departments. Consequently, instruction in "applied" mathematics, or mathematical methods, is often at a fairly low level of rigor, and, even worse, some of the important mathematical techniques developed during this century have not made their way into the bag of tools of many physical scientists. Stan was very interested in remedying the situation and

J. VON NEUMANN PRIZE: A bottle of whiskey of measure > 0. July 4, 1937 Original manuscript in German

[merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][ocr errors][merged small][merged small][merged small]

Figure 19.1. Two
possible solutions. The
line segment rotates
within the curve, and
in each position cuts
off half the area and
half the perimeter.

PRIZE: One bottle of wine, S. ULAM


Definition of a certain game. Given is a set E of real numbers. A game between
two players A and B is defined as follows: A selects an arbitrary interval ; B
then selects an arbitrary segment (interval) d2 contained in dı; then A in his turn se-
lects an arbitrary segment d3 contained in d2 and so on. A wins if the intersection
dj, d2, ..., dn ... contains a point of the set E; otherwise, he loses. If E is a comple-
ment of a set of first category, there exists a method through which A can win; if E
is a set of first category, there exists a method through which В will win.
Problem. It is true that there exists a method of winning for the player A only for
those sets E whose complement is, in a certain interval, of first category; similarly,
does a method of win exist for B if E is a set of first category?
Addendum. Mazur's conjecture is true.

S. Banach, August 4, 1935

Modifications of Mazur's Game

(1) There is given a set of real numbers E. Players A and B give in turn the digits
0 or 1. E wins if the number formed by these digits in a given order (in the binary
system) belongs to E. For which E does there exist a method of win for player A
(player B)?

(2) There is given a set of real numbers E. The two players A and B in turn give
real numbers which are positive and such that a player always gives a number
smaller than the last one given. Player A wins if the sum of the given series of num-
bers is an element of the set E. The same question as for (1).


Commentary. The first published paper on general finite games with perfect information is Zermelo's .... Here, in Problem 43, we have the first interesting definition


on PROBABILITY, MEASURE, and the laws of


Part II



s mentioned in the introduction, Stan Ulam contributed to the measure-theoretic foundations that allow one to define a probability when the number of possible

outcomes is infinite rather than finite. Here I will explain why this extension is so necessary and so powerful and then use it to introduce the laws of large numbers. Those laws are used routinely in probability and its applications (several times, for example, during solution of the problems discussed in Part III). Following the logic of Kac and Ulam I begin at the beginning.*


Early Probability Theory

Probability theory has its logical and historical origins in simple problems of counting. Consider games of chance, such as tossing a coin, rolling a die, or drawing a card from a well-shuffled deck. No specific outcome is predictable with certainty, but all possible outcomes can usually be listed or described. In many instances the number of possible outcomes is finite (though perhaps exceedingly large). Suppose we are interested in some subset of the outcomes (say, drawing an ace from a deck of cards) and wish to assign a number to the likelihood that a given outcome belongs to that subset. Our intuitive notion of probability suggests that that number should equal the ratio of the number of outcomes yielding the event (4, in the case of drawing an ace) to the number of all possible events (52, for a full deck of cards).

This is exactly the notion that Laplace used to formalize the definition of probability in the early nineteenth century. Let A be a subset of the set 12 of all possible outcomes, and let P (A) be the probability that a given outcome is in A. For situations such that N is a finite set and all outcomes in S are equally probable, Laplace defined P(A) as the ratio of the number v(A) of elements in A to the total number v(s) of elements of S2; that is,

P(A) =


However, the second condition makes the definition circular, for the concept of probability then is dependent upon the concept of equiprobability. As will be described later, the more modern definition of probability does not have this difficulty.

For now let us illustrate how Laplace's definition reduces the calculation of probabilities to counting. Suppose we toss a fair coin (one for which heads and tails

*The material quoted in this tutorial from Mathematics and Logic has been reprinted with permission from Encyclopedia Britannica, Inc.

[merged small][subsumed][merged small][merged small][ocr errors][ocr errors][merged small]

are equally probable) n times and want to know the probability that we will obtain exactly m heads, where 1 sm En. Each outcome of n tosses can be represented as a sequence, of length n, of H’s and T's (HTHH ...THH, for example), where H stands for heads and T for tails. The set 12 of all possible outcomes of n tosses is then the set of all possible sequences of length n containing only H's and T's. The total number of such sequences, v(12), is 2". How many of these contain H exactly m times? This is a relatively simple problem in counting. The first H can occur inn positions, the second in n - 1 positions, ..., and the mth in (n - m + 1) positions. So if the H's were an ordered sample (H,H2, ...,Hm), the number of sequences with m H’s would equal n(n 1)(n − 2)... (n m + 1). But since all the H’s are the same, we have overcounted by a factor of m! (the number of ways of ordering the H’s). So the number of sequences of length n containing m H's is

[ocr errors][merged small][merged small]


[merged small][ocr errors]
[blocks in formation]
[blocks in formation]

(The number n!/m!(n m)!, often written (m), is the familiar binomial coefficient, that is, the coefficient of x" yn -m in the expansion of (x + y)"). Since the number of sequences with exactly m H’s is (m) and the total number of sequences is 2", we have by Laplace's definition that the probability P(m,n) of obtaining m heads in n tosses of a fair coin is


1 P(m,n)

m!(n m)! 21

[merged small][ocr errors][merged small]

Consider now a coin that is “loaded" so that the probability of a head in a single toss is 1/6 (and the probability of a tail in a single toss is 5/6). Suppose again we toss this coin n times and ask for the probability of obtaining exactly m heads. To describe the equiprobable outcomes in this case, one can resort to the artifice of thinking of the coin as a six-faced die with an H on one face and T's on all the others. Using this artifice to do the counting, one finds that the probability of m heads in n tosses of the loaded coin is

[blocks in formation]

Fig. 1. Almost two centuries ago Laplace showed that the number Ny of heads obtained in a large number n of tosses of a coin (fair or loaded) follows the standard normal distribution of probabilities. More precisely, he showed that the probability of Ny being equal to or less than np + IV np(1 - 0) (where p is the probability of a head in a single toss and t is some number) can be approximated, for large n, by the standard normal distribution function F(t) shown in (a). The derivative of a distribution function (when it exists) is called a frequency, or density, function. Shown in (b) is the density function f(t) for the standard normal distribution function. Note that the value of the distribution fuction at some particular value of t, say B, is equal to the area under the density function from - to B.

Suppose further that the coin is loaded to make the probability of H irrational (V2/2, for example). In such a case one is forced into considering a many-faced die and passing to an appropriate limit as the number of faces becomes infinitely large. Despite this awkwardness the general result is quite simple: If the probability of a head in one toss is p, 0 <p < 1, and the probability of a tail is 1 - p = 9, then the probability of m heads in n tosses is

[merged small][ocr errors][merged small][merged small][merged small][merged small]
[merged small][merged small][merged small][ocr errors][merged small]

What is the probability P that a randomly chosen chord of a circle is longer than the side of the equilateral triangle inscribed within the circle?

the function dF /dt = (1/V21 )e-1/2 is called the standard normal density function.)

The de Moivre-Laplace result can be stated as follows. As n gets larger and larger, the probability that Ny, the number of heads tossed, will be less than or equal to np + pqn (where t is some number) is approximated better and better by the standard normal distribution function. Symbolically,

This question cannot be answered by using Laplace's definition of probability, since the set of all possible chords is infinite, as is the set of desired chords (those longer than the side of the inscribed equilateral triangle). However, the question might be approached in the two ways depicted here and described in the text. Although both approaches seem reasonable, each leads to a different answer!

[merged small][ocr errors]


P = 1/3

In other words, P(NH < np +tpq) is approximated by the area under the standard normal density function from - to 1, as shown in Fig. 1. (In modern terminology Nh is called a random variable; this term and the terms distribution function and density function will be defined in general later.)

The de Moivre-Laplace theorem was originally thought to be just a special property of binomial coefficients. However, many chance phenomena were found empirically to follow the normal distribution function, and it thus assumed an aura of universality, at least in the realm of independent trials and events. The extent to which the normal distribution is universal was determined during the 1920s and 1930s by Lindeberg, Feller, and others after the measure-theoretic foundations of probability had been laid. Today the de Moivre-Laplace theorem (which applies to independent trials, each governed by the same probabilities) and its extension to Poisson schemes (in which each independent trial is governed by different probabilities) are regarded simply as special cases of the very general central limit theorem. Nevertheless they were the seeds from which most of modern probability theory grew.

[blocks in formation]

The awkwardness and logical inadequacy of Laplace's definition of probability made mathematicians suspicious of the whole subject. To make matters worse, attempts to extend Laplace's definition to situations in which the number of possible outcomes is infinite resulted in seemingly even greater difficulties. That was dramatized by Bertrand, who considered the problem of finding the probability that a chord of a circle chosen "at random” be longer than the side of an equilateral triangle inscribed in the circle.

If we fix one end of the chord at a vertex of the equilateral triangle (Fig. 2a), we can think of the circumference of the circle as being the set N of all possible outcomes and the arc between the other two vertices as the set A of “favorable outcomes" (that is, those resulting in chords longer than the side of the triangle). It thus seems proper to take 1/3, the ratio of the length of the arc to the length of the circumference, as the desired probability.

On the other hand we can think of the chord as determined by its midpoint and thus consider the interior of the circle as being the set 12 of all possible outcomes. The set A of favorable outcomes is now the shaded circle in Fig. 2b, whose radius is onehalf that of the original. It now seems equally proper to take 1/4 for our probability,

Fig. 2.

Los Alamos Science Special Issue 1987

« PreviousContinue »