Los Alamos Science

Other interests that Ulam maintained throughout his life were logic and set theory. I remember a conference on large cardinal numbers in New York a few years ago. Stan was the honored participant. More than fifty years earlier he had shown that if a nontrivial probability measure can be defined on all subsets of the real numbers, then the cardinal number, or "size," of the set of all the subsets exceeded the wildest dreams of the time. (See "Learning from Ulam: Measurable Cardinals, Ergodicity, and Biomathematics.") But that large cardinal of his is minuscule compared with the cardinals of today. After listening to some of the conference talks, Stan said that he felt like Woody Allen in Sleeper when he woke up after a nap of many years and was confronted with an unbelievably large number on a McDonald's hamburger sign.

There is a serious aspect to that remark. Stan felt that a split between mathematics and physics had developed during this century. One factor was the trauma that shook the foundations of mathematics when Cantor's set theory was found to lead to paradoxes. That caused mathematics to enter a very introspective phase, which continues to this day. A tremendous effort was devoted to axiomatizing mathematics and raising the level of rigor. Physics, on the other hand, experienced an outward expansion and development. (The situation is somewhat reversed today, as internal issues concerning the foundations of physics receive attention.) As a result, university instruction of mathematicians has become so rigorous and demanding that the mathematical training of scientists has been taken over by other departments. Consequently, instruction in "applied" mathematics, or mathematical methods, is often at a fairly low level of rigor, and, even worse, some of the important mathematical techniques developed during this century have not made their way into the bag of tools of many physical scientists. Stan was very interested in remedying the situation and

believed the Center for Nonlinear Studies at Los Alamos could play a significant role.

Stan was associated, either directly or through inspiration, with the three research problems described in Part III of this article. Each is an example of how a probabilistic approach and computer simulation can be combined to illuminate features of nonlinear systems. Since some background in modern probability theory is needed to follow the solutions to the problems, Part II provides a tutorial on that subject, which starts with a bit of history and concludes with several profound and useful theorems. Fortunately Mark Kac and Stan Ulam gave a very insightful summary of the development of probability theory in their book Mathematics and Logic: Retrospect and Prospects. I have adapted and extended their discussion to meet the needs of this presentation but have retained their broad perspective on the history of mathematics and, in some cases, their actual words.

Excerpts from the
SCOTTISH
BOOK

These excerpts from The Scottish Book: Mathematics from the Scottish Café are reprinted with permission of Birkhäuser Boston. That 1981 edition of problems from "the book" kept at the Scottish Café was edited by R. Daniel Mauldin. The two earlier English-language editions of the problems were edited by Stan Ulam, the first being a 1957 mimeographed version of Ulam's own translation into English from the languages originally inscribed in "the book."

Problems 18 and 19 are still unsolved, and the work stimulated by Problem 43 has played a major role in understanding the consequences of the axiom of choice.

163

J. VON NEUMANN

PRIZE: A bottle of
whiskey of measure > 0

July 4, 1937

Original manuscript

in German

within the curve, and

in each position cuts

off half the area and
half the perimeter.

Definition of a certain game. Given is a set E of real numbers. A game between
two players A and B is defined as follows: A selects an arbitrary interval d1; B
then selects an arbitrary segment (interval) d2 contained in d1; then A in his turn se-
lects an arbitrary segment d3 contained in d2 and so on. A wins if the intersection
d1, d2,..., dn ... contains a point of the set E; otherwise, he loses. If E is a comple-
ment of a set of first category, there exists a method through which A can win; if E
is a set of first category, there exists a method through which B will win.
Problem. It is true that there exists a method of winning for the player A only for
those sets E whose complement is, in a certain interval, of first category; similarly,
does a method of win exist for B if E is a set of first category?
Addendum. Mazur's conjecture is true.

Modifications of Mazur's Game

S. Banach, August 4, 1935

(1) There is given a set of real numbers E. Players A and B give in turn the digits
0 or 1. E wins if the number formed by these digits in a given order (in the binary
system) belongs to E. For which E does there exist a method of win for player A
(player B)?
Ulam
(2) There is given a set of real numbers E. The two players A and B in turn give
real numbers which are positive and such that a player always gives a number
smaller than the last one given. Player A wins if the sum of the given series of num-
bers is an element of the set E. The same question as for (1).
Banach

Commentary. The first published paper on general finite games with perfect infor-
mation is Zermelo's .... Here, in Problem 43, we have the first interesting definition

Part II

PROBABILITY and NONLINEAR SYSTEMS

A TUTORIAL
on PROBABILITY,

MEASURE, and the laws of
LARGE NUMBERS

s mentioned in the introduction, Stan Ulam contributed to the measure-theoretic foundations that allow one to define a probability when the number of possible outcomes is infinite rather than finite. Here I will explain why this extension is so necessary and so powerful and then use it to introduce the laws of large numbers. Those laws are used routinely in probability and its applications (several times, for example, during solution of the problems discussed in Part III). Following the logic of Kac and Ulam I begin at the beginning.*

Early Probability Theory

Probability theory has its logical and historical origins in simple problems of counting. Consider games of chance, such as tossing a coin, rolling a die, or drawing a card from a well-shuffled deck. No specific outcome is predictable with certainty, but all possible outcomes can usually be listed or described. In many instances the number of possible outcomes is finite (though perhaps exceedingly large). Suppose we are interested in some subset of the outcomes (say, drawing an ace from a deck of cards) and wish to assign a number to the likelihood that a given outcome belongs to that subset. Our intuitive notion of probability suggests that that number should equal the ratio of the number of outcomes yielding the event (4, in the case of drawing an ace) to the number of all possible events (52, for a full deck of cards).

This is exactly the notion that Laplace used to formalize the definition of probability in the early nineteenth century. Let A be a subset of the set of all possible outcomes, and let P(A) be the probability that a given outcome is in A. For situations such that n is a finite set and all outcomes in n are equally probable, Laplace defined P(A) as the ratio of the number (A) of elements in A to the total number v() of elements of N; that is,

V(A)
P(A) =
v(N)

*The material quoted in this tutorial from Mathe

matics and Logic has been reprinted with permission from Encyclopedia Britannica, Inc.

However, the second condition makes the definition circular, for the concept of probability then is dependent upon the concept of equiprobability. As will be described later, the more modern definition of probability does not have this difficulty.

For now let us illustrate how Laplace's definition reduces the calculation of probabilities to counting. Suppose we toss a fair coin (one for which heads and tails.

are equally probable) n times and want to know the probability that we will obtain
exactly m heads, where 1 ≤m ≤ n. Each outcome of n tosses can be represented
as a sequence, of length n, of H's and T's (HTHH .....THH, for example), where H
stands for heads and T for tails. The set of all possible outcomes of n tosses is
then the set of all possible sequences of length n containing only H's and T's. The
total number of such sequences, v(N), is 2′′. How many of these contain H exactly
m times? This is a relatively simple problem in counting. The first H can occur in n
positions, the second in n 1 positions, and the mth in (n − m + 1) positions. So
if the H's were an ordered sample (H1, H2,...,Hm), the number of sequences with m
H's would equal n(n
- 2)... (n m + 1). But since all the H's are the same,
we have overcounted by a factor of m! (the number of ways of ordering the H's). So
the number of sequences of length n containing m H's is

- 1)(n

n!
m!(n - m)!'

(The number n!/m!(n m)!, often written (), is the familiar binomial coefficient, that is, the coefficient of xmy"-m in the expansion of (x + y)"). Since the number of sequences with exactly m H's is (") and the total number of sequences is 2", we have by Laplace's definition that the probability P(m, n) of obtaining m heads in n tosses of a fair coin is

Consider now a coin that is "loaded" so that the probability of a head in a single toss is 1/6 (and the probability of a tail in a single toss is 5/6). Suppose again we toss this coin n times and ask for the probability of obtaining exactly m heads. To describe the equiprobable outcomes in this case, one can resort to the artifice of thinking of the coin as a six-faced die with an H on one face and T's on all the others. Using this artifice to do the counting, one finds that the probability of m heads in n tosses of the loaded coin is

Suppose further that the coin is loaded to make the probability of H irrational (√2/2, for example). In such a case one is forced into considering a many-faced die and passing to an appropriate limit as the number of faces becomes infinitely large. Despite this awkwardness the general result is quite simple: If the probability of a head in one toss is p, 0 ≤ p ≤ 1, and the probability of a tail is 1 Pq, then the probability of m heads in n tosses is

[merged small][merged small][merged small][ocr errors][merged small][ocr errors][merged small][merged small][merged small][merged small]

dF(t) dt

Fig. 1. Almost two centuries ago Laplace showed that the number NH of heads obtained in a large number n of tosses of a coin (fair or loaded) follows the standard normal distribution of probabilities. More precisely, he showed that the probability of NH being equal to or less than np + t√np(1 − p) (where p is the probability of a head in a single toss and t is some number) can be approximated, for large n, by the standard normal distribution function F(t) shown in (a). The derivative of a distribution function (when it exists) is called a frequency, or density, function. Shown in (b) is the density function f(t) for the standard normal distribution function. Note that the value of the distribution fuction at some particular value of t, say ß, is equal to the area under the density function from - ∞ to 3.

Building on earlier work of de Moivre, Laplace went further to consider what happens as the number of tosses gets larger and larger. Their result, that the number of heads tossed obeys the so-called standard normal distribution of probabilities, was a major triumph of early probability theory. (The standard normal distribution function,

BERTRAND'S PARADOX

What is the probability P that a randomly chosen chord of a circle is longer than the side of the equilateral triangle inscribed within the circle?

This question cannot be answered by using Laplace's definition of probability, since the set of all possible chords is infinite, as is the set of desired chords (those longer than the side of the inscribed equilateral triangle). However, the question might be approached in the two ways depicted here and described in the text. Although both approaches seem reasonable, each leads to a different answer!

(a)

P = 1/3

(b)

P = 1/4

Fig. 2.

the function dF /dt = (1/√) e-1/2 is called the standard normal density function.) The de Moivre-Laplace result can be stated as follows. As n gets larger and larger, the probability that NH, the number of heads tossed, will be less than or equal to np+1√√pqn (where t is some number) is approximated better and better by the standard normal distribution function. Symbolically,

In other words, P(NH ≤ np + tnpq) is approximated by the area under the standard normal density function from -∞ to t, as shown in Fig. 1. (In modern terminology NH is called a random variable; this term and the terms distribution function and density function will be defined in general later.)

The de Moivre-Laplace theorem was originally thought to be just a special property of binomial coefficients. However, many chance phenomena were found empirically to follow the normal distribution function, and it thus assumed an aura of universality, at least in the realm of independent trials and events. The extent to which the normal distribution is universal was determined during the 1920s and 1930s by Lindeberg, Feller, and others after the measure-theoretic foundations of probability had been laid. Today the de Moivre-Laplace theorem (which applies to independent trials, each governed by the same probabilities) and its extension to Poisson schemes (in which each independent trial is governed by different probabilities) are regarded simply as special cases of the very general central limit theorem. Nevertheless they were the seeds from which most of modern probability theory grew.

Bertrand's Paradox

The awkwardness and logical inadequacy of Laplace's definition of probability made mathematicians suspicious of the whole subject. To make matters worse, attempts to extend Laplace's definition to situations in which the number of possible outcomes is infinite resulted in seemingly even greater difficulties. That was dramatized by Bertrand, who considered the problem of finding the probability that a chord of a circle chosen "at random" be longer than the side of an equilateral triangle inscribed in the circle.

If we fix one end of the chord at a vertex of the equilateral triangle (Fig. 2a), we can think of the circumference of the circle as being the set N of all possible outcomes and the arc between the other two vertices as the set A of "favorable outcomes" (that is, those resulting in chords longer than the side of the triangle). It thus seems proper to take 1/3, the ratio of the length of the arc to the length of the circumference, as the desired probability.

On the other hand we can think of the chord as determined by its midpoint and thus consider the interior of the circle as being the set N of all possible outcomes. The set A of favorable outcomes is now the shaded circle in Fig. 2b, whose radius is onehalf that of the original. It now seems equally proper to take 1/4 for our probability,

« Previous Continue »

Books