Los Alamos Science

its distribution function F. This function, which contains all the information we need to know about a random variable, is defined as follows:

F (t) = P(X ≤t),

where the set X ≤t is the set of all points w in N such that X (w) ≤ t. The form of this function is particularly convenient. It allows us to rewrite E(X), which is a Lebesgue integral over an abstract space, as a familiar classical integral over the real line:

[ocr errors][merged small][subsumed][merged small][merged small][merged small][ocr errors][merged small][merged small]

The expected value is one of the two commonly occurring averages in probability and statistics; the other is the variance of X, denoted by o2(X) or var(X). The variance is defined as the expected value of the square of the deviation of X from its mean:

The standard, or root-mean-square, deviation of X is defined as σ(X) = √√var(X). Figures 3 and 4 illustrate two distribution functions, the binomial distribution function for the number of heads obtained in five tosses of a fair coin and a normal distribution function with a positive mean.

The Laws of Large Numbers

A historically important problem in probability theory and statistics asks for estimates on how a random variable deviates from its mean, or expected, value. A simple rough estimate is, of course, its root-mean-square deviation. An estimate of a different nature was obtained by the nineteenth-century mathematician Chebyshev. This estimate, known as Chebyshev's inequality, gives an upper limit on the probability that a random variable Y deviates from its mean E(Y) by an amount equal to or greater than a (a > 0):

Chebyshev's inequality: P(|Y – E(Y)| ≥ a) ≤ var(Y)/a2.

This fundamental inequality will lead us to the famous laws of large numbers, which tell us about average values for infinite sequences of random variables. We begin by returning again to the coin loaded in such a way that p is the probability of a head in a single toss. If this coin is tossed a large number of times n, shouldn't the frequency of heads, NH /n, be approximately equal to p, at least in some sense?

This question can be answered on several levels. Let X, be the random variable describing the outcome of the ith toss. Set X; 1 if the ith toss is a head and X; = 0 if

BINOMIAL DISTRIBUTION FUNCTION

Fig. 3. The distribution function F(t) for the number of heads obtained in n independent tosses of a fair coin is a binomial distribution, so called because the probability of obtaining k heads in n tosses of the coin is given by a formula involving binomial coefficients, namely (). Shown here is the

binomial distribution function for the number of heads obtained in five tosses of the coin. The value of F(t) equals the probability that the number of heads is equal to or less than t.

[merged small][ocr errors][merged small][ocr errors][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][ocr errors][merged small][merged small][merged small]

Fig. 4. So many random variables can be described, at least approximately, by the distribution function shown in (a) that it is known as the normal distribution function. Examples of such random variables include the number of heads obtained in very many tosses of a coin and, as a general experimental fact, accidental errors of observation. The value of F(t) equals the probability that the value of the random variable is equal to or less than (t − m)/ơ, where m is the mean, or expected, value of the random variable and σ is its standard deviation. (The mean here is assumed to be positive.) Shown in (b) is the normal density function f(t) = dF(t)/dt, which gives the probability that the value of the random variable is (t - m)/σ.

(Random variables that have the same distribution function are said to be identically distributed.) Now the expected value of NH /n is easy to compute:

E (Nμ /n) = E ((X] + ... + X,)/n) = (1/n) ΣE (X;) = (1/n)(np) = p.

i=1

Thus, on the simplest level our guess is right: The frequency of heads, NH /n, is approximately equal to p in the sense that the expected value of NÅ /n is p. But surely, even in a very long series of tosses, it would be foolish to expect NH /n to exactly equal p (and NT/n to exactly equal 1 p). What one is looking for is a statement that holds only in the limit as the number of tosses becomes infinite.

Bernoulli proved such a theorem: As n gets larger and larger, the probability that NH/n differs from its expected value p by more than a positive amount tends to 0:

where P is the probability measure on N,, the space of all sequences of H's and T's of length n. No matter what positive is chosen, the probability that the difference between the frequency of heads and p, the probability of a head in a single trial, exceeds € can be made arbitrarily small by tossing the coin a sufficiently large number of times. Let us see how Bernoulli's theorem follows from Chebyshev's inequality. First, notice that var(X;) = p(1 − p) for all i. Second, the random variables X1............X, are independent (the outcome of the ith toss has no influence on the outcome of jth). Now, from the fact that E (XY) = E(X)E (Y) for independent random variables, we get

Weak law of large numbers: Let X1, X2, X3,... be independent, identically distributed random variables such that var(X1) < ∞. Then for each € > 0

In other words, for any positive the probability that the deviation between the frequency in n trials and the expected value in a single trial exceeds can be made arbitrarily small by considering a sufficiently large number of trials.

For our coin-tossing example NH /n approximately equals p in another sense also. Suppose one asks for the probability that the frequency of heads (in the limit as the number of tosses becomes infinite) is actually equal to p. The answer was obtained by Borel in 1909:

Notice the complexity of the question. In order to deal with it, the sample space N is now by necessity the set of all infinite two-letter sequences w and the subset of interest is the set A of those sequences for which

where N, (w) is the number of H's among the first n letters of the infinite sequence w. It takes some work just to show that A is an event in the sample space N. Unlike the question that led to the weak law of large numbers, this question required the full apparatus of modern probability theory. An extension of Borel's result by Kolmogorov is known as the strong law of large numbers.

Strong law of large numbers: Let X1, X2, X3,... be independent, identically distributed random variables such that E(X) < ∞. Then

An Application of the Strong Law of Large Numbers. Let us illustrate the power of the strong law of large numbers by using it to answer the following question: What is the probability that, in an infinite sequence of tosses of a fair coin, two heads occur in succession?

We will first answer this question using only the rules governing the probabilities of independent events. In particular, we will use the axioms of countable additivity and complementarity and the rule of multiplication of probabilities. Let A be the event that a head occurs on the (2k 1)th toss and on the (2k)th toss. Each Ak is an elementary event, and P(A) = 1/4. Now, by the axiom of countable additivity, U, A is an event; in particular, it is the event that, for some k, heads occur on the (2k - 1)th and 2k th tosses. By the axiom of complementarity,

Since the events A1, A2, A3,... are independent, the events N – A1, N — A2, N — A3. . . . are also independent, and we can apply the rule of multiplication of probabilities:

Now we will answer the same question by using the strong law of large numbers. Let X be the random variable such that

Then X1, X2, X3,... is a sequence of independent random variables. Also, they all have the same distribution: (P(X; = 1) = 1/4, P(X; = 0) = 3/4, and E (X;) = 1/4. Therefore, according to the strong law of large numbers,

This result is stronger than that obtained above. It guarantees, with probability 1, the existence of infinitely many k's such that heads occur on the (2k - 1)th and (2k )th tosses; further, the set of all such k's has an arithmetic density of 1/4.

Borel's theorem marked the beginning of the development of modern probability theory, and Kolmogorov's extension to the strong law of large numbers greatly expanded its applicability. To quote Kac and Ulam:

"Like all great discoveries in mathematics the strong law of large numbers has been greatly generalized and extended; in the process it gave rise to new problems, and it stimulated the search for new methods. It was the first serious venture outside the circle of problems inherited from Laplace, a venture made possible only by developments in measure theory. These in turn were made possible only because of polarization of mathematical thinking along the lines of set theory."

The polarization Kac and Ulam were referring to concerns the great debate at the turn of the century about whether the infinite in mathematics should be based upon Cantor's set theory and its concomitant logical difficulties. The logical problems have been met, and today we use Cantor's theory with ease.

The Monte Carlo Method. One of Stan Ulam's great ideas, which was first developed and implemented by von Neumann and Metropolis, was the famous Monte Carlo method. It can be illustrated with Chebyshev's inequality. Suppose that we need to quickly get a rough estimate of (sin x)/x3 dx. Setting t = 1/x, the problem then is to estimate ft sin(1/t)dt. Let y、.............y, be independent random variables each uniformly distributed on [0,1]. That is, for all i, P(a < y < b) = ba, where (a,b) is a subinterval of [0,1]. Now set ƒ (t) = t sin(1/1) and for each i let X1 = f(y;). Then

Los Alamos Science Special Issue 1987

[merged small][merged small][merged small][merged small][ocr errors][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][ocr errors][merged small][merged small][merged small][merged small][subsumed][subsumed][ocr errors][subsumed][merged small][merged small][merged small][merged small][merged small][merged small][merged small][ocr errors][merged small][merged small]

(The reader may well wonder why such a large number of sample points is required to be only 90 percent certain of the value of the integral to within only two decimal places. The answer lies in the use of Chebyshev's inequality. By using instead the stronger central limit theorem, which will be introduced below, many fewer sample points are needed to yield a similar estimate.)

The Monte Carlo method is a wonderful idea and, of course, tailor-made for computers. Although it might be regarded simply as an aspect of the more ancient statistical sampling technique, it had many exciting new aspects. Three of these are (1) a scope of application that includes large-scale processes, such as neutron chain reactions; (2) the capability of being completely implemented on a digital computer; and (3) the idea of generating random numbers and random variables. How do we mechanically produce numbers y1,..., yn in [0,1] such that the y's are independent and identically distributed? The answer is we don't. Instead, so-called pseudo-random numbers are generated. Many fascinating problems surfaced with the advent of Monte Carlo. Dealing with them is one of the major accomplishments of the group of intellects gathered at Los Alamos in the forties and the fifties. (See "The Beginning of the Monte Carlo Method.")

« Previous Continue »

Books