« PreviousContinue »
Central Limit Theorem
We close this tutorial by returning to the de Moivre-Laplace theorem and interpreting it in the modern context. Let X; be a random variable describing the outcome of the ith toss of a coin; set X; 1 if the ith toss is a head and X; = 0 otherwise. Let Sn be the number of heads obtained in n tosses; that is Sn X + + Xn. Then the de Moivre-Laplace theorem can be stated as follows:
Now np = nE(X) = E(Sn) and np(1 - p) = Vno(X) = o(Sn). So if we “renormalize" S, by setting Y, = (Sn – E(Sn))/o(Sn), each Yn has a mean of 0 and a standard deviation of 1. Then the equation above tells us that the distribution function of Yn tends to the standard normal distribution. The central limit theorem is a generalization of this result to any sequence of identically distributed random variables. We state the central limit theorem formally.
Thus the normal distribution is the universal behavior in the domain of independent trials under renormalization. Its appearance in so many areas of science has led to many debates as to whether it is a “law of nature" or a mathematical theorem.
Thanks to the developments in modern probability theory, we begin our investigations with many powerful tools at our disposal. Those tools were forged during a period of tremendous upheavals and turmoil, a time when very careful analysis carried the day. At the heart of that analysis lay the concept of countable additivity. Stan Ulam played a seminal role in developing these tools and presenting them to us.
Ulam's talent for seeing new approaches to familiar problems is evident in one he posed concerning the distribution of energy in physical systems. Will the energy distribution of an isolated system of N interacting particles always evolve to some limiting energy distribution? And, if so, what is the distribution? (Note that this question differs from the one asked in statistical mechanics. There one assumes that at equilibrium the system will have the most probable distribution. One then derives that the most probable distribution is the Boltzmann distribution, the density of which is e-E/kt.)
Obviously, following the evolution of a system of N interacting particles in space and time is a very complex task. It was Stan's idea to simplify the situation by neglecting the spatial setting and redistributing the energy in an abstract random manner. What insights can one gain from such a simplification? One can hope for new perspectives on the original problem as well as on the standard results of statistical mechanics. Also, even if the simplification is unrealistic, one can hope to develop some techniques of analysis that can be applied to more realistic models. In this case David Blackwell and I were able to give an exact analysis of an abstract, highly nonlinear system by using a combination of the machinery of probability theory and higher order recursions (Blackwell and Mauldin 1985). We hope that the technique will be useful
in other contexts.
Let us state the problem more clearly and define what we mean by redistributing energy in an “abstract random manner.” Assume we have a vast number of indistinguishable particles with some initial distribution of energy, and that the average energy per particle is normalized to unity. Further, let us assume the particles interact only in pairs as follows: At each step in the evolution of the system, pair all the particles at random and let the total energy of each pair be redistributed between the members of the pair according to some fixed “law of redistribution" that is independent of the pairs. Iterate this procedure. Does the system have a limiting energy distribution and, if so, how does it depend on the redistribution law?
The Simplest Redistribution Law. To begin we will consider the simplest redistribution law: each particle in a random pair gets one-half the total energy of the pair. If the number of particles in the system is finite, it is intuitively clear that under iteration the total energy of the system will tend to become evenly distributed—all the particles
will tend to have the same energy. So, a system with only finitely many particles has a limiting distribution of energy, namely, a step function with a jump of size 1 at t = 1. and moreover, no matter what the initial distribution of energy is, the system tends to this distribution under iteration.
Even for a system with a continuum of particles, our observations for the finite case still hold. In order to see this, we formalize the problem in terms of probability theory.
Let X be a random variable corresponding to the initial energy of the particles. Thus, the distribution function F, associated with X is the initial distribution of energy: Fi(t) = P(X < 1) is the proportion of particles with energy less than 1. Our arguments and analysis will be based only on the knowledge of the energy distribution function and how it is transformed under iteration by the redistribution law. In terms of distribution functions, our normalization condition, that the average energy per particle is unity, means that the expected value of X, 80* t dFi(t), equals 1.
We seek a random variable T(X) corresponding to the energy per particle after applying the redistribution law once. To say that the indistinguishable particles are paired at random in the redistribution process means that, given one particle in the pair, we know nothing about the energy of the second except that its distribution function should be the initial distribution function Fi. In other words, we can describe the energy of the randomly paired particles by two independent random variables X and X2, each having the same distribution as X. Thus the simplest redistribution law, according to which paired particles share the total energy of the pair equally, can be expressed in terms of T(X), X1, and X2 as
Fig. 5. Consider a system of N particles with some arbitrary initial distribution of energy. Assume that the initial mean energy is 1 and that the particles interact in pairs. Assume further that the total energy of an interacting pair is redistributed so that each member of the pair acquires one-half the total energy of the pair. Then with probability 1 the system reaches a limiting energy distribution described by a step function with a step height of 1 at t = 1. That is, the probability that the energy per particle is less than t equals 0 for 1 < 1 and equals 1 (the initial mean energy) for t > 1.
To carry out the second iteration, we repeat the process. The energy T2(X) = T (T(X)) will have the same distribution as (Y/+Y2)/2, where Y, and Y2 are independent and each is distributed as T(X). In other words, if we let X1, X2, X3, and X4 be independent and distributed as X1, then Yi is distributed as (X1 + X2)/2, and Y2 is distributed as (X3 + Xa)/2. The energy is distributed as T2(X) = (X, + X2 +X3 +X4)/4.
After n iterations the energy per particle will have the same distribution as T"(X) = (X +...+X2n)/2", where the X;'s are independent and distributed as X. This expression for T"(X) is exactly the expression that appears in the strong law of large numbers (see page 71). Therefore the strong law tells us that the limiting energy of each particle w as n + o is
RANDOM LAW FOR ENERGY
this random process, the energies of almost all particles converge to unity. In terms of distribution functions, we say that in the space of all “potential” actualizations of this iterative random process, almost surely, or with probability 1, the limiting distribution of energy will be a step function with a jump of size 1 at 1 = 1 (Fig. 5).
Notice that for this simplest redistribution law (1) the redistribution operator T is a simple linear operator and (2) even so, the strong law of large numbers is needed to determine the limiting behavior.
More Complicated Redistribution Laws. Stan proposed more interesting laws of redistribution. The redistribution operator T for each of these laws is nonlinear, and different techniques are needed to analyze the system. For example, after pairing the particles, choose a number a between 0 and 1 at random. Then instead of giving each particle one-half the total energy of the pair, let us give one particle a times the total energy of the pair and give the other particle (1 - a) times the total energy. The energy T(X) will then have the same distribution as U (X1 + X2), where U is uniformly distributed on (0,1] (that is, all values between 0 and 1 are equally probable) and U ,X1, and X2 are independent. What happens to this system under iteration is a much more complicated matter. For one thing, unlike the redistribution operator in the simplest case, the operator T is now highly nonlinear and the law of large numbers is not available as a tool. A new approach is required. To get an idea of what to expect, Stan first used the computer as an experimental tool. From these studies he correctly guessed the limiting behavior (Ulam 1980): no matter what the initial distribution of energy is, we have convergence to the exponential distribution (Fig. 6).
Let me indicate how Blackwell and I proved this conjecture. We used a classical method of moments together with an analysis of a quadratic recursion. For now let us assume that a stable limiting distribution exists and let X have this distribution. Then T(X) = U (X1 + X2) has the same distribution. So, calculating mn, the nth moment of X (that is, the expected value of X"), we have
m= E (X") = E (T(X)") = E ((U(X, +X2))") = E (U"(X, + X2)").
By independence and the binomial theorem
Fig. 6. Consider a system identical to the one described in Fig. 5 except that the total energy of an interacting pair is redistributed randomly between the members of the pair. In particular, assume that one particle receives a randomly chosen fraction a of the total energy and the other particle receives the remainder. The system still reaches a limiting energy distribution, one equal to O for t < 0 and equal to 1 - e'for t > 0.
Since X, and X2 are independent, the expected value of each product is equal to the product of the expected values, E(X"X"=P) = E(X” E (X”-”). Substituting this into the equation above and using the definition of moments, we have
This is a quadratic recursion formula. Substituting the initial condition mı = 1, we find that m2 =
2 and m3 = 6. An induction argument shows that mn = n! for all n. But n! is the nth moment of the exponential distribution! Of course, our assumption is that a stable distribution and all its moments exist. It takes some work to prove that this assumption is indeed true and that no matter what initial distribution one starts with, the distribution of the iterates converges to the exponential.
It should not be too surprising that our result agrees in its general form with the Boltzmann distribution of statistical mechanics. After all, both are derived from similar assumptions. The Boltzmann distribution is derived from the assumptions that (1) energy and the number of particles are conserved, (2) all energy states are equally probable, and (3) the distribution of energy is the most probable distribution. In our problem we also assumed conservation of energy and number of particles. Moreover, taking U in our redistribution law to be the uniform distribution makes all energy states equally probable. The difference is that the iteration process selects the most probable distribution with no a priori assumption that the most probable distribution will be reached.
We can go further and replace U by any random variable with a symmetric distribution on (0,1). The symmetric condition insures that the particles are indistinguishable. We call the distribution of U the redistribution law. Again, one obtains a quadratic recursion formula. Blackwell and I analyzed this formula and showed that for every such U the system tends toward a stable limiting distribution. In other words, there is an attractive fixed point in the space of all distributions. Moreover, there is a one-to-one correspondence between the stable limiting distribution and the redistribution law that yields it.
Momentum Redistribution. There is a corresponding momentum problem. Assume we have a vast number of indistinguishable particles (all of unit mass) with some initial distribution of momentum. Let us assume that the particles interact in pairs as follows. At each step in the evolution of the system, pair all the particles at random and let the total momentum of each pair be redistributed between the members of the pair according to some law of redistribution that is independent of the pairs. Of course, we wish to conserve energy and momentum. These conservation laws place severe constraints on the possibilities. If y, and v2 are the initial velocity vectors of two particles in a pair and vi and va are the velocity vectors after collision, then by momentum conservation