CANTOR'S D uring the last quarter of the nineteenth century, Georg Cantor introduced a series of concepts that now form the cornerstone of all modern mathematics— set theory. Those concepts arose from Cantor's attempt to depict the sets of convergence or divergence of, say, trigonometric series. Many such sets have pathological properties that are illustrated by his famous construction, the "middlethird" set. This set is described by the following recursion. Consider the closed unit interval [0,1]. First remove the middle-third open interval, obtaining two intervals [0, 1/3] and [2/3, 1]. Next remove from each of these intervals its middle-third interval. We now have four closed subintervals each of length 1/9. Continue the process. After n steps we will have 2" closed subintervals of [0,1] each of length 1/3". From each of these we will remove the middle-third interval of length 1/3′′+1. Continue the process indefinitely. Cantor's middle-third set, K, consists of all numbers in [0,1] that are never removed. This set possesses a myriad of wonderful properties. For example, K is uncountable and yet has Lebesgue measure zero. To see that K has measure zero, consider the set {[0, 1] - K}, which consists of the open intervals that were removed at some stage. At the nth stage 2"-1 open intervals of length 1/3" were removed from the remainder. So, by the countable additivity of measure, μ([0, 1] − K) = 1/3+2/32+ + 2′′-1/3" + ... = (1/3)(1 + 2/3 + (2/3)2 + ...) = 1. imagined. (See “Learning from Ulam: Measurable Cardinals, Ergodicity, and Biomathematics.") Another example is the Banach-Tarski paradox. Banach and Tarski proved that each of two solid spheres S1 and S2 of different radii can be decomposed into the same finite number of sets, say S1 = A, UA2 U...UA, and S2 = B1 UB2 U..... UBn, such that all the A, 's and all the B, 's are among themselves pairwise disjoint and yet A, is congruent to B, for all i. It is therefore impossible to define meaures for these sets, since their union in one fashion yields a certain sphere and their union in a different fashion yields a sphere of different size! That such a construction is possible rests on the complicated structure, earlier pointed out by Hausdorff, of the group of rigid motions of three-dimensional Euclidean space. We close this section on measure theory with a few comments from Kac and Ulam. "Attempts to generalize the notion of measure were made from necessity. ... For example, one could formulate theorems that were valid for all real numbers except for those belonging to a specific set. One wanted to state in a rigorously defined way that the set of these exceptional points is in some sense small or negligible. One could 'neglect' merely countable sets as small in the noncountable continuum of all points but in most cases the exceptional sets turned out to be noncountable, though still of Lebesgue measure 0. In the theory of probability one has many statements that are valid 'with probability one' (or 'almost surely'). This simply means that they hold for ‘almost all' points of an appropriate set; i.e., for all points except for a set of measure 0. In statistical mechanics one has important theorems that assert properties of dynamic systems that are valid only for almost all initial conditions. One final remark: The notion or concept of measure is surely close to the most primitive intuition. The axiom of choice, that simply permits one to consider a new set Z obtained by putting together an element from each set of a family of disjoint sets, sounds so obvious as to be nearly trivial. And yet it leads to the Banach-Tarski paradox! One can see why a critical examination of the logical foundation of set theory was absolutely necessary and why the question of existence of mathematical constructs became a serious problem. If to exist is to be merely free from contradiction as Poincaré decreed, we have no choice but to learn to live with unpleasant things like nonmeasurable sets or Banach-Tarksi decompositions." Consistency Theorems for Probability Measures Now let us return to probability theory and consider the construction of countably additive probability measures. To see that a finitely additive measure cannot always be extended to a countably additive measure, consider the set N of integers and take as elementary events the subsets A of N such that either the set A is finite or the set - A is finite. Set So, μ() = 1 and satisfies the axioms of finite additivity and complementarity. Now consider the problem of defining a countably additive probability measure on the sample space of all infinite two-letter sequences (each of which represents the outcome of an infinite number of independent tosses of a fair coin). Take as an elementary event a set E consisting of all sequences whose first m letters are specified (m = 1, 2,...). Since there are 2′′ such elementary events, we use the axiom of finite additivity to assign a probability P of 1/2" to each such event. Can this function P, which has been defined on the elementary events, be extended to a countably additive measure defined on the o-field generated by the elementary events? Ulam and Lomnicki proved such an extension exists for any infinite sequence of independent trials. Kolmogorov obtained the ultimate consistency results by giving necessary and sufficient conditions under which an extension can be made from a finitely additive to a countably additive measure, including the case of non-independent trials. These extensions put the famous limiting laws of probability theory, such as the laws of large numbers, on solid ground. In the case of coin tossing we have chosen our elementary events to be sets of infinite sequences whose first m digits are fixed and have assigned them each a probability of 1/2" in agreement with the finitely additive measure. Now we will show that the measure defined by these choices is equivalent to Lebesgue's measure on the unit interval [0,1] and is therefore a well-defined countably additive measure. First associate the digit 1 with a head and the digit 0 with a tail and encode each outcome of an infinite number of tosses as an infinite sequence of 1's and O's (10110..., for example), which in turn can be looked upon as the binary representation of a real number t (0 ≤t 1). In this way we establish a correspondence between real numbers in [0,1] and infinite two-letter sequences; the correspondence can be made one-to-one by agreeing once and for all on which of the two infinite binary expansions to take when the choice presents itself. (For instance, we must decide between .01000... and .00111... as the binary representation of 1/4.) The use of the binary system is dictated not only by considerations of simplicity. As one can easily check, the crucial feature is that each elementary event maps into an interval whose length is equal to the corresponding probability of the event. In fact, fixing the first m letters of a sequence corresponds to fixing the first m binary digits of a number, and the set of real numbers whose first m binary digits are fixed covers the interval between /2" and ( +1)/2", where is 0, 1, 2,..., or 2" 1, depending on how the first m digits are fixed. Clearly the length of such an interval, 1/2", is equal to the probability of the corresponding elementary event. Thus the probability measure in the sample space N of all infinite two-letter sequences maps into the ordinary Lebesgue measure on the interval [0,1] and is therefore equivalent to it. The space of all infinite sequences of O's and I's is infinite-dimensional in the sense that it takes infinitely many "coordinates" to describe each "point" of the space. What we did was to construct a certain countably additive measure in the space that was "natural" from the point of view of independent tosses of a fair coin. Dear Ulam, many Manette and I5 "to write. 57 are ли thanks for your kind letter - both to see you soon. deliglited - or any other Princeton again. Out. 10. and we expect, and stay. date the rabouts which suits you - well be with very convenient to ris usi that you will stay as possible. as lors I agree wholeheartcally with your plans. an cif measure - theory. Caratheodorys exposition; which as Olar de Inobability and infinite direct products, and-above all more. as interpreting measure much and much less. as volume 1 in the а would really be good thing. At least I often felt is lacking. thing very good how badly such literature. What would be the style of present see any part of treative, and its length? I will be very Your glasks if you here miss. let me not my unser. on the ? looking forward, too, with great interest and mentioned mentioned, we Expecting to see you soon best greetings from This approach immediately suggests extensions to more general infinite-dimensional spaces in which each coordinate, instead of just being 0 or 1, can be an element of a more general set and need not even be a number. Such extensions, called product measures, were introduced by Lomnicki and Ulam in 1934. (Stan's idea of writing a book on measure theory emphasizing the probabilistic interpretation of measure is the subject of the accompanying letter from von Neumann to Ulam.) Measures for sets of curves have also been developed. The best known and most interesting of these was introduced by Norbert Wiener in the early 1920s and motivated by the theory of Brownian motion. Mathematicians have since found new and unexpected applications of the Wiener measure in seemingly unrelated parts of mathematics. For example, it turns out that the Wiener measure of the set of curves emanating from a point p in space and hitting a three-dimensional region R is equal to the electrostatic potential at p generated by a charge distribution that makes the boundary of the "conductor" R an equipotential surface on which the potential is equal to unity. Since the calculation of such a potential can be reduced by classical methods to solving a specific differential equation, we establish in this way a significant link between classical analysis and measure theory. Random Variables and Distribution Functions Having introduced the measure-theoretic foundations of probability, we now turn to a convenient formalism for analyzing problems in probability. In many problems the possible outcomes can be described by numerical quantities called random variables. For example, let X be the random variable describing the outcome of a single toss of a fair coin. Thus, set X equal to 1 if the toss yields a head and to 0 if the toss yields a tail. This is an example of an elementary random variable; that is, X is a function with a constant value on some elementary event and another constant value on the complementary event. In general a random variable is a real-valued function defined on the sample space that can be constructed from elementary random variables by forming algebraic combinations and taking limits. For example, NH, the number of heads obtained in n tosses of a coin, is a random variable defined on the sample space consisting of all sequences of T's and H's of length n; its value is equal to Σ;=, X,, where X¡ = 1 if the ith toss is a head and X; = 0 otherwise. In evaluating the outcomes of a set of measurements subject to random fluctuations, we are often interested in the mean, or expected, value of the random variable being measured. The expected value E(X) (or m) is defined as |