Axioms of Modern Probability Theory

The contemporary approach to probability is quite simple. From the set of all possible outcomes (called the sample space), a collection of subsets (called elementary events) is chosen whose probabilities are assumed to be given once and for all. One then tries to calculate the probabilities of more complicated events by the use of two axioms.

Axiom of additivity: If $E_1$ and $E_2$ are events, then "$E_1$ or $E_2$" is an event. Moreover, if $E_1$ and $E_2$ are disjoint events (that is, the subsets corresponding to $E_1$ and $E_2$ have no elements in common), then the probability of the event "$E_1$ or $E_2$" is the sum of the probabilities of $E_1$ and $E_2$, provided, of course, that $E_1$ and $E_2$ can be assigned probabilities. Symbolically,

$$P(E_1 \cup E_2) = P(E_1) + P(E_2) \quad \text{provided } E_1 \cap E_2 = \emptyset.$$

Axiom of complementarity: If an event $E$ can be assigned a probability, then the event "not $E$" also can be assigned a probability. Moreover, since the whole sample space $\Omega$ is assigned a probability of 1,

$$P(\text{not } E) = P(\Omega - E) = 1 - P(E).$$

Why these axioms? What is usually required of axioms is that they should codify intuitive assumptions and that they be directly verifiable in a variety of simple situations. The axioms above clearly hold in all situations to which Laplace's definition is unambiguously applicable; they are also in accord with almost every intuition one has about probabilities, except possibly those involved in quantum mechanics (Feynman 1951).

As we will see in the section on measure theory, the axioms of additivity and complementarity have an impressive mathematical content. Nevertheless they are too general and all-embracing to stand alone as a foundation for a theory so rich and fruitful as probability theory. An additional axiom of "countable additivity" is required. That axiom is the basis for the limiting theorems presented below and their application through approximating forms.
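The two axioms can be seen at work on a small finite sample space. The following is a minimal sketch, not taken from the article: it assumes a single roll of a fair six-sided die as the sample space, and the names `sample_space`, `elementary`, and `prob` are hypothetical. Because `prob` is built by adding elementary probabilities, the checks illustrate the axioms rather than prove them.

```python
from fractions import Fraction

# Assumed example (not from the article): one roll of a fair six-sided die.
sample_space = frozenset(range(1, 7))

# Elementary events are the single outcomes, each assigned probability 1/6.
elementary = {outcome: Fraction(1, 6) for outcome in sample_space}


def prob(event):
    """Probability of an event (a subset of the sample space), obtained by
    adding the probabilities of the disjoint elementary events it contains."""
    return sum((elementary[o] for o in event), Fraction(0))


even = frozenset({2, 4, 6})
low_odd = frozenset({1, 3})  # shares no outcome with `even`

# Axiom of additivity for disjoint events: P(E1 or E2) = P(E1) + P(E2).
additivity_holds = prob(even | low_odd) == prob(even) + prob(low_odd)

# Axiom of complementarity: P(not E) = 1 - P(E).
complementarity_holds = prob(sample_space - even) == 1 - prob(even)

print(additivity_holds, complementarity_holds)
```

Exact rational arithmetic (`Fraction`) is used so that the equalities are tested without floating-point rounding.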
Finally, at the heart of the subject is the selection of elementary events and the decision on what probabilities to assign them. Here nonmathematical considerations come into play, and we must rely upon the empirical world to guide us toward promising areas of exploration. These considerations also lead to a central idea in modern probability theory: independence.

The Definition of Independence

Let us return to the experiment of tossing a coin $n$ times. In attempting to construct any realistic and useful theory of coin tossing, we must first consider two entirely different questions: (1) What kind of coin is being tossed? (2) What is the tossing mechanism? The simplest assumptions are that the coin is fair and the tosses are "independent." Since the notion of independence is central to probability theory, we must discuss it in some detail.

Events $E$ and $F$ are independent in the ordinary sense of the word if the occurrence of one has no influence on the occurrence of the other. Technically, the two events (or, for that matter, any finite number of events) are said to be independent if the rule of multiplication of probabilities is applicable; that is, if the probability of the joint occurrence of $E$ and $F$ is equal to the product of their individual probabilities,

$$P(E \cap F) = P(E)\,P(F).$$

Kac and Ulam justified this definition of independence as follows:

"In other words, whenever E and F are independent, there should be a rule that would make it possible to calculate Prob. {E and F} provided only that one knows Prob. {E} and Prob. {F}. Moreover, this rule should be universal; it should be applicable to every pair of independent events. Such a rule takes on the form of a function $f(x, y)$ of two variables $x$, $y$, and we can summarize by saying that whenever E and F are independent we have

$$\text{Prob.}\{E \text{ and } F\} = f(\text{Prob.}\{E\}, \text{Prob.}\{F\}).$$

Let us now consider the following experiment.
Imagine a coin that can be 'loaded' in any way we wish (i.e., we can make the probability $p$ of H any number between 0 and 1) and a four-faced die that can be 'loaded' to suit our purposes also. The faces of the die will be marked 1, 2, 3, 4 and their respective probabilities will be denoted $p_1, p_2, p_3, p_4$; each $p_i$ is nonnegative and $p_1 + p_2 + p_3 + p_4 = 1$. We must now assume that whatever independence means, it should be possible to toss the coin and the die independently. If this is done and we consider (e.g.) the event 'H and (1 or 2)' then on the one hand

$$\text{Prob.}\{\text{H and (1 or 2)}\} = f(p, p_1 + p_2)$$

while on the other hand, since the event 'H and (1 or 2)' is equivalent to the event '(H and 1) or (H and 2),' we also have

$$\text{Prob.}\{\text{H and (1 or 2)}\} = \text{Prob.}\{\text{H and 1}\} + \text{Prob.}\{\text{H and 2}\} = f(p, p_1) + f(p, p_2).$$

Note that we have used the axiom of additivity repeatedly. Thus

$$f(p, p_1 + p_2) = f(p, p_1) + f(p, p_2)$$

for all $p$, $p_1$, $p_2$ restricted only by the inequalities

$$0 \le p \le 1, \quad 0 \le p_1, \quad 0 \le p_2, \quad p_1 + p_2 \le 1.$$

If one assumes, as seems proper, that $f$ depends continuously on its variables, it follows that $f(x, y) = xy$ and hence the probability of a joint occurrence of independent events should be the product of the individual probabilities.

This discussion (which we owe to H. Steinhaus) is an excellent illustration of the kind of informal (one might say 'behind the scenes') argument that precedes a formal definition. The argument is of the sort that says in effect: 'We do not really know what independence is, but whatever it is, if it is to make sense, it must have the following properties...'
Having drawn from these properties appropriate consequences (e.g., that $f(x, y) = xy$ in the above discussion), a mathematician is ready to tighten things logically and to propose a formal definition."

Having now defined independence as the applicability of the rule of multiplication of probabilities, let us again derive the probability of obtaining $m$ heads in $n$ tosses of a coin loaded so that $p$ is the probability of a head in a single toss and $q = 1 - p$ is the probability of a tail. If the tosses are assumed to be independent, the probability of obtaining a specified sequence of $m$ heads (and $n - m$ tails) is $p^m q^{n-m}$ (by the rule of multiplication of probabilities). Since there are $\binom{n}{m}$ such sequences, the probability of the event that exactly $m$ out of $n$ independent tosses will be heads is

$$\binom{n}{m} p^m q^{n-m}.$$

(Here we have applied the axiom of additivity.) We have arrived at this formula, first developed almost two centuries ago, by using the modern concept of independence rather than Laplace's concept of equiprobability.

Probability and Measure Theory

As soon as we consider problems involving an infinite (rather than a finite) number of outcomes, we can no longer rely on counting to determine probabilities. We need instead the concept of measure. Indeed, probabilities are measures; that is, they are numerical values assigned to sets in some collection of sets, namely to sets in the sample space of all possible outcomes.

The realization, during the early part of this century, that probability theory could be cast in the mold of measure theory made probability theory respectable by supplying a rigorous framework. It also extended the scope of probability theory to new, more complex problems.

Before presenting the general properties of a measure, let us consider two problems involving an infinite number of outcomes.
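Before turning to those two problems, the coin-tossing formula derived above can be checked numerically. The sketch below is illustrative only; the function names `prob_exact_heads` and `prob_by_enumeration` are hypothetical. It compares the closed form $\binom{n}{m} p^m q^{n-m}$ with a brute-force calculation that multiplies per-toss probabilities within each sequence (independence) and then adds over the disjoint sequences containing exactly $m$ heads (additivity).

```python
from itertools import product
from math import comb, isclose


def prob_exact_heads(n, m, p):
    """Closed form: C(n, m) * p**m * q**(n - m), with q = 1 - p."""
    q = 1.0 - p
    return comb(n, m) * p**m * q ** (n - m)


def prob_by_enumeration(n, m, p):
    """Same probability by brute force over all 2**n head/tail sequences."""
    q = 1.0 - p
    total = 0.0
    for seq in product("HT", repeat=n):
        if seq.count("H") != m:
            continue
        # Rule of multiplication: independent tosses multiply.
        prob_seq = 1.0
        for toss in seq:
            prob_seq *= p if toss == "H" else q
        # Axiom of additivity: distinct sequences are disjoint events.
        total += prob_seq
    return total


n, m, p = 5, 2, 0.3  # arbitrary illustrative values
print(isclose(prob_exact_heads(n, m, p), prob_by_enumeration(n, m, p)))
```

The enumeration grows as $2^n$, so it is meant only as a check for small $n$.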
One is the problem that led to Bertrand's paradox, namely, to find the probability that a chord of a circle chosen at random is longer than the side of an inscribed equilateral triangle. For that problem the event $A$, or subset $A$, of chords that are longer and the sample space $\Omega$ of all chords could be depicted geometrically. Thus the relative sizes (measures) of the two sets could be compared even though each was an uncountable set. (The measures of those sets were either lengths or areas.)

Another situation in which an infinity of outcomes needs to be considered is the following. Suppose two persons A and B are alternately tossing a coin and that A gets the first toss. What is the probability that A will be the first to toss a head? This can happen either on the first toss, or on the third (the first two being tails), or on the fifth (the first four being tails), and so on. The event that A will toss the first head is thus decomposed into an infinite number of disjoint events. If the coin is fair and the tosses independent (so that the rule of multiplication applies), then the probabilities of these events are $\frac{1}{2}, \frac{1}{2^3}, \frac{1}{2^5}, \ldots$, and the probability that A will toss the first head is the sum of a geometric series:

$$\frac{1}{2} + \frac{1}{2^3} + \frac{1}{2^5} + \cdots = \frac{2}{3}.$$

This result hinges on one very crucial proviso: that we can extend the axiom of additivity to an infinite number of disjoint events. This proviso is the third axiom of modern probability theory.

Axiom of countable additivity: If $E_1, E_2, E_3, \ldots$ is an infinite sequence of disjoint events, then $\bigcup_{i=1}^{\infty} E_i$ is an event and

$$P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i).$$

Note that in solving the last problem we not only needed the axiom of countable additivity but also assumed that the probabilities used for finite sequences of trials are well defined on events in the space of infinite sequences of trials. Whether such probabilities could be defined that satisfy the axioms of additivity, complementarity, and countable additivity was one of the central problems of early twentieth-century mathematics.
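The role of countable additivity in the alternating-toss problem can be made concrete with partial sums. In the sketch below (an illustration under the stated fair-coin assumption; the name `prob_a_first_head_within` is hypothetical), the event "A tosses the first head on toss $2j - 1$" has probability $(1/2)^{2j-1}$. Finite additivity gives the probability that A wins within the first $k$ of A's turns; the passage to the full infinite sum, the geometric series with value $2/3$, is exactly what the third axiom licenses.

```python
from fractions import Fraction


def prob_a_first_head_within(k):
    """Probability that A tosses the first head on one of A's first k turns,
    that is, on toss 1, 3, ..., 2k - 1 of a fair coin (finite additivity)."""
    # A's j-th turn is toss 2j - 1; all earlier tosses must be tails.
    return sum(
        (Fraction(1, 2) ** (2 * j - 1) for j in range(1, k + 1)),
        Fraction(0),
    )


limit = Fraction(2, 3)  # sum of the full geometric series
for k in (1, 2, 5, 10):
    partial = prob_a_first_head_within(k)
    # The gap to 2/3 shrinks by a factor of 4 with each extra turn.
    print(k, partial, float(limit - partial))
```

Exact fractions make it visible that every partial sum falls short of $2/3$ by $\frac{2}{3} \cdot 4^{-k}$, so no finite number of applications of the additivity axiom reaches the limit.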
The problem of defining such probabilities is really the problem of defining a measure because, as we will see below, the axioms of probability are essentially identical with the required properties of a measure.

Measure Theory. The most familiar examples of measures are areas in a plane or volumes in three-dimensional Euclidean space. These measures were first developed by the Greeks and greatly extended by the calculus of Newton and Leibniz. As mathematics continued to develop, a need arose to assign measures to sets less "tame" than smooth curves, areas, and volumes. Studies of convergence and divergence of Fourier series focused attention on the "sizes" of various sets. For example, given a trigonometric series $\sum_n (a_n \cos nt + b_n \sin nt)$, can one assign a measure to the set of $t$'s for which the series converges? (Cantor's set theory, which ultimately became the cornerstone of all of modern mathematics, originated in his interest in trigonometric series and their sets of convergence.) For another example, how does one assign a measure to an uncountable set, such as Cantor's middle-third set? (See "Cantor's Middle-Third Set".) Answers to such questions led to the development of measure theory.

The concept of measure can be formulated quite simply. One wants to be able to assign to a set $A$ a nonnegative number $\mu(A)$, which will be called the measure of $A$, with the following properties.

Property 1: If $A_1, A_2, \ldots$ are disjoint sets that are measurable, that is, if each $A_i$ can be assigned a measure $\mu(A_i)$, then their union $A_1 \cup A_2 \cup \cdots$ (that is, the set consisting of the elements of $A_1, A_2, \ldots$) is also measurable. Moreover,

$$\mu(A_1 \cup A_2 \cup \cdots) = \mu(A_1) + \mu(A_2) + \cdots.$$

Property 2: If $A$ and $B$ are measurable and $A$ is contained in $B$ ($A \subset B$), then $B - A$ (the set composed of elements that are in $B$ but not in $A$) is also measurable. By property 1 then, $\mu(B - A) = \mu(B) - \mu(A)$.

Two additional properties are assumed for measures on sets in a Euclidean space.
Property 3: A certain set $E$, the unit set, is assumed to have measure 1: $\mu(E) = 1$.

Property 4: If two measurable sets are congruent (that is, a rigid motion maps one onto the other), their measures are equal.

When dealing with sets of points on a line, in a plane, or in space, one chooses $E$ to be an interval, a square, or a cube, respectively. These choices are dictated by a desire to have the measures assigned to tame sets agree with those assigned to them previously in geometry or calculus.

Can one significantly enlarge the class of sets to which measures can be assigned in accordance with the above properties? The answer is a resounding yes, provided (and it is a crucial proviso) that in property 1 we allow infinitely many $A$'s. When we do, the class of measurable sets includes all (well, almost all; perhaps there may be some exceptions ...) the sets considered in both classical and modern mathematics.

Although the concept of countable additivity had been used previously by Poincaré, the explicit introduction and development of countably additive measures early in this century by Émile Borel and Henri Lebesgue originated a most vigorous and fruitful line of inquiry in mathematics. The Lebesgue measure is defined on a collection of sets that is closed under countably infinite unions, intersections, and complementations. (Such a collection of sets is called a σ-field.) Lebesgue's measure satisfies all four properties listed above. Lebesgue's measure on the real line is equivalent to our ordinary notion of length.

But how general is the Lebesgue measure? Can one assign it to every set on the line? Vitali first showed that even the Lebesgue measure has its limitations, that there are sets on the line for which it cannot be defined. The construction of such nonmeasurable sets involves the use of the celebrated axiom of choice: given a collection of disjoint nonempty sets, one can choose a single element from each and combine the selected elements to form a new set.
This innocent-sounding axiom has many consequences that may seem strange or paradoxical. Indeed, in the landmark paper on measurable cardinals mentioned at the beginning of this article, Ulam showed (with the aid of the axiom of choice) that if a nontrivial measure satisfying properties 1 through 3 can be defined on all subsets of the real line, then the cardinality of the real numbers is larger than anyone