
putting r = OH, r′ = OK; as r = 2a sin θ, r′ = 2a sin θ′,

$$M_1 \propto a^2 \iint \sin^3\theta \, \sin^3\theta' \, \sin(\theta - \theta') \, d\theta \, d\theta'$$

Professor Sylvester has remarked that this double integral, by means of the theorem … , gives

$$M_1 = \frac{35a^2}{36\pi}.$$

From this mean value we pass to the probability that four points within a circle shall form a re-entrant figure, viz.

$$P = \frac{35}{12\pi^2}.$$

94. The function of expectation in this class of problem appears to afford an additional justification of the position here assigned to this conception as distinguished from an average in the more general sense which is proper to the following Part.

See introductory remarks and note to par. 95.

PART II. AVERAGES AND LAWS OF ERROR

95. Averages. An average may be defined as a quantity derived from a given set of quantities by a process such that, if the constituents become all equal, the average will coincide with the constituents, and the constituents not being equal, the average is greater than the least and less than the greatest of the constituents. For example, if $x_1, x_2, \dots, x_n$ are the constituents, the following expressions form averages (called respectively the arithmetic, geometric and harmonic means):

$$\frac{x_1 + x_2 + \dots + x_n}{n}, \qquad (x_1 x_2 \cdots x_n)^{1/n}, \qquad \frac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \dots + \dfrac{1}{x_n}}.$$

The conditions of an average are likewise satisfied by innumerable other symmetrical functions, for example:

The conception may be extended from symmetrical to unsymmetrical functions by supposing any one or more of the constituents in the former to be repeated several times. Thus if in the first of the averages above instanced (the arithmetic mean) the constituent $x_r$ occurs $l$ times, the expression is to be modified by putting $l x_r$ for $x_r$ in the numerator, and in the denominator, for $n$, $n + l - 1$. The definition of an average covers a still wider field. The process employed need not be a function. One of the most important averages is formed by arranging the constituents in the order of magnitude and taking for the average a value which has as many constituents above it as below it, the median. The designation is also extended to that value about which the greatest number of the constituents cluster most closely, the “centre of greatest density,” or (with reference to the geometrical representation of the grouping of the constituents) the greatest ordinate, or, as recurring most frequently, the mode. But to comply with the definition there must be added the condition that the mode does not occur at either extremity of the range between the greatest and the least of the constituents. There should be also in general added a definition of the process by which the mode is derived from the given constituents. Perhaps this specification may be dispensed with when the number of the constituents is indefinitely large. For then it may be presumed that any method of determining the mode will lead to the same result. This presumption presupposes that the constituents are quantities of the kind which form the sort of “series” which is proper to Probabilities. A similar presupposition is to be made with respect to the constituents of the other averages, so far as they are objects of probabilities.

A great variety of (functional) averages, including those which are best known, are comprehended in the following general form: $\varphi^{-1}\{M[\varphi(x_1), \varphi(x_2), \dots, \varphi(x_n)]\}$, where φ is an arbitrary function, φ⁻¹ is its inverse (such that φ⁻¹(φ(x)) = x), and M is any (functional) mean. When M denotes the arithmetic mean, if φ(x) = log x (φ⁻¹(x) = eˣ) we have the geometric mean; if φ(x) = 1/x, we have the harmonic mean. Of this whole class of averages it is true that the average of several averages is equal to the average of all their constituents.

96. The Law of Error. Of the propositions respecting average with which Probabilities is concerned the most important are those which deal with the relation of the average to its constituents, and are commonly called “laws of error.” Error is defined in popular dictionaries as “deviation from truth”; and since truth commonly lies in a mean, while measurements are some too large and some too small, the term in scientific diction is extended to deviations of statistics from their average, even when that average (like the mean of human or barometric heights) does not stand for any real objective thing. A “law of error” is a relation between the extent of a deviation and the frequency with which it occurs: for instance, the proposition that if a digit is taken at random from mathematical tables, the difference between that figure and the mean of the whole series (indefinitely prolonged) of figures so obtained, namely, 4.5, will in the long run prove to be equally often ±0.5, ±1.5, ±2.5, ±3.5, ±4.5. The assignment of frequency to discrete values (as 0, 1, 2, &c., in the preceding example) is often replaced by a continuous curve with a corresponding equation. The distinction of being the law of error is bestowed on a function which is applicable not merely to one sort of statistics (such as the digits above instanced) but to the great variety of miscellaneous groups, generally at least, if not universally. What form is most deserving of this distinction is not decided by uniform usage; different authorities do not attach the same weight to the different grounds on which the claim is based, namely the extent of cases to which the law may be applicable, the closeness of the application, and the presumption prior to specific experience in favour of the law. The term “the law of error” is here employed to denote (1) a species to which the title belongs by universal usage, (2) a wider class in favour of which there is the same sort of a priori presumption as that which is held to justify the more familiar species. The law of error thus understood forms the subject of the first section below.
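The averages of par. 95, and the general form given in the note to it, can be tried out numerically. The following Python sketch is illustrative only: the function names are hypothetical, the mode is taken crudely as the most frequently recurring value (one of several possible processes, as the text observes), and `quasi_mean` implements φ⁻¹{M[φ(x₁), …, φ(xₙ)]} with M the arithmetic mean. The last check assumes groups of equal size; unequal groups would need weighting.

```python
import math
from collections import Counter
from statistics import median


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def quasi_mean(xs, phi, phi_inv):
    """phi^-1 of the arithmetic mean of phi(x): the general form in the note."""
    return phi_inv(arithmetic_mean([phi(x) for x in xs]))


def geometric_mean(xs):
    # phi(x) = log x, phi^-1(x) = e^x
    return quasi_mean(xs, math.log, math.exp)


def harmonic_mean(xs):
    # phi(x) = 1/x is its own inverse
    return quasi_mean(xs, lambda x: 1 / x, lambda x: 1 / x)


def mode(xs):
    # Crude process: the value recurring most frequently.
    return Counter(xs).most_common(1)[0][0]


def repeated(xs, r, l):
    """Let constituent xs[r] occur l times, so the count n becomes n + l - 1."""
    return xs + [xs[r]] * (l - 1)


data = [2.0, 3.0, 3.0, 4.0, 8.0]
for name, avg in [("arithmetic", arithmetic_mean(data)),
                  ("geometric", geometric_mean(data)),
                  ("harmonic", harmonic_mean(data)),
                  ("median", median(data)),
                  ("mode", mode(data))]:
    # Every average lies between the least and the greatest constituent.
    assert min(data) <= avg <= max(data)
    print(f"{name:10s} {avg:.4f}")

# Equal-sized groups: the geometric mean of the group geometric means
# equals the geometric mean of all the constituents.
g1, g2 = [1.0, 4.0], [2.0, 8.0]
print(math.isclose(geometric_mean([geometric_mean(g1), geometric_mean(g2)]),
                   geometric_mean(g1 + g2)))
```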

97. Laws of Frequency. What other laws of error may require notice are included in the wider genus “laws of frequency,” which forms the subject of the second section. Laws of frequency, so far as they belong to the domain of Probabilities, relate much to the same sort of grouped statistics as laws of error, but do not, like them, connote an explicit reference to an average. Thus the sequence of random digits above instanced as affording a law of error, considered without reference to the mean value, presents the law of frequency that one digit occurs as often as another (in the long run). Every law of error is a law of frequency; but the converse is not true. For example, it is a law of frequency, discovered by Professor Pareto, that the number of incomes of different size (above a certain size) is approximately represented by the equation $y = A/x^{\alpha}$, where x denotes the size of an income, y the number of incomes of that size. But whether this generalization can be construed as a law of error (in the sense here defined) depends on the nice inquiry whether the point from which the frequency diminishes as the income x increases can be regarded as a “mode,” y diminishing as x decreases from that point.
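The random-digit example of pars. 96 and 97 can be checked by enumeration. If each digit 0 to 9 occurs equally often (the law of frequency), the deviations from the mean 4.5 take the values ±0.5, ±1.5, ±2.5, ±3.5, ±4.5, each with frequency 1/10 (the law of error). The sketch below uses exact fractions for the long-run frequencies rather than drawing digits from actual tables, which is an assumption made for illustration.

```python
from collections import Counter
from fractions import Fraction

# Law of frequency: in the long run each digit occurs equally often.
freq = {d: Fraction(1, 10) for d in range(10)}

mean = sum(d * p for d, p in freq.items())   # the mean of the series, 9/2

# Law of error: the frequency of each deviation from the mean.
deviation_freq = Counter()
for d, p in freq.items():
    deviation_freq[d - mean] += p

for dev in sorted(deviation_freq):
    print(f"{float(dev):+.1f}: {deviation_freq[dev]}")
```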

See above, pt. i., pars. 3 and 4. Accordingly the expected value of the sum of n (similar) constituents ($x_1 + x_2 + \dots + x_n$) may be regarded as an average, the average value of $n x_r$, where $x_r$ is any one of the constituents.


See as to the fact and the evidence for it, Venn, Logic of Chance, 3rd ed., pp. 111, 114. Cf. Ency. Brit., 8th ed., art. “Probability,” p. 592; Bertrand, op. cit., preface § ii.; above, par. 59.

See his Cours d'économie politique, ii. 306. Cf. Bowley, Evidence before the Select Committee on Income Tax (1906, No. 365, Question 1163 seq.); Benini, Metodologica statistica, p. 324, referred to in the Journ. Stat. Soc. (March, 1909).
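Pareto's law of par. 97, y = A/x^α, becomes a straight line on logarithmic axes, log y = log A − α log x, so α can be estimated by least squares on the logarithms. The sketch below fits synthetic counts that follow the law exactly; the values of A and α are made up for illustration and are not Pareto's figures. With real income statistics the fit would only be approximate.

```python
import math


def fit_pareto(xs, ys):
    """Least-squares fit of log y = log A - alpha * log x; returns (A, alpha)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    slope = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
             / sum((u - mx) ** 2 for u in lx))
    return math.exp(my - slope * mx), -slope


# Hypothetical parameters, chosen only for illustration.
A_true, alpha_true = 5.0e6, 1.5
incomes = [1000 * k for k in range(1, 21)]      # sizes above a threshold
counts = [A_true / x ** alpha_true for x in incomes]

A_hat, alpha_hat = fit_pareto(incomes, counts)
print(f"alpha = {alpha_hat:.3f}, A = {A_hat:.3e}")
```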

Section 1. The Law of Error.

98. (1) The Normal Law of Error. The simplest and best recognized statement of the law of error, often called the “normal law,” is the equation

$$z = \frac{1}{\sqrt{\pi}\,c}\, e^{-(x-a)^2/c^2},$$

more conveniently written (1/√π c) exp −(x − a)²/c², where x is the magnitude of an observation or statistic, z is the proportional frequency of observations measuring x, a is the arithmetic mean of the group (supposed indefinitely multiplied) of similar statistics; c is a constant sometimes called the “modulus” proper to the group; and the equation signifies that if any large number N of such a group is taken at random, the number of observations between x and x + Δx is (approximately) equal to the right-hand side of the equation multiplied by NΔx. A graphical representation of the corresponding curve (sometimes called the “probability-curve”) is here given (fig. 10), showing the general shape of the curve, and how its dimensions vary with the magnitude of the modulus c. The area being constant (viz. unity), the curve is furled up when c is small, spread out when c is large. There is added a table of integrals, corresponding to areas subtended by the curve, in a form suited for calculations of probability, the variable, τ, being the length of the abscissa referred to (divided by) the modulus. It may be noted that the points of inflexion in the figure are each at a distance from the origin of 1/√2 modulus, a distance equal to the square root of the mean square of error, often called the “standard deviation.” Another notable value of the abscissa is that which divides the area on either side of the origin into two equal parts, commonly called the “probable error.” The value of τ which corresponds to this point is 0.4769. . . .

FIG. 10.

A priori Proof.

99. An a priori proof of this law was given by Herschel as follows: “The probability of an error depends solely on its magnitude and not on its direction”; positive and negative errors are equally probable. “Suppose a ball dropped from a given height with the intention that it should fall on a given mark,” errors in all directions are equally probable, and errors in perpendicular directions are independent. Accordingly the required law, “which must necessarily be general and apply alike in all cases, since the causes of error are supposed alike unknown,” is for one dimension of the form φ(x²), for two dimensions φ(x² + y²); and φ(x² + y²) = φ(x²) × φ(y²); a functional equation of which the solution is the function above written. A reason which satisfied Herschel is entitled to attention, especially if it is endorsed by Thomson and Tait. But it must be confessed that the claim to universality is not, without some strain of interpretation, to be reconciled with common experience.

On this conception see below, par. 122.
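The numerical statements of par. 98, and Herschel's functional equation in par. 99, can be checked with the standard library. The sketch assumes the normal law in the form z = (1/√π c) exp −(x − a)²/c² with a = 0 and an arbitrary modulus c = 2. The area between −τc and +τc equals erf(τ), so `math.erf` gives the probable-error check directly; the other checks use simple Riemann sums and finite differences.

```python
import math


def normal_law(x, a=0.0, c=1.0):
    """z = (1/(sqrt(pi) c)) exp(-(x - a)^2 / c^2), with c the modulus."""
    return math.exp(-((x - a) ** 2) / c ** 2) / (math.sqrt(math.pi) * c)


c = 2.0

# Probable error: half the area lies within +/- tau * c when tau = 0.4769...
print(math.erf(0.4769))

# Unit area, and root-mean-square error equal to c / sqrt(2).
step = 1e-3
xs = [i * step for i in range(-20000, 20001)]      # -20 .. 20 covers the curve
area = sum(normal_law(x, c=c) for x in xs) * step
mean_sq = sum(x * x * normal_law(x, c=c) for x in xs) * step
print(area, math.sqrt(mean_sq), c / math.sqrt(2))


def second_diff(x, h=1e-4):
    """Central second difference of the curve, for locating the inflexion."""
    return (normal_law(x + h, c=c) - 2 * normal_law(x, c=c)
            + normal_law(x - h, c=c)) / h ** 2


# The curvature changes sign at x = c / sqrt(2).
x_inf = c / math.sqrt(2)
print(second_diff(x_inf - 0.01) < 0 < second_diff(x_inf + 0.01))


def phi(u):
    """Herschel's form: phi(u) = exp(-u / c^2)."""
    return math.exp(-u / c ** 2)


x, y = 0.7, 1.3
print(math.isclose(phi(x * x + y * y), phi(x * x) * phi(y * y)))
```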

E.g. in the article on “Probability” in the 9th ed. of the Ency. Brit.; also by Airy and other authorities. Bravais, in his article “Sur la probabilité des erreurs,” Mémoires présentés par divers savants (1846), p. 257, takes as the “modulus or parameter” the inverse square of our c. Doubtless different parameters are suited to different purposes and contexts: c when we consult the common tables, and in connexion with the operator, as below, par. 160; k (= ½c²) when we investigate the formation of the probability-curve out of independent elements (below, par. 104); 1/c² when we are concerned with weights or precisions (below, par. 134). If one form of the coefficient must be uniformly adhered to, probably σ (= c/√2), for which Professor Pearson expresses a preference, appears the best. It is called by him the “standard deviation.”

Fuller tables are to be found in many accessible treatises. Burgess's tables in the Trans. of the Edin. Roy. Soc. for 1900 are carried to a high degree of accuracy. Thorndike, in his Mental and Social Measurements, gives, among other useful tables, one referred to the standard deviation as the argument. New tables of the probability integral are given by W. F. Sheppard, Biometrika, ii. 174 seq.

Edinburgh Review (1850), xcii. 19.

The italics are in the original. The passage continues: “And


100. There is, however, one class of phenomena to which Herschel's reasoning applies without reservation. In a "molecular chaos," such as the received kinetic theory of gases postulates, if a molecule be placed at rest at a given point and the distance which it travels from that point in a given time, driven hither and thither by colliding molecules, is regarded as an "error," it may be presumed that errors in all directions are equally probable and errors in perpendicular directions are independent. It is remarkable that a similar presumption with respect to the velocities of the molecules was employed by Clerk Maxwell, in his first approach to the theory of molecular motion, to establish the law of error in that region.

101. The Laplace-Quetelet Hypothesis. That presumption has, indeed, not received general assent; and the law of error appears to be better rested on a proof which was originated by Laplace. According to this view, the normal law of error is a first approximation to the frequency with which different values are apt to be assumed by a variable magnitude dependent on a great number of independent variables, each of which assumes different values in random fashion over a limited range, according to a law of error, not in general the law, nor in general the same for each variable. The normal law prevails in nature because it often happens (in the world of atoms, in organic and in social life) that things depend on a number of independent agencies. Laplace, indeed, appears to have applied the mathematical principle on which this explanation depends only to examples (of the law of error) artificially generated by the process of taking averages. The merit of accounting for the prevalence of the law in rerum natura belongs rather to Quetelet. He, however, employed too simple a formula for the action of the causes. The hypothesis seems first to have been stated in all its generality, both of mathematical theory and statistical exemplification, by Glaisher.

(A) Deduction from Hypothetical Conditions.

102. The validity of the explanation may best be tested by first (A) deducing the law of error from the condition of numerous independent causes; and (B) showing that the law is adequately fulfilled in a variety of concrete cases, in which the condition is probably present. The condition may be supposed to be perfectly fulfilled in games of chance, or, more generally, sortitions, characterized by the circumstance that we have a knowledge prior to specific experience of the proportion of what Laplace calls favourable cases to all cases, a category which includes, for instance, the distribution of digits obtained by random extracts from mathematical tables, as well as the distribution of the numbers of points on dominoes.

Games of Chance.

103. The genesis of the law of error is most clearly illustrated by the simplest sort of “game,” that in which the sortition is between two alternatives, heads or tails, hearts or not-hearts, or, generally, success or failure, the probability of a success being p and that of a failure q, where p + q = 1. The number of such successes in the course of n trials may be considered as an aggregate made up of n independently varying elements, each of which assumes the values 0 or 1 with respective frequency q and p. The frequency of each value of the aggregate is given by a corresponding term in the expansion of $(q+p)^n$, and by a well-known theorem this term is approximately equal to

$$\frac{1}{\sqrt{2\pi npq}}\, e^{-v^2/2npq},$$

where v is the number of integers by which the term is distant from np (or an integer close to np), provided that v is of (or <) the order √n. Graphically, let the sortition made for each element be represented by the taking or not taking, with respective frequency p and q, of a step of length i. If a body starting from zero takes successively n such steps, the point at which it will most probably come to a stop is at npi (measured from zero); the probability of its stopping at any neighbouring point within a range of √n·i is given by the above-written law of frequency, vi being the distance of the stopping point from npi. Put vi = x and 2npqi² = c²; then the probability may be written (1/√π c) exp −x²/c².

104. It is a short step, but a difficult one, from this case, in which the element is binomial (heads or tails), to the general case, in which the element has several values according to the law of frequency (consists, for instance, of the number of points presented by a randomly-thrown die). According to the general theorem, if Q is the sum of numerous elements, each of which assumes different magnitudes according to a law of frequency, $z = f_r(x)$, the function f being in general different for different elements, the number of times that Q assumes magnitudes between x and x + Δx in the course of N trials is NzΔx, if

$$z = \frac{1}{\sqrt{2\pi k}}\, e^{-(x-a)^2/2k},$$

where a is the sum of the arithmetic means of all the elements, any one of which $a_r = [\int x f_r(x)\,dx]$, square brackets denoting that the integrations extend between the extreme limits of the element's range, if the frequency-locus for each element is continuous, it being understood that $[\int f_r(x)\,dx] = 1$; and k is the sum of the mean squares of error for each element, $k = \Sigma[\int \xi^2 f_r(a_r + \xi)\,d\xi]$, if the frequency-locus for each element is continuous, where $a_r$ is the arithmetic mean of one of the elements, and ξ the deviation of any value assumed by that element from $a_r$, Σ denoting summation over all the elements. When the frequency-locus for the element is not continuous, the integrations which give the arithmetic mean and mean square of error for the element must be replaced by summations. For example, in the case of the dice above instanced, the law of frequency for each element is that it assumes equally often each of the values 1, 2, 3, 4, 5, 6. Thus the arithmetic mean for each element is 3.5, and the mean square of error is $\{(3.5-1)^2 + (3.5-2)^2 + \dots\}/6 = 2.916$. Accordingly, the sum of the points obtained by tossing a large number, n, of dice at random will assume a particular value x with a frequency which is approximately assigned by the equation

$$z = \frac{1}{\sqrt{5.83\pi n}}\, e^{-(x-3.5n)^2/5.83n}.$$

The rule equally applies to the case in which the elements are not similar; one might be the number of points on a die, another the number of points on a domino, and so on. Graphically, each element is no longer represented by a step which is either null or i, but by a step which may be, with an assigned probability, one or other of several degrees between those limits, the law of frequency and the range of i being different for the different elements.

it is on this ignorance, and not on any peculiarity in cases, that the idea of probability in the abstract is formed.” Cf. above, par. 6.

Natural Philosophy, pt. i. art. 391.

For other a priori proofs see Czuber, Theorie der Beobachtungsfehler, th. i.

Cf. note to par. 127.

He considered the effect as the sum of causes each of which obeys the simplest law of frequency, the symmetrical binomial.

Memoirs of Astronomical Society (1878), p. 105. Cf. Morgan Crofton, “On the Law of Errors of Observation,” Trans. Roy. Soc. (1870), vol. clx. pt. i. p. 178.

Above, par. 2.

By the use of Stirling's and Bernoulli's theorems; Todhunter, History . . . of Probability.

The statement includes the case of a linear function, since an element multiplied by a constant is still an element.

105. Variant Proofs. The evidence of these statements can only be indicated here. All the proofs which have been offered involve some postulate as to the deviation of the elements from their respective centres of gravity, their “errors.” If these errors extended to infinity, it might well happen that the law of error would not be fulfilled by a sum of such elements. The necessary and sufficient postulate appears to be that the mean powers of deviation for the elements, the second (above written) and the similarly formed third, fourth, &c., powers (up to some assigned power), should be finite.

106. (1) The proof which seems to flow most directly from this postulate proceeds thus. It is deduced that the mean powers of deviation for the proposed representative curve, the law of error (up to a certain power), differ from the corresponding powers of the actual locus by quantities which are negligible when the number of the elements is large. But loci which have their mean powers of deviation (up to some certain power) approximately equal may be considered as approximately coincident.

E.g. if the frequency-locus of each element were 1/(1 + x²), extending to infinity in both directions. But extension to infinity would not be fatal, if the form of the element's locus were normal.

For a fuller exposition and a justification of many of the statements which follow, see the writer's paper on “The Law of Error” in the Camb. Phil. Trans. (1905).

Loc. cit. pt. i. § 1.

On this criterion of coincidence see Karl Pearson's paper “On the Systematic Fitting of Curves,” Biometrika, vols. i. and ii.

107. (2) The earliest and best-known proof is that which was originated by Laplace and generalized by Poisson. Some idea of this celebrated theory may be obtained from the following free version, applied to a simple case. The case is that in which all the elements have one and the same locus of frequency, and that locus is symmetrical about the centre of gravity. Let the locus be represented by the equation y = φ(ξ), where the centre of gravity is the origin, and φ(+ξ) = φ(−ξ); the construction signifying that the probability of the element having a value ξ (between say ξ − ½Δξ and ξ + ½Δξ) is φ(ξ)Δξ. Square brackets denoting summation between extreme limits, put χ(α) for $[\Sigma\, \varphi(\xi)\, e^{\sqrt{-1}\,\alpha\xi}\,\Delta\xi]$, where ξ is an integer multiple of Δξ (or Δx), = ρΔx, say. Form the mth power of χ(α). The coefficient of $e^{\sqrt{-1}\,r\alpha\Delta x}$ in $(\chi(\alpha))^m$ is the probability that the sum of the values of the m elements should be equal to rΔx; a probability which is equal to $\Delta x\, y_r$, where $y_r$ is the ordinate of the locus representing the frequency of the compound quantity (formed by the sum of the elements). Owing to the symmetry of the function the value of $y_r$ will not be altered if we substitute $-\sqrt{-1}$ for $\sqrt{-1}$; nor if we substitute $\tfrac12\big(e^{\sqrt{-1}\,r\alpha\Delta x} + e^{-\sqrt{-1}\,r\alpha\Delta x}\big)$, that is $\cos r\alpha\Delta x$. Thus $(\chi(\alpha))^m$ becomes a sum of terms of the form $\Delta x\, y_r \cos r\alpha\Delta x$, where $y_{-r} = y_r$. Now multiply $(\chi(\alpha))^m$ thus expressed by $\cos t\Delta x\,\alpha$, where t, being an integer, tΔx = x, the abscissa of the error the probability of whose occurrence is to be determined. The product will consist of a sum of terms of the form $\tfrac12\Delta x\, y_r\,\big(\cos\alpha(r+t)\Delta x + \cos\alpha(r-t)\Delta x\big)$. As every value of r (except zero) is matched by a value equal in absolute magnitude, −r, the series takes the form of a sum of terms $\Delta x\, y_r \cos q\alpha\Delta x$ together with $\Delta x\, y_t$, where q has all possible integer values from 1 to the largest value of |r| increased by |t|; the term free from circular functions being the equivalent of $\tfrac12\Delta x\, y_r \cos\alpha(r+t)\Delta x$, when r = −t, together with $\tfrac12\Delta x\, y_r \cos\alpha(r-t)\Delta x$, when r = t. Now substitute for αΔx a new symbol β; and integrate with respect to β the thus transformed $(\chi(\alpha))^m \cos t\Delta x\,\alpha$ between the limits β = 0 and β = π. The integrals of all the terms which are of the form $\Delta x\, y_r \cos q\beta$ will vanish, and there will be left surviving only $\pi\Delta x\, y_t$. We thus obtain, as equal to $\pi\Delta x\, y_t$, the integral $\int_0^\pi \{\chi(\beta/\Delta x)\}^m \cos t\beta\, d\beta$. Now change the independent variable to α; then, as dβ = Δx dα,

$$\pi\,\Delta x\, y_t = \Delta x \int_0^{\pi/\Delta x} d\alpha\, (\chi(\alpha))^m \cos t\Delta x\,\alpha.$$

Replacing tΔx by x, and dividing both sides by πΔx, we have

$$y = \frac{1}{\pi}\int_0^{\pi/\Delta x} d\alpha\, (\chi(\alpha))^m \cos\alpha x.$$

Now expanding the cos αξ which enters into the expression for χ(α), we obtain

$$\chi(\alpha) = [\Sigma\,\varphi(\xi)\,\Delta\xi] - \frac{\alpha^2}{2!}[\Sigma\,\varphi(\xi)\,\xi^2\,\Delta\xi] + \frac{\alpha^4}{4!}[\Sigma\,\varphi(\xi)\,\xi^4\,\Delta\xi] - \dots$$

Performing the summations indicated, we express χ(α) in terms of the mean powers of deviation for an element. Whence $(\chi(\alpha))^m$ is expressible in terms of the mean powers of the compound locus. First and chief is the mean second power of deviation for the compound, which is the sum of the mean second powers of deviation for the elements, say k. It is found that the sought probability may be equated to

$$\frac{1}{\pi}\int_0^{\pi/\Delta x} d\alpha\, e^{-\frac12\alpha^2 k}\cos\alpha x + \frac{k_2}{\pi}\int_0^{\pi/\Delta x} d\alpha\, \alpha^4 e^{-\frac12\alpha^2 k}\cos\alpha x + \dots,$$

where $k_2$ is the coefficient defined below. Here π/Δx may be replaced by ∞, since the finite difference Δx is small with respect to unity when the number of the elements is large; and thus the integrals involved become equatable to known definite integrals. If it were allowable to neglect all the terms of the series but the first, the expression would reduce to $\frac{1}{\sqrt{2\pi k}}\, e^{-x^2/2k}$, the normal law of error. But it is allowable to neglect the terms after the first, in a first approximation, for values of x not exceeding a certain range, the number of the elements being large, and if the postulate above enunciated is satisfied. With these reservations it is proved that the sum of a number of similar and symmetrical elements conforms to the normal law of error. The proof is by parity extended to the case in which the elements have different but still symmetrical frequency functions; and, by a bolder use of imaginary quantities, to the case of unsymmetrical functions.
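The approximations of pars. 103 and 104 can be compared with exact frequencies. The exact law for a sum of independent elements is the repeated convolution of the elements' laws, which is what forming the mth power of χ(α) in par. 107 accomplishes; the Python sketch below convolves directly instead of passing through χ(α). The choice n = 100 and the function names are arbitrary illustrations.

```python
import math


def convolve(p, q):
    """Frequency law of the sum of two independent integer-valued elements."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def law_of_sum(element, n):
    """Exact law of the sum of n independent copies of one element."""
    law = [1.0]
    for _ in range(n):
        law = convolve(law, element)
    return law


n = 100

# Par. 103: heads or tails with p = q = 1/2; v is the distance from np.
p = q = 0.5
binom = law_of_sum([q, p], n)               # index = number of heads
for v in (0, 5, 10):
    approx = math.exp(-v * v / (2 * n * p * q)) / math.sqrt(2 * math.pi * n * p * q)
    print(v, round(binom[int(n * p) + v], 5), round(approx, 5))

# Par. 104: n dice; mean 3.5 and mean square of error 35/12 = 2.916... per die.
die = [0.0] + [1 / 6] * 6                   # index = points on the face
dice = law_of_sum(die, n)
two_k = 2 * n * 35 / 12                     # the "5.83 n" of the text
for x in (350, 360, 370):
    approx = math.exp(-(x - 3.5 * n) ** 2 / two_k) / math.sqrt(math.pi * two_k)
    print(x, round(dice[x], 5), round(approx, 5))
```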

Laplace, Théorie analytique des probabilités, bk. ii. ch. iv.; Poisson, Recherches sur la probabilité des jugements. Good restatements of this proof are given by Todhunter, History . . . of Probability, art. 1004, and by Czuber, Theorie der Beobachtungsfehler, art. 38 and Th. 2, § 4.

The symbol |r| is used to denote absolute magnitude, abstraction being made of sign.

Loc. cit. app. 1.

Below, pars. 159, 160.

Loc. cit. p. 53 and context.


then the magnitudes of the B's in the neighbourhood of their maximum (say B₁) will be disposed in accordance with a “probability curve,” or normal law of error.

109. (4) Professor Morgan Crofton's original proof of the law of error is based on a datum obtained by observing the effect which the introduction of a new element produces on the frequency-locus for the aggregate of elements. It seems to be assumed, very properly, that the sought function involves as constants some at least of the mean powers of the aggregate, in particular the mean second power, say k. We may without loss of generality refer each of the elements (and accordingly the aggregate) to its respective centre of gravity. Then if y = f(x) is the ordinate of the frequency-locus for the aggregate before taking in a new element, and y + Δy the ordinate after that operation, by a well-known principle,

$$y + \Delta y = [\Sigma\, \varphi_m(\xi)\, f(x - \xi)\, \Delta\xi],$$

where $\varphi_m(\xi)$ is the frequency-locus for the new element, and the square brackets indicate that the summation is to extend over the whole range of values assumed by that element. Expanding in ascending powers of ξ (each value of ξ) and neglecting powers above the second, as is found to be legitimate under the conditions specified, we have (since the first mean power of the element vanishes)

$$\Delta y = \frac12 \frac{d^2 y}{dx^2}\, [\Sigma\, \xi^2 \varphi_m(\xi)\, \Delta\xi].$$

From the fundamental proposition that the mean square for the aggregate equals the sum of mean squares for the elements it follows that $[\Sigma\, \xi^2 \varphi_m(\xi)\, \Delta\xi]$, the mean second power of deviation for the mth element, is equal to Δk, the addition to k, the mean second power of deviation for the aggregate. There is thus obtained a partial differential equation of the second order

$$\frac{dy}{dk} = \frac12 \frac{d^2 y}{dx^2}. \qquad (1)$$

A subsidiary equation is (in effect) obtained by Professor Crofton from the property that if the unit according to which the axis of x is graduated is altered in any assigned ratio, there must be a corresponding alteration both of the ordinate expressing the frequency of the aggregate and of the mean square of deviation for the aggregate. By supposing the alteration indefinitely small he obtains a second partial differential equation, viz. (in the notation here adopted)

$$y + x\frac{dy}{dx} + 2k\frac{dy}{dk} = 0. \qquad (2)$$

From these two equations, regard being had to certain other conditions of the problem, it is deducible that $y = \frac{C}{\sqrt{k}}\, e^{-x^2/2k}$, where C is a constant of which the value is determined by the condition that

$$\int_{-\infty}^{\infty} y\, dx = 1.$$
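Equations (1) and (2) of par. 109 can be checked for y = (C/√k) e^{−x²/2k}, with C = 1/√(2π) fixed by the condition ∫y dx = 1. The sketch below evaluates both equations with central finite differences at a few arbitrary points; it is a numerical check of the stated solution, not a reconstruction of Crofton's derivation, and the step h is an arbitrary choice.

```python
import math

C = 1 / math.sqrt(2 * math.pi)


def y(x, k):
    """Normal law with mean square of deviation k."""
    return C / math.sqrt(k) * math.exp(-x * x / (2 * k))


def d_dx(f, x, k, h=1e-4):
    return (f(x + h, k) - f(x - h, k)) / (2 * h)


def d2_dx2(f, x, k, h=1e-4):
    return (f(x + h, k) - 2 * f(x, k) + f(x - h, k)) / h ** 2


def d_dk(f, x, k, h=1e-4):
    return (f(x, k + h) - f(x, k - h)) / (2 * h)


for x, k in [(0.3, 1.0), (1.7, 2.5), (-2.0, 4.0)]:
    eq1 = d_dk(y, x, k) - 0.5 * d2_dx2(y, x, k)                  # equation (1)
    eq2 = y(x, k) + x * d_dx(y, x, k) + 2 * k * d_dk(y, x, k)    # equation (2)
    print(f"x={x:+.1f} k={k}: residuals {eq1:.1e}, {eq2:.1e}")
```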

110. (5) The condition on which Professor Crofton's proof is based may be called differential, as obtained from the introduction of a single new element. There is also an integral condition obtained from the introduction of a whole set of new elements. For let A be the sum of m₁ elements, fluctuating according to the sought law of error. Let B be the sum of another set of elements, m₂ in number (m₁ and m₂ both large). Then Q, a quantity formed by adding together each pair of concurrent values presented by A and B, must also conform to the law of error, since Q is the sum of m₁ + m₂ elements. The general form which satisfies this condition of reproductivity is limited by other conditions to the normal law of error.
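The condition of reproductivity in par. 110 can be illustrated numerically: convolving two normal laws with mean squares k₁ and k₂ gives, to within discretization error, the normal law with mean square k₁ + k₂. In this sketch the grid spacing and the values of k₁ and k₂ are arbitrary choices, and the convolution is a plain Riemann sum.

```python
import math


def normal(x, k):
    return math.exp(-x * x / (2 * k)) / math.sqrt(2 * math.pi * k)


h = 0.05                                   # grid spacing
grid = [i * h for i in range(-400, 401)]   # -20 .. 20
k1, k2 = 1.5, 3.0
law_a = [normal(x, k1) for x in grid]      # A: the sum of m1 elements
law_b = [normal(x, k2) for x in grid]      # B: the sum of m2 elements

# Law of Q = A + B by discrete convolution; index `offset` is x = 0.
law_q = [0.0] * (2 * len(grid) - 1)
for i, a in enumerate(law_a):
    for j, b in enumerate(law_b):
        law_q[i + j] += a * b * h

offset = len(grid) - 1
for x in (0.0, 1.0, 3.0):
    idx = offset + round(x / h)
    print(x, round(law_q[idx], 6), round(normal(x, k1 + k2), 6))
```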

111. The list of variant proofs is not yet exhausted, but enough has been said to establish the proposition that a sum of numerous elements of the kind described will fluctuate approximately according to the normal law of error.

Varieties of Linear Function.

112. As the number of elements is increased, the constant above designated k continually increases; so that the curve representing the frequency of the compound magnitude spreads out from its centre. It is otherwise if instead of the simple sum we consider the linear function formed by adding the m elements each multiplied by 1/m. The “spread” of the average thus constituted will continually diminish as the number of the elements is increased; the sides closing in as the vertex rises up. The change in “spread” produced by the accession of new elements is illustrated by the transition from the high to the low curve, in fig. 10, in the case of a sum; in the case of an average (arithmetic mean) by the reverse relation.

The Analyst (Iowa), vols. v., vi., vii. passim; and especially vi. 142 seq., vii. 172 seq.

Morgan Crofton, loc. cit. p. 781, col. a. The principle has been used by the present writer in the Phil. Mag. (1883), xvi. 301.

For a criticism and extension of Crofton's proof see the already cited paper on “The Law of Error,” Camb. Phil. Trans. (1905), pt. i. § 2. Space does not permit the reproduction of Crofton's proof as given in the 9th ed. of the Ency. Brit. (art. “Probability”).

Loc. cit. pt. i. § 4; and app. 6.

Loc. cit. p. 122 seq.
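The contrast drawn in par. 112 between a sum and an average of m elements shows up in a small simulation: the standard deviation of the sum grows like √m, that of the average shrinks like 1/√m. The uniform element on (0, 1), whose mean square of error is 1/12, and the numbers of trials are arbitrary choices for this sketch; `random.seed` only makes the run repeatable.

```python
import random
import statistics

random.seed(1)


def spread(m, trials=4000):
    """Standard deviation of the sum and of the average of m uniform elements."""
    sums = [sum(random.random() for _ in range(m)) for _ in range(trials)]
    averages = [s / m for s in sums]
    return statistics.pstdev(sums), statistics.pstdev(averages)


for m in (4, 16, 64):
    sd_sum, sd_avg = spread(m)
    print(f"m={m:3d}  sum: {sd_sum:.3f} (theory {(m / 12) ** 0.5:.3f})"
          f"  average: {sd_avg:.4f} (theory {(1 / (12 * m)) ** 0.5:.4f})")
```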

Extension to Non-linear Functions.

113. The proposition which has been proved for linear functions may be extended to any other function of numerous variables, each representing the value assumed by an independently fluctuating element, if the function may be expanded in ascending powers of the variables, according to Taylor's theorem, and all the powers after the first may be neglected. The matter is not so simple as it is often represented, when the variable elements may assume large, perhaps infinite, values; but with the aid of the postulate above enunciated the difficulty can be overcome.

Extension to two or more Dimensions.

114. All the proofs which have been noticed have been extended to errors in two (or more) dimensions. Let Q be the sum of a number of elements, each of which, being a function of two variables, x and y, assumes different pairs of values according to a law of frequency $z_r = f_r(x, y)$, the functions being in general different for different elements. The frequency with which Q assumes values of the variables between x and x + Δx and between y and y + Δy is zΔxΔy, if

$$z = \frac{1}{2\pi\sqrt{km - l^2}}\exp\left[-\frac{m(x-a)^2 - 2l(x-a)(y-b) + k(y-b)^2}{2(km - l^2)}\right],$$

where, as in the simpler case, $a = \sum a_r$, $a_r$ being the arithmetic mean of the values of x assumed in the long run by one of the elements; b is the corresponding sum for values of y; and

$$k = \sum \iint (x - a_r)^2 f_r(x, y)\,dx\,dy, \qquad m = \sum \iint (y - b_r)^2 f_r(x, y)\,dx\,dy,$$

$$l = \sum \left[\iint (x - a_r)(y - b_r)\,f_r(x, y)\,dx\,dy\right];$$

the summation extending over all the elements, and the integration between the extreme limits of each, supposing that the law of frequency for each element is continuous; otherwise summation is to be substituted for integration. For example, let each element be constituted as follows. Three coins having been tossed, the number of heads presented by the first and second coins together is put for x, and the number of heads presented by the second and third coins together is put for y. The law of frequency for the element is represented in fig. 11, the integers outside denoting the values of x or y, and the fractions inside the probabilities of particular values of x and y concurring.

FIG. 11.

If i is the distance from 0 to 1 and from 1 to 2 on the abscissa, and i′ the corresponding distance on the ordinate, the mean of the values of x for the element (Δa, as we may say) is i, and the corresponding mean square of horizontal deviations (Δk) is ½i². Likewise Δb = i′; Δm = ½i′²; and Δl = ⅛(i × i′ + (−i) × (−i′)) = ¼ii′. Accordingly, if n such elements are put together (if n steps of the kind which the diagram represents are taken), the frequency with which a particular pair of aggregates x and y will concur, that is, with which a particular point on the plane of xy, namely x = ri and y = r′i′, will be reached, is given by the equation

$$z = \frac{2}{\sqrt{3}\,\pi n\, i i'}\exp\left\{-\frac{4}{3n}\left[(r-n)^2 - (r-n)(r'-n) + (r'-n)^2\right]\right\},$$

the frequency of the point itself being approximately $z\,i\,i'$.
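The moments of the three-coin element can be checked by enumerating its eight equally likely outcomes. A minimal sketch, taking i = i′ = 1 for simplicity; the function names are hypothetical. It should recover Δa = Δb = 1, Δk = Δm = ½ and Δl = ¼, which are the values that, multiplied by n, give a = b = n, k = m = n/2 and l = n/4 in the two-dimensional normal law.

```python
from fractions import Fraction
from itertools import product


def three_coin_element():
    """Law of frequency of one element: x = heads on coins 1 and 2,
    y = heads on coins 2 and 3 (unit distances i = i' = 1)."""
    law = {}
    for c1, c2, c3 in product((0, 1), repeat=3):
        point = (c1 + c2, c2 + c3)
        law[point] = law.get(point, 0) + Fraction(1, 8)
    return law


def moments(law):
    """Return (Δa, Δb, Δk, Δm, Δl) for a two-dimensional law of frequency."""
    da = sum(x * p for (x, _), p in law.items())
    db = sum(y * p for (_, y), p in law.items())
    dk = sum((x - da) ** 2 * p for (x, _), p in law.items())
    dm = sum((y - db) ** 2 * p for (_, y), p in law.items())
    dl = sum((x - da) * (y - db) * p for (x, y), p in law.items())
    return da, db, dk, dm, dl


law = three_coin_element()
print(sorted(law.items()))  # the point (1, 1) carries 2/8; six other points 1/8 each
print(moments(law))
```

Only the corner points (0, 0) and (2, 2) contribute to Δl, which is why it comes out as ⅛ + ⅛ = ¼.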

115. A verification is afforded by a set of statistics obtained with dice by Weldon, and here reproduced by his permission. A success is in this experiment defined, not by obtaining a head when a coin is tossed, but by obtaining a face with more than three points on it when a die is tossed; the probabilities of the two events are the same, or rather would be if coins and dice were perfectly symmetrical. Professor Weldon virtually took six steps of the sort above described when, six painted dice having been thrown, he added the number of successes in that painted batch to the number of successes in another batch of six to form his x, and to the number of successes in a third batch of six to form his y. The result is represented in the annexed table, where each degree on the axes of x and y respectively corresponds to the i and i′ of the preceding paragraphs, and i = i′. The observed frequencies being represented by numerals, a general correspondence between the facts and the formula is apparent.
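The ideal frequencies with which such observations are compared can be computed exactly. A sketch, assuming perfectly fair dice (success chance ½) and i = i′ = 1: x and y share the success count S of the painted batch, so x = S + A and y = S + B, where A and B are the counts in the other two batches. It compares the exact chance of a few points with the normal approximation z·i·i′, using k = m = 6/2 and l = 6/4. Weldon's observed counts are not reproduced here, and the function names are hypothetical.

```python
import math
from fractions import Fraction
from math import comb


def binomial_half(n):
    """Law of frequency of the number of successes among n fair trials."""
    return [Fraction(comb(n, s), 2**n) for s in range(n + 1)]


def exact_frequency(x, y, n=6):
    """Exact chance that x = S + A and y = S + B, where S, A, B are
    independent success counts of n dice each (success chance 1/2)."""
    law = binomial_half(n)
    total = Fraction(0)
    for s in range(n + 1):
        a, b = x - s, y - s
        if 0 <= a <= n and 0 <= b <= n:
            total += law[s] * law[a] * law[b]
    return total


def normal_frequency(x, y, n=6):
    """Normal approximation z * i * i' (i = i' = 1) with a = b = n,
    k = m = n/2 and l = n/4."""
    k = m = n / 2
    l = n / 4
    dx, dy = x - n, y - n
    det = k * m - l * l
    exponent = (m * dx * dx - 2 * l * dx * dy + k * dy * dy) / (2 * det)
    return math.exp(-exponent) / (2 * math.pi * math.sqrt(det))


for point in [(6, 6), (8, 8), (8, 4), (4, 4)]:
    exact = exact_frequency(*point)
    print(point, f"exact={float(exact):.4f}", f"normal={normal_frequency(*point):.4f}")
```

The points (8, 8) and (4, 4) lie on the 45° line through the centre, (8, 4) on the complementary line at the same distance; both the exact and the approximate frequencies are several times larger on the former.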

Loc. cit. pt. ii. § 7.

The second by Burbury, in Phil. Mag. (1894), xxxvii. 145; the third by its author in the Analyst for 1881; and the remainder by the present writer in Phil. Mag. (1896), xii. 247; and Camb. Phil. Trans. (1905), loc. cit.

Compare the formula for the simple case above, § 4.

On the irregularity of the dice with which Weldon experimented, see Pearson, Phil. Mag. (1900), p. 167.

The maximum frequency is, as it ought to be, at the point x = 6i, y = 6i′. The density is particularly great along a line through that point making 45° with the axis of x, and particularly small in the complementary direction. This also is as it ought to be. For if the centre is made the origin by substituting x for (x − a) and y for (y − b), and new co-ordinates X and Y are then taken, making an angle θ with x and y respectively, the curve which is traced on the plane of zX by its intersection with the surface is of the form

$$z = J\exp\left\{-\frac{X^2\left(k\sin^2\theta - 2l\cos\theta\sin\theta + m\cos^2\theta\right)}{2(km - l^2)}\right\},$$

a probability-curve which will be more or less spread out according as the factor $k\sin^2\theta - 2l\cos\theta\sin\theta + m\cos^2\theta$ is less or greater. Now this expression has a minimum or maximum when $(k - m)\sin 2\theta - 2l\cos 2\theta = 0$; a minimum when $(k - m)\cos 2\theta + 2l\sin 2\theta$ is positive, and a maximum when that criterion is negative; that is, in the present case, where k = m and l is positive, a minimum when θ = π/4 and a maximum when θ = 3π/4.
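The location of the minimum and maximum of the spread factor can be confirmed by a direct scan over angles. A minimal sketch, assuming the values for six three-coin steps with i = i′ = 1, namely k = m = 6 × ½ = 3 and l = 6 × ¼ = 1.5; the function name is hypothetical.

```python
import math


def spread_factor(theta, k, m, l):
    """k sin²θ − 2l cosθ sinθ + m cos²θ: the smaller it is, the more the
    section of the frequency surface along direction θ is spread out."""
    s, c = math.sin(theta), math.cos(theta)
    return k * s * s - 2 * l * c * s + m * c * c


# Six steps of the three-coin (or three-dice) element, i = i' = 1.
k, m, l = 3.0, 3.0, 1.5
angles = [j * math.pi / 180 for j in range(180)]  # 0° to 179° in 1° steps
widest = min(angles, key=lambda t: spread_factor(t, k, m, l))
narrowest = max(angles, key=lambda t: spread_factor(t, k, m, l))
print(round(math.degrees(widest)), round(math.degrees(narrowest)))  # 45 135
```

With k = m the factor reduces to 3 − 1.5 sin 2θ, so the scan simply confirms that the surface is widest along 45° and narrowest along 135°.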

[merged small][ocr errors][merged small][merged small][ocr errors][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][ocr errors][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][ocr errors][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small][merged small]

116. Characteristics of the Law of Error. As may be presumed from the examples just given, in order that there should be some approximation to the normal law the number of elements need not be very great. A very tolerable imitation of the probability-curve has been obtained by superposing three elements, each obeying a law of frequency quite different from the normal one, namely, that simple law according to which one value of a variable occurs as frequently as another between the limits within which the variation is confined (y = 1/(2a), between the limits x = +a and x = −a). If the component elements obey unsymmetrical laws of frequency, the compound will indeed be to some extent unsymmetrical, unlike the "normal" probability-curve. But, as the number of the elements is increased, the portion of the compound curve in the neighbourhood of its centre of gravity tends to be rounded off into the normal shape. The portion of the compound curve which is sensibly identical with a curve of the "normal" family becomes greater the greater the number of independent elements, caeteris paribus, and granted certain conditions as to the equality and the range of the elements. It will readily be granted that if one component predominates, it may unduly impress its own character on the compound. But it should be pointed out that the characteristic with which we are now concerned is not average magnitude, but deviation from the average. The component elements may be very unequal in their contributions to the average magnitude of the compound without prejudice to its "normal" character, provided that the fluctuation of all or many of the elements is of one and the same order. The proof of the law requires that the contribution made by each element to the mean square of deviation for the compound, k, should be small, capable of being treated as differential with respect to k.
It is not necessary that all these small quantities should be of the same order, but only that they should admit of being rearranged, by massing together those of a smaller order, as a numerous set of independent elements in which no two or three stand out as sui generis in respect of the magnitude of their fluctuation. For example, if one element consists of the number of points on a domino (the sum of two digits taken at random), and the other elements are each either 1 or 0 according as heads or tails turn up when a coin is tossed, the first element, having a mean square of deviation 16.5, will not be of the same order as the others, each having 0.25 for its mean square of deviation. But sixty-six of the latter taken together would constitute an independent element of the same order as the first one; and accordingly, if there are several times sixty-six elements of the latter sort, along with one or two of the former sort, the conditions for the generation of the normal distribution will be satisfied. These propositions would evidently be unaffected by altering the average of any element, that is, by adding a greater or less fixed magnitude to it without altering its deviation from the average. The propositions are adapted to the case in which the elements fluctuate according to a law of frequency other than the normal. For if they are already normal, the aforesaid conditions are unnecessary: the normal law will be obeyed by the sum of elements which each obey it, even though they are not numerous, not independent, and not of the same order in respect of the extent of fluctuation. A similar distinction is to be drawn with respect to some further conditions which the reasoning requires. A limitation as to the range of the elements is not necessary when they are already normal, or even when they have a certain affinity to the normal curve; very large values of an element are not excluded, provided they are sufficiently rare. What has been said of curves with special reference to one dimension is of course to be extended to the case of surfaces and many dimensions.

Experiments in pari materia performed by A. D. Darbishire afford additional illustrations. See "Some Tables for illustrating Statistical Correlation," Mem. and Proc. Man. Lit. and Phil. Soc., vol. li. pt. iii.

Journ. Stat. Soc. (March 1900), p. 73, referring to Burton, Phil. Mag. (1883), xvi. 301.
In all cases the theorem that under the conditions stated the normal law of error will be generated is to be distinguished from the hypothesis that the conditions are fairly well fulfilled in ordinary experience.
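Two of the numerical claims in § 116 can be checked directly. The sketch below assumes three elements each uniform on [−a, a] with a = 1, so that each has mean square 1/3 and the compound has k = 1; the exact density of their sum follows from the standard Irwin–Hall formula for sums of uniform variables (my choice of method, not necessarily how the original imitation was obtained). It then recomputes the domino and coin mean squares. All function and variable names are hypothetical.

```python
import math
from fractions import Fraction


def sum_of_uniforms_density(s, n=3, a=1.0):
    """Exact density of the sum of n independent elements, each uniform on
    [-a, a] (y = 1/(2a)), via the Irwin-Hall formula for sums of uniforms."""
    t = (s + n * a) / (2 * a)  # rescale to a sum of n uniforms on [0, 1]
    if t < 0 or t > n:
        return 0.0
    total = sum((-1) ** j * math.comb(n, j) * (t - j) ** (n - 1)
                for j in range(int(math.floor(t)) + 1))
    return total / math.factorial(n - 1) / (2 * a)


def normal_density(s, k):
    """Normal law with mean square of deviation k, centred at zero."""
    return math.exp(-s * s / (2 * k)) / math.sqrt(2 * math.pi * k)


# Three uniform elements on [-1, 1]: each has mean square 1/3, so k = 1.
grid = [j / 10 for j in range(-35, 36)]
worst = max(abs(sum_of_uniforms_density(s) - normal_density(s, 1.0)) for s in grid)
print(f"largest density gap on the grid: {worst:.4f}")

# Domino element versus coin elements (mean squares of deviation).
digit = [Fraction(d) for d in range(10)]
digit_mean = sum(digit) / 10
digit_k = sum((d - digit_mean) ** 2 for d in digit) / 10  # 33/4 = 8.25
domino_k = 2 * digit_k                                    # two digits: 16.5
coin_k = Fraction(1, 4)                                   # 0 or 1, mean 1/2
print(float(domino_k), float(coin_k), domino_k / coin_k)  # ratio 66
```

The largest gap between the three-element compound and the normal curve, about 0.024, occurs at the centre (0.375 against 0.399); the ratio 16.5 / 0.25 = 66 is the "sixty-six" of the text.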

(β) Verification of the Normal Law.

117. Having deduced the genesis of the law of error from ideal conditions such as are attributed to perfectly fair games of chance, we have next to inquire how far these conditions are realized and the law fulfilled in common experience.

Errors proper.

118. Among important concrete cases errors of observation occupy a leading place. The theory is brought to bear on this case by the hypothesis that an error is the algebraic sum of numerous elements, each varying according to a law of frequency special to itself. This hypothesis involves two assumptions: (1) that an error is dependent on numerous independent causes; (2) that the function expressing that dependence can be treated as a linear function, by expanding in terms of ascending powers (of the elements) according to Taylor's theorem and neglecting higher powers, or otherwise. The first assumption seems, in Dr Glaisher's words, "most natural and true. In any observation where great care is taken, so that no large error can occur, we can see that its accuracy is influenced by a great number of circumstances which ultimately depend on independent causes: the state of the observer's eye and his physiological condition in general, the state of the atmosphere, of the different parts of the instrument, &c., evidently depend on a great number of causes, while each contributes to the actual error." The second assumption seems to be frequently realized in nature. But the assumption is not always safe. For example, where the velocities of molecules are distributed according to the normal law of error, with zero as centre, the energies must be distributed according to a quite different law. This rationale is applicable not only to the fallible perceptions of the senses, but also to impressions into which a large ingredient of inference enters, such as estimates of a man's height or weight from his appearance, and even higher acts of judgment.
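The molecular example can be illustrated by simulation. A sketch under stated assumptions: hypothetical molecules of unit mass, a single velocity component drawn from the normal law about zero, and energy taken as ½v². The sample coefficient of skewness (third central moment divided by the 3/2 power of the second) is near zero for the velocities but large for the energies, whose law is a scaled chi-square law with one degree of freedom rather than a normal one.

```python
import random


def skewness(values):
    """Sample coefficient of skewness: third central moment / (second)^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(1)
# Hypothetical molecules of unit mass: one velocity component, normal about zero.
velocities = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
energies = [0.5 * v * v for v in velocities]  # a non-linear function of the velocity

print(f"velocity skewness: {skewness(velocities):+.3f}")  # close to 0 (symmetric law)
print(f"energy skewness:   {skewness(energies):+.3f}")    # close to 2*sqrt(2), about 2.83
```

Because the energy is a square, no expansion to first powers about the centre v = 0 exists (the linear term vanishes), which is exactly the case in which the second assumption fails.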
Aiming at an object is an act similar to measuring an object; misses are produced by much the same variety of causes as mistakes, and, accordingly, it is found that shots aimed at the same bull's-eye are apt to be distributed according to the normal law, whether in two dimensions on a target or according to their horizontal deviations, as exhibited below (par. 156). A residual class comprises miscellaneous statistics, physical as well as social, in which the normal law of error makes its appearance, presumably in consequence of the action of numerous independent influences. Well-known instances are afforded by human heights and other bodily measurements, as tabulated by Quetelet and others. Professor Pearson has found that "the normal curve suffices to describe within the limits of random sampling the distribution of the chief characters in man." The tendency of social phenomena to conform to the normal law of frequency is well


Memoirs of Astronomical Society (1878), p. 105. Journ. Stat. Soc. (1890), p. 462 seq.

Miscellaneous Statistics.

E.g. the marking of the same work by different examiners. Ibid. Lettres sur la théorie des probabilités and Physique sociale.

E.g. the measurements of Italian recruits, adduced in the Atlante statistico, published under the direction of the Ministero di Agricoltura (Rome, 1882); and Weldon's measurements of crabs, Proc. Roy. Soc. liv. 321; discussed by Pearson in the Trans. Roy. Soc. (1894), vol. clxxxv. A.

Biometrika, iii. 395. Cf. ibid. p. 141.
