exemplified by A. L. Bowley's grouping of the wages paid to different classes.¹

119. The division of concrete errors which has been proposed is not to be confounded with another twofold classification, namely, observations which stand for a real objective thing, and such statistics as are not thus representative of something outside themselves, groups of which the mean is called "subjective." This division would be neither clear nor useful. On the one hand, so-called real means are often only approximately equal to objective quantities. Thus the proportional frequency with which one face of a die (the six, suppose) turns up is only approximately given by the objective fact that the six is one face of a nearly perfect cube. For a set of dice with which Weldon experimented, the average frequency of a throw presenting either five or six points proved to be not ⅓, but 0.3377.² The difference of this result from the regulation ⅓ is as unpredictable from objective data, prior to experiment, as any of the means called subjective or fictitious. So the mean of errors of observation often differs from the thing observed by a so-called constant error;³ so shots may be constantly deflected from the bull's-eye by a steady wind or "drift."

120. On the other hand, statistics not purporting to represent a real object have more or less close relations to magnitudes which cannot be described as fictitious. Where the items averaged are ratios, e.g. the proportion of births or deaths to the total population in several districts or other sections, it sometimes happens that the distribution of the ratios exactly corresponds to that which is obtained in the simplest games of chance: "combinational" distribution, in the phrase of Lexis.⁴ There is unmistakably suggested a sortition of the simplest type, with a real, ascertainable relation between the number of "favourable cases" and the total number of cases. The most remarkable example of this property is presented by the proportion of male to female (or to total) births. Some other instances are given by Lexis and Westergaard.⁵ A similar correspondence between the actual and the "combinational" distribution has been found by Bortkevitch⁶ in the case of very small probabilities (in which case the law of error is no longer "normal"). And it is likely that some ratios not presenting combinational distribution, such as general death-rates, might be broken up into subdivisions, such as death-rates for different occupations or age-periods, each distributed in that simple fashion.

121. Another sort of averages which it is difficult to class as subjective rather than objective occurs in some social statistics, under the designation of index-numbers. The percentage which represents the change in the value of money between two epochs is seldom regarded as the mere average change in the price of several articles taken at random, but rather as the measure of something, e.g. the variation in the price of a given amount of commodities, or of a unit of commodity.⁷ So something substantive appears to be designated by the volume of trade, or by that of the consumption of the working classes, of which the growth is measured by appropriate index-numbers,⁸ the former due to Bourne and Sir Robert Giffen,⁹ the latter to George Wood.¹⁰

122. But apart from these peculiarities, any set of statistics may be related to a certain quaesitum, very much as measurements are related to the object measured. That quaesitum is the limiting or ultimate mean to which the series of statistics, if indefinitely prolonged, would converge: the mean of the complete group. This conception of a limit applies to any frequency-constant, to "c," for instance, as well as to "a" in the case of the normal curve.¹¹ Then given statistics may be treated as samples from which to reason up to the true constant by that principle of the calculus which determines the comparative probability of different causes from which an observed event may have emanated.¹²

123. Thus it appears that there is a characteristic more essential to the statistician than the existence of an objective quaesitum, namely, the use of that method which is primarily, but not exclusively, proper to that sort of quaesitum: inverse probability.¹³ Without that delicate instrument the doctrine of error can seldom be fully utilized; but some of its uses may be indicated before the introduction of technical difficulties.

¹ Wages in the United Kingdom in the Nineteenth Century; and art. "Wages" in the Ency. Brit., 10th ed., vol. xxxiii.
² Phil. Mag. (1900), p. 168.
³ Cf. Journ. Stat. Soc., Jubilee No., p. 192.
⁴ Massenerscheinungen.
⁵ Grundzüge der Statistik. Cf. Bowley, Elements of Statistics.

Applications of the Normal Law.
124. Having established the prevalence of the law of error,¹⁴ we go on to its applications. The mere presumption that wherever three or four independent causes co-operate the law of error tends to be set up has a certain speculative interest.¹⁵ The assumption of the law as a hypothesis is legitimate too. When the presumption is confirmed by specific experience, this knowledge is apt to be turned to account. It is usefully applied to the practice of gunnery,¹⁶ to determine the proportion of shots which under assigned conditions may be expected to hit a zone of given size; the expenditure of ammunition required to hit an object can thence be inferred. The comparison between practice under different conditions is also facilitated. In many kinds of examination it is found that the total marks given to different candidates for answers to the same set of questions range approximately in conformity with the law of error. It is understood that the civil service commissioners have founded on this fact some practical directions to examiners. Apart from such direct applications, it is a useful addition to our knowledge of a class that the measurable attributes of its members range in conformity with this general law. Something is added to the truth that "the days of a man are threescore and ten," if we may regard that epoch, or more exactly for England, 72, as "Nature's aim, the length of life for which she builds a man, the dispersion on each side of this point being nearly normal."¹⁷ So Herschel says: "An (a mere) average gives us no assurance that the future will be like the past. A (normal) mean may be reckoned on with the most complete confidence."¹⁸ The existence of independent causes, inferred from the fulfilment of the normal law, may be some guarantee of stability. In natural history especially have the conceptions supplied by the law of error been fruitful. Investigators are already on the track of this inquiry: if those members of a species whose size or other measurable attributes are above (or below) the average are preferred as parents, by "natural" or some other kind of selection, how will the law of frequency as regards that attribute be modified in the next generation?¹⁹

Normal Distribution of Molecular Velocities.
125. A particularly perfect application of the normal law of error in more than one dimension is afforded by the movements of the molecules in a homogeneous gas. A general idea of the role played by probabilities in the explanation of these movements may be obtained without entering into the more complicated and controverted parts of the subject, and without going beyond the initial, very abstract supposition of perfectly elastic equal spheres. For convenience of enunciation we may confine ourselves to two dimensions. Let us imagine, then, an enormous billiard-table with perfectly elastic cushions and a frictionless cloth, on which millions of perfectly elastic balls rush hither and thither at random, colliding with each other: a homogeneous chaos, with that sort of uniformity in the midst of diversity which is characteristic of probabilities. Upon this hypothesis, if we fix attention on any n balls taken at random (they need not be, and according to some they ought not to be, contiguous), then, if n is very large, the average properties will be approximately the same as those of the total mixture. In particular, the average energy of the n balls may be equated to the average energy of the total number of balls, say T/N, if T is the total energy and N the total number of the balls. Now if we watch any one of the specimen balls long enough for it to undergo a great number of collisions, we observe that either of its velocity-components, say that in the direction of x, viz. u, receives accessions from an immense number of independent causes in random fashion. We may presume, therefore, that these will be distributed (among the n balls) according to the law of error. The law will not be of the type which was first supposed, where the "spread" continually increases as the number of the elements is increased.²⁰ Nor will it be of the type which was afterwards mentioned,²¹ where the spread diminishes as the number of the elements is increased. The linear function by which the elements are aggregated is here of an intermediate type, such that the mean square of deviation corresponding to the velocity remains constant. The method of composition might be illustrated by the process of taking r digits at random from mathematical tables, adding the differences between each digit and 4.5 (the mean value of digits), and dividing the sum by √r.
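As a hedged illustration of the method of composition just described, the following Python sketch aggregates r random digits by subtracting 4.5 from each, summing, and dividing by √r. It assumes that pseudo-random digits from Python's `random` module may stand in for digits taken from mathematical tables. The names `composite` and `moments` are illustrative only and are not drawn from the text. A single digit has mean square of deviation (10² − 1)/12 = 8.25 about 4.5. The composite values should therefore hover about zero with a mean square near 8.25 whatever r may be. This is the "intermediate type" of aggregation: it lies between plain summation, where the spread grows with r, and plain averaging, where it shrinks.

```python
import math
import random
import statistics

DIGITS = range(10)

# Mean square of deviation of a single random digit about 4.5: (10**2 - 1) / 12 = 8.25.
DIGIT_MEAN_SQUARE = (10**2 - 1) / 12
# Probable error = 0.6745 x standard deviation (about 1.94 for these digits).
PROBABLE_ERROR = 0.6745 * math.sqrt(DIGIT_MEAN_SQUARE)


def composite(r: int, rng: random.Random) -> float:
    """Aggregate r random digits by the rule in the text:
    sum of (digit - 4.5), divided by sqrt(r)."""
    return sum(d - 4.5 for d in rng.choices(DIGITS, k=r)) / math.sqrt(r)


def moments(r: int, batches: int = 10_000, seed: int = 1) -> tuple[float, float]:
    """Empirical mean and mean square of deviation of many composite values."""
    rng = random.Random(seed)
    values = [composite(r, rng) for _ in range(batches)]
    return statistics.fmean(values), statistics.pvariance(values)


if __name__ == "__main__":
    for r in (1, 16, 36, 144):
        mean, mean_square = moments(r)
        print(f"r={r:4d}  mean={mean:+.3f}  mean square={mean_square:.3f}")
    print(f"theory: mean square {DIGIT_MEAN_SQUARE}, probable error {PROBABLE_ERROR:.2f}")
```

For each batch size r, the script prints an empirical mean near zero and a mean square near 8.25. The exact figures depend on the seed.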
Here are some figures obtained by taking at random batches of sixteen digits from the expansion of π, subtracting 16 × 4.5 from the sum of each batch, and dividing the remainder by √16:

+1.25, +0.75, −1, −1, +5.3, −2.75, +0.75, −2, +1.75, +3.25, +0.25, −2.75, −2.25, −0.5, +4.75, +0.25.

If, instead of sixteen, a million digits went to each batch, the general character of the series would be much the same: the aggregate figures would continue to hover about zero with a standard deviation of √8.25 (about 2.87), a probable error of nearly 2. Here, for instance, are seven aggregates formed by recombining 252 out of the 256 digits utilized above into batches of 36 according to the prescribed rule, viz. subtracting 36 × 4.5 from the sum of each batch of 36 and dividing the remainder by √36:

−0.5, +3.3, +2.6, −0.6, +1.5, −2, +1.

The illustration brings into view the circumstance that, though the system of molecules may start with a distribution of velocities other than the normal, yet by repeated collisions the normal distribution will be superinduced. If both the velocities u and v are distributed according to the law of error for one dimension, we may presume that the joint values of u and v conform to the normal surface. Or we may reason directly that, as the pair of velocities u and v is made up of a great number of elementary pairs (the co-ordinates in each of which need not, initially at least, be supposed uncorrelated), the law of frequency for concurrent values of u and v must be of the normal form, which may be written¹

$$z = \frac{1}{2\pi\sqrt{km(1-r^2)}}\exp\left[-\frac{u^2/k - 2ruv/\sqrt{km} + v^2/m}{2(1-r^2)}\right].$$

It may be presumed that r, the coefficient of correlation, is zero; for, owing to the symmetry of the influences by which the molecular chaos is brought about, it is not to be supposed that there is any connexion or repugnance between one direction of u, say south to north, and one direction of v, say west to east. For a like reason k must be supposed equal to m. Thus the mean square of the velocity = 2k; which, multiplied by m, the mass of a sphere, is to be equated to the average energy T/N. The reasoning may be extended with confidence to three dimensions, and with caution to contiguous molecules.

⁶ Das Gesetz der kleinen Zahlen.
⁷ See for other definitions Report of the British Association (1889), pp. 136 and 161, and compare Walsh's exhaustive Measurement of General Exchange-Value.
⁸ Cf. Bowley, Elements of Statistics, ch. ix.
⁹ Journ. Stat. Soc. (1874 and later). Parly. Papers (C. 2247).
¹⁰ "Working-Class Progress since 1860," Journ. Stat. Soc. (1899).
¹¹ On this conception compare Venn, Logic of Chance, chs. iii. and iv., and Sheppard, Proc. Lond. Math. Soc., p. 363 seq.
¹² Laplace's 6th principle, Théorie analytique, intro. x.
¹³ See above, pars. 13 and 14.
¹⁴ Cf. above, par. 102.
¹⁵ Cf. Galton's enthusiasm, Natural Inheritance, p. 66.
¹⁶ A lucid statement of the methods and results of probabilities applied to gunnery is given in the Official Text-book of Gunnery (1902).
¹⁷ Venn, Journ. Stat. Soc. (1891), p. 443.
¹⁸ Ed. Rev. (1850), xcii. 23.
¹⁹ Cf. Galton, Phil. Mag. (1875), xlix. 44.
²⁰ Above, par. 112.
²¹ Ibid. p. 302.

Normal Correlation.
126. Correlation cannot be ignored in another application of the many-dimensioned law of error: its use in biological inquiries to investigate the relations between different generations. … measurable attributes of children of the same parents … heights, but nearer the average of the general population. The amount of this "regression" is simply proportional to the distance of the "mid-parent's" height from the general average. This is a case of a very general law which governs the relations not only between members of the same family, but also between members of the same organism, and generally between two (or more) coexistent or in any way co-ordinated observations, each belonging to a normal group. Let x and y be the measurements of a pair thus constituted. Then² it may be expected that the conjunction of particular values for x and y will approximately obey the two-dimensioned normal law which has been already exhibited (see par. 114).

127. Regression-lines. In the expression above given, put l/√(km) = r, and the equation for the frequency of pairs having values of the attribute under measurement becomes

$$z = \frac{1}{2\pi\sqrt{km(1-r^2)}}\exp\left[-\frac{(x-a)^2/k - 2r(x-a)(y-b)/\sqrt{km} + (y-b)^2/m}{2(1-r^2)}\right].$$

This formula is of very general application. If two sets of measurements were made on the height, or other measurable feature, of the proverbial "Goodwin Sands" and "Tenterden Steeple," and the first measurement of one set was coupled with the first of the other set, the second with the second, and so on, the pairs of magnitudes thus presented would doubtless vary according to the above-written law; only in that case r would presumably be zero, and the expression for z would reduce to the product of the two independent probabilities that particular values of x and y should concur. But slight interdependences between things supposed to be totally unconnected would often be discovered by this law of error in two or more dimensions.³ It may be put in a more convenient form by substituting ξ for (x − a)/√k and η for (y − b)/√m. The equation of the surface then becomes

$$z = \frac{1}{2\pi\sqrt{1-r^2}}\exp\left[-\frac{\xi^2 - 2r\xi\eta + \eta^2}{2(1-r^2)}\right].$$

If the frequency of observations in the vicinity of a point is represented by the number of dots in a small increment of area, then when r = 0 the dots will be distributed uniformly about the origin, and the curves of equal probability will be circles. When r is different from zero, the dots will be distributed so that the majority will be massed in two quadrants: in those for which ξ and η are both positive or both negative when r is positive, and in those for which ξ and η have opposite signs when r is negative. In the limiting case, when r = 1, the whole host will be massed along the line η = ξ, every deviation ξ being attended with an equal deviation η. In general, to any deviation ξ′ of one of the variables there corresponds a set or "array" (Pearson) of values of the other variable, for which the frequency is given by substituting ξ′ for ξ in the general equation. The section thus obtained proves to be a normal probability-curve with standard deviation √(1 − r²). The most probable value of η corresponding to the assigned value ξ′ is rξ′. The equation η = rξ, or rather what it becomes when translated back to our original co-ordinates, (y − b)/σ₂ = r(x − a)/σ₁, where σ₁, σ₂ are our √k, √m respectively, is often called a regression-equation. A verification is to hand in the above-cited statistics,⁴ which Weldon obtained by casting batches of dice. If the dice were perfect, r (= l/√(km)) would equal ½; and as the dice proved not to be very imperfect, the coefficient is doubtless approximately ½. Accordingly we may expect that, if axes x and y are drawn through the point of maximum frequency at the centre of the compartment containing 244 observations, then, corresponding to any value of x, say x = 2νi (where i is the side of each square compartment), the most probable value of y should be νi; and, corresponding to y = 2νi, the most probable value of x should be νi. And in fact these regression-equations are fairly well fulfilled for the integer values of ν (more than which could not be expected from discrete observations). For example, when x = +4i, the value of y for which the frequency (25) is a maximum is, as it ought to be, +2i; when x = −2i the maximum (119) is at y = −i; when x = −4i the maximum (16) is at y = −2i; when y is +2i the maximum (138) is at x = +i; when y is −2i the maximum (117) is at x = −i; and in the two cases (x = +2i and y = +4i) where the fulfilment is not exact, the failure is not very serious.

128. Analogous statements hold good for the case of three or more dimensions of error.⁵ The normal law of error for any number of variables, x₁, x₂, … xₙ, may be put in the form

$$z = \frac{1}{(2\pi)^{n/2}\sqrt{\Delta}}\exp\left[-\frac{R_{11}x_1^2 + R_{22}x_2^2 + \cdots + 2R_{12}x_1x_2 + \cdots}{2\Delta}\right],$$

where Δ is the determinant

$$\Delta = \begin{vmatrix} 1 & r_{12} & r_{13} & \cdots \\ r_{21} & 1 & r_{23} & \cdots \\ r_{31} & r_{32} & 1 & \cdots \\ \cdots & \cdots & \cdots & \cdots \end{vmatrix},$$

each r, e.g. r₂₃ (= r₃₂), being the coefficient of correlation between two of the variables, e.g. x₂, x₃. R₁₁ is the first minor of the determinant formed by omitting the first row and first column; R₂₂ is the first minor formed by omitting the second row and the second column, and so on; R₁₂ (= R₂₁) is the first minor formed by omitting the first column and second row (or vice versa), each minor being taken with the sign (−1)ⁱ⁺ʲ, so that the R's are cofactors.⁶ The principle of correlation plays an important rôle in natural history. It has replaced the notion that there is a simple proportion between the sizes of organs by the appropriate conception that there are simple proportions between the deviation of one organ from its average and the most probable value of the coexistent deviation of the other organ from its average.⁷ Attributes favoured by "natural" or other selection are found to be correlated with other attributes which are not directly selected. The extent to which the attributes of an individual depend upon those of his ancestors is measured by correlation.⁸ The principle is instrumental to most of the important "mathematical contributions" which Professor Pearson has made to the theory of evolution.⁹ In social inquiries, also, the principle promises a rich harvest. Where numerous fluctuating causes go to produce a result like pauperism or immunity from small-pox, the ideal method of eliminating chance would be to construct "regression-equations" of the following type: "Change % in pauperism (in the decade 1871–1881) in rural districts = −27.07% + 0.299 (change % in out-relief ratio) + 0.271 (change % in proportion of old) + 0.064 (change % in population)."¹⁰

Determination of Constants by the Inverse Method.
129. In order to determine the best values of the coefficients involved in the law of error, and to test the worth of the results obtained by using any values, recourse must be had to inverse probability.

¹ Above, par. 114, and below, par. 127.
² Some plurality of independent causes is presumable.
³ Herschel's a priori proposition concerning the law of error in two dimensions (above, par. 99) might still be defended either as generally true, so many phenomena showing no trace of interdependence, or on the principle which justifies our putting ½ for a probability that is unknown (above, par. 6), or 5 for a decimal place that is neglected: correlation being equally likely to be positive or negative. The latter sort of explanation may be offered for the less serious contrast between the a priori and the empirical proof of the law of error in one dimension (below, par. 158).
⁴ Cf. above, par. 115.
⁵ Cf. note to par. 98, above.
⁶ Phil. Mag. (1892), p. 200 seq.; 1896, p. 211; Pearson, Trans. Roy. Soc. (1896), 187, p. 302; Burbury, Phil. Mag. (1894), p. 145.
⁷ Pearson, "On the Reconstruction of Prehistoric Races," Trans. Roy. Soc. (1898), A, p. 174 seq.; Proc. Roy. Soc. (1898), p. 418.
⁸ Pearson, "The Law of Ancestral Heredity," Trans. Roy. Soc.; Proc. Roy. Soc. (1898).
⁹ Papers in the Royal Society since 1895.
¹⁰ An example instructively discussed by Yule, Journ. Stat. Soc. (1899).

130. The simplest problem under this head is where the quaesitum is a single real object and the data consist of a large number of observations, x₁, x₂, … xₙ, such that if the number were indefinitely increased the completed series would form a normal probability-curve with the true point as its centre, and having a given modulus c. It is as if we had observed the position of the dints made by the fragments of an exploding shell so far as to know the distance of each mark measured (from an origin) along a right line, say the line of an extended fortification. It is known that the shell was fired perpendicular to the fortification from a distant ridge parallel to it, and that the shell was of a kind of which the fragments are scattered according to a normal law¹ with a known coefficient of dispersion. The question is: at what position on the distant ridge was the enemy's gun probably placed? By received principles the probability, say P, that the given set of observations should have resulted from measuring (or aiming at) an object of which the real position was between x and x + Δx is

$$P = J\,\Delta x\,\exp\left[-\frac{(x-x_1)^2 + (x-x_2)^2 + \cdots + (x-x_n)^2}{c^2}\right],$$

where J is a constant obtained by equating ∫P dx to unity (since the given set of observations must have resulted from some position on the axis of x). The value of x from which the given set of observations most probably resulted is obtained by making P a maximum. Putting dP/dx = 0, we have for the maximum (d²P/dx² being negative for this value) the arithmetic mean of the given observations. The accuracy of the determination is measured by a probability-curve with modulus c/√n. Thus, in the course of a very long siege, if every case in which the given group of shell-marks x₁, x₂, … xₙ was presented could be investigated, it would be found that the enemy's cannon was fired from positions about x′, the (point right opposite to the) arithmetic mean of x₁, x₂, … xₙ, with a frequency assigned by the equation

$$z = \frac{\sqrt{n}}{\sqrt{\pi}\,c}\exp\left[-\frac{n(x-x')^2}{c^2}\right].$$

The reasoning is applicable without material modification to the case in which the data and the quaesitum are not absolute quantities but proportions; for instance, given the percentage of white balls in several large batches drawn at random from an immense urn containing black and white balls, to find the percentage of white balls in the urn. This is the inverse problem associated with the name of Bayes.

131. Simple as this solution is, it is not the one which has most recommended itself to Laplace. He envisages the quaesitum not so much as that point which is most probably the real one, as that point which may most advantageously be put for the real one. In our illustration it is as if it were required to discover from a number of shot-marks not the point² which in the course of a long siege would most frequently be the position of the cannon which had scattered the observed fragments, but the point which it would be best to treat as that position (to fire at, say, with a view of silencing the enemy's gun), having regard not so much to the frequency with which the direction adopted is right, as to the extent to which it is wrong in the long run. As the measure of the detriment of error, Laplace³ takes "la valeur moyenne de l'erreur à craindre" (the mean value of the error to be feared), the mean first power of the errors taken positively on each side of the real point. The mean square of errors is proposed by Gauss as the criterion.⁴ Indeed any mean power, or the integral of any function which increases in absolute magnitude with the increase of its variable, taken as the measure of the detriment, will lead to the same conclusion if the normal law prevails.⁵

132. Yet another speculative difficulty occurs in the simplest inverse problem, and recurs in the more complicated ones. In putting P as the probability, deduced from the observations, that the real point for which they stand is x (between x and x + Δx), it is tacitly assumed that prior to observation one value of x is as probable as another.⁶ In our illustration it must be assumed that the enemy's gun was as likely to be at one point as another of (a certain tract of) the ridge from which it was fired. If, apart from the evidence of the shell-marks, there was any reason for thinking that the gun was situated at one point rather than another, the formula would require to be modified. This a priori probability is sometimes grounded on our ignorance;⁷ according to another view, the procedure is justified by a rough general knowledge that over a tract of x for which P is sensible one value of x occurs about as often as another.

133. Subject to similar speculative difficulties, the solution which has been obtained may be extended to the analogous problem in which the quaesitum is not the real value of an observed magnitude, but the mean to which a series of statistics indefinitely prolonged converges.

134. Next, let the modulus, still supposed given, not be the same for all the observations, but c₁ for x₁, c₂ for x₂, &c. Then P becomes proportional to

$$\exp\left[-\frac{(x-x_1)^2}{c_1^2} - \frac{(x-x_2)^2}{c_2^2} - \cdots\right].$$

And the value of x which is both the most probable and the "most advantageous" is

$$\frac{x_1/c_1^2 + x_2/c_2^2 + \cdots}{1/c_1^2 + 1/c_2^2 + \cdots},$$

each observation being weighted inversely as the mean square of observations made under similar conditions.

Method of Least Squares.
This is the rule prescribed by the "method of least squares"; but as the rule in this case has been deduced by genuine inverse probability, the problem does not exemplify what is most characteristic in that method, namely, that a rule deducible from the hypothesis that the errors of observations obey the normal law of error is employed in cases where the normal law is not known, or is even known not, to hold good. For example, let the curve of error for each observation be of the form

$$z = \frac{1}{\sqrt{\pi}\,c}\exp\left[-\frac{x^2}{c^2} - 2j\left(\frac{x}{c} - \frac{2x^3}{3c^3}\right)\right],$$

where j is a small fraction, so that z may equally well be equated to

$$z = \frac{1}{\sqrt{\pi}\,c}\left[1 - 2j\left(\frac{x}{c} - \frac{2x^3}{3c^3}\right)\right]\exp\left(-\frac{x^2}{c^2}\right),$$

a law which is actually very prevalent. Then, according to the genuine inverse method, the most probable value of x is given by the quadratic equation d(log P)/dx = 0, where

$$\log P = \text{const.} - \sum\left[\frac{(x-x_r)^2}{c_r^2} + 2j_r\left(\frac{x-x_r}{c_r} - \frac{2(x-x_r)^3}{3c_r^3}\right)\right],$$

Σ denoting summation over all the observations. According to the "method of least squares," the solution is the weighted arithmetic mean of the observations, the weight of any observation being inversely proportional to the corresponding mean square, i.e. to c_r²/2⁸ (the terms of the integral which involve j vanishing). That would be the solution if the j's were all zero. We put for the solution of the given case what is known to be the solution of an essentially different case. How can this paradox be justified?

135. Many of the answers which have been given to this question seem to come to this. When the data are unmanageable, it is legitimate to attend to a part of them, and to determine the most probable (or the "most advantageous") value of the quaesitum, and the degree of its accuracy, from the selected portion of the data as if it formed the whole. This throwing overboard of part of the data in order to utilize the remainder has often to be resorted to in the rough course of applied probabilities. Thus an insurance office takes account only of the age and some other simple attributes of its customers, though a better bargain might be made in particular cases by taking into account all available details. The nature of the method is particularly clear in the case where the given set of observations consists of several batches, the observations in any batch ranging under the same law of frequency with mean x′_r and mean square of error k_r, the function and the constants being different for different batches. If we confine our attention to those parts of the data which are of the type x′_r and k_r, ignoring what else may be given as to the laws of error, we may treat the x′_r's as so many observations, each ranging under the normal law of error with its own coefficient of dispersion, and apply the rules proper to the normal law. Those rules, applied to the data considered as a set of derivative observations (each formed by averaging a batch of the original observations), give as the most probable (and also the most advantageous) combination of the observations the arithmetic mean weighted according to the inverse mean square pertaining to each observation; and they give, for the law of the error to which the determination is liable, the normal law with standard deviation⁹ √(k/n). These are the very rules prescribed by the method of least squares.

¹ If normally in any direction indifferently according to the two- or three-dimensioned law of error, then normally in one dimension when collected and distributed in belts perpendicular to a horizontal right line, as in the example cited below, par. 155.
² Or small interval (cf. preceding section).
³ "Every error, whether positive or negative, must be considered as a disadvantage or a real loss at any game whatever," Théorie analytique, art. 20 seq., especially art. 25. As to which it is acutely remarked by Bravais (op. cit. p. 258): "This simple rule lacks a rigorous demonstration, for the analogy between the present case and that of games of chance is far from complete."
⁴ Theoria combinationis, pt. i. § 6. Simon Newcomb is conspicuous by walking in the way of Laplace and Gauss in his preference of the most advantageous to the most probable determinations. With Gauss he postulates that "the evil of an error is proportioned to the square of its magnitude" (American Journal of Mathematics, vol. vii. No. 4).
⁵ As argued by the present writer, Camb. Phil. Trans. (1885), vol. xiv. pt. ii. p. 161. Cf. Glaisher, Mem. Astronom. Soc. 108.
⁶ The view taken by the present writer on the "Philosophy of Chance," in Mind (1880; approved by Professor Pearson, Grammar of Science, 2nd ed. p. 146). See also "A priori Probabilities," Phil. Mag. (Sept. 1884), and Camb. Phil. Trans. (1885), vol. xiv. pt. ii. p. 147 seq.
⁷ Above, pars. 6, 7.
⁸ The mean square is $\int_{-\infty}^{+\infty} \frac{x^2}{\sqrt{\pi}\,c}\,e^{-x^2/c^2}\,dx = c^2/2$.
⁹ The standard deviation pertaining to a set of n/r composite observations, each derived from the original n observations by averaging a batch of r of them, is √(k/r)/√(n/r) = √(k/n) when the given observations are all of the same weight; mutatis mutandis when the weights differ.

136. The principle involved might be illustrated by the proposal to make the economy of data a little less rigid: to utilize, not indeed all, but a little more of our materials; not only the mean square of error for each batch, but also the mean cube of error. To begin with the simple case of a single homogeneous batch, suppose that in our example the fragments of the shell are no longer scattered according to the normal law. By the method of least squares it would still be proper to put the arithmetic mean of the given observations for the true point required, and to measure the accuracy of that determination by a probability-curve of which the modulus is √(2k/n), where k is the mean square of deviation (of fragments from their mean). If it is thought desirable to utilize more of the data, there is available the proposition that the arithmetic mean of a
numerous set of observations, say x₁, x₂, … xₙ (taken as a sample from an indefinitely large group obeying any the same law of frequency), varies from set to set approximately according to the following law (to be established later),
where c²/2 = the mean square of deviation, j′ = the mean cube of deviation, and j′/c³, say j, is small. Then, by an abstraction analogous to that which has just been attributed to the method of least squares, we may regard the datum as a single observation, the arithmetic mean (of a sample batch of observations), subject to the law of error z = f(x). The most probable value of the quaesitum is therefore given by the equation f′(x − x′) = 0, where x′ is the arithmetic mean of the given observations. From the resulting quadratic equation, putting x = x′ + e and recollecting that j is small, we have e = −jc. That is the correction due to the utilization of the mean cube of error. The most advantageous solution cannot now be determined,¹ f(x) being unsymmetrical, without assuming a particular form for the function of detriment. This method of least squares plus cubes may easily be extended to the case of several batches.

137. This application of probabilities not to the actual data but to a selected part of them, this economy of the inverse method, is widely practised in miscellaneous statistics where the object is to determine whether the discrepancy between two sets of observations is accidental or significant of a real difference.² For instance, let the data be ages at death of individuals of two classes (e.g. temperate or not so, urban or rural, &c.) who have been under observation since the age of, say, 20. Granted that the ages at death conform to Gompertz's law, the determination of the modal age at death for each class (that age at which the proportion of the total observed dying per unit of time is a maximum) would most perfectly be effected by the genuine inverse method. That method will also enable us to determine the probability that the two modes should have differed to the observed extent by mere accident. According to the abridged method, it suffices to proceed as if our data consisted of two observations x′ and y′, the average ages at death of the two classes, each average obeying the normal law of error, with respective moduli

$$c_1 = \frac{\sqrt{2\left[(x'-x_1)^2 + (x'-x_2)^2 + \cdots\right]}}{n_1}, \qquad c_2 = \frac{\sqrt{2\left[(y'-y_1)^2 + (y'-y_2)^2 + \cdots\right]}}{n_2},$$

where x₁, x₂, &c., and y₁, y₂, &c., are the respective sets of observed ages at death, and n₁, n₂ the numbers of observations in the two classes. This follows from the law of error, whatever the law of distribution of the given observations.

139. In its simplest form, where all the given observations are of equal weight, this method is of wide applicability. Compared with the genuine inverse method, it is always more convenient, seldom much less accurate, and sometimes even more accurate. If the given observations obey the normal law, the precision of the median is less than the precision of the arithmetic mean by only some 25% (the probable error of the median being about √(π/2), or 1.25, times that of the mean), a discrepancy not very serious where only a rough estimate of the worth of an average is required. If the observations do not obey the normal law, especially if the extremities are abnormally divergent, the precision of the median may be greater than that of the arithmetic mean.

Determination of Frequency Constants.
140. Yet another instance of the contrast between genuine and abridged inversion is afforded by the problem of determining the modulus as well as the mean for a set of observations known to obey the normal law: what the first problem becomes when the coefficient of dispersion is not given. By inverse probability we ought in that case, in addition to the equation dP/dx = 0, to put dP/dc = 0. Whence

$$c^2 = \frac{2\left[(x-x_1)^2 + (x-x_2)^2 + \cdots + (x-x_n)^2\right]}{n}, \qquad x = \frac{x_1 + x_2 + \cdots + x_n}{n}.$$

This solution differs from that which is often given in the textbooks, where (n − 1) occurs in the denominator of the expression for c² instead of n. The difference is explained by the fact that the authorities referred to determine c not by genuine inversion but by ordinary induction. They use a condition which certainly would be fulfilled in the long run but does not express the whole of our data. In this respect the condition resembles the equation of c to √π Σ(e)/n, where e is the difference (taken positively, without regard to its sign) between any observation and the arithmetic mean of all the observations.

141. Of course the determination of the most probable value is subject to the speculative difficulties proper to a priori probability. These are particularly striking in this case, as it appears equally natural to take, as the constant of which the values are a priori equally probable, k (= c²/2), or even h (= 1/c²), the measure of weight, as in fact Laplace has done;¹ yet no two of these assumptions can be exactly true.

142. A more convenient determination is obtained from simple induction by equating the modulus to some datum of the observed group to which it would be equal if the group were complete. In particular, it may be equated to the distance from the median of some percentile (or point which marks off a certain percentage of the observations), multiplied by a factor corresponding to the percentile, obtainable from a familiar table. Mr Sheppard has given an interest
According to a well-known property of the normal ing proof 13 that we cannot by way of percentiles obtain such good" law, the difference between the averages of n and n' observations results for the frequency-constants as by the use of "the average respectively will range under a probability-curve with modulus and average square" (the method prescribed by inverse probability). va + ca, say c. Whence for the probability that a difference as 143. The same philosophical subtleties, with greater mathematical great as the observed one, say e, should have occurred by complications, meet us when we pass on to the case of two or more chance we have 11-0(1), where T elc, and 0(x) is the integral | quaesila. The problem under this head which mainly o 2/17 ('(exp = x2)dx, given in many treatises. exercised the older writers was to determine a number of Measure unknown quantities, given a larger number, n, of equa138. This sort of abridgment may be extended to other kinds of or equar meats. tions involving them. average besides the arithmetic, in particular the median (that point 144. Supposing the truc values approximately known, by substiAbridged which has as many of the given observations above as tuting the approximate values in the given equations and expanding below it). By simple induction we know that the Methods. according to Taylor's theorem, there will be obtained for the correc* median of a large sample of observations is a probable tions, say x, y..., n lincar equations of the form value for the true median; how probable is determined as follows aix+biy..=fi from a selection of our data. First suppose that all the observa. tions are of the same weight. If x' were the true median, 02x + bay.. =fa, the probability that as many as in + r of the observations should where each and 6 is a known coefficient, and each f is a fall on either side of that point is given by the normal law for which fallible observation. 
Suppose that the error to which cach is the exponent is - 2r/n. This probability that the observed median liable obeys the normal law, and that the modulus pertaining to each will differ from the true one by a certain number of observations is observation is the same which latter condition can be secured by connected with the probability that they will differ by a certain multiplying each equation by a proper factor-then if r' and y extent of the abscissa, by the proposition that the number of obscr. arc the true values of the quaesila, the frequency with which vations contained between the true and apparent median is equal (Gx' + bry' - si) assumes different values is given by the equation to the small difference between them multiplied by the density of z= 1/(V Tc') exp - 10.x + biy-si ?/C,, where c. is a constant which, observations at the median-in the case of normal and generally symmetrical curves the greatest ordinate. This is the second datum Trans. Roy. Soc. (1889), 192, p. 135, ante, where the error incident we require to select. In the case of the normal curve it may be to this kind of determination is ascertained with much precision. calculated from the modulus itsell, determined by induction from a 6 Cf. Phil. Mag. (1887), xxiv. 269 seq., where the median is preselection of data. If the observations are not all of the same worth, scribed in case of " discordant "(heterogeneous) observations. If the weight may be assigned by counting one observation as if it occurred more drastic remedy of rejecting part of the data is resorted to oftener than another. This is the essence of Laplace's Method Sheppard's method of performing that operation may be recomof Situation, mended (Proc. Lond. Math. Soc. vol. 31). He prescribes for cases to which the median may not be appropriate, namely, the determination * The use of the cubes is also contrasted with that of the squares of other frequency-constants besides the mean of the observations. 
(only) in this respect: that it is no longer a matter of indifference ? Above, par. 134. how many of the original observations we assign to the batch of which 8E.. Airy, Theory of Errors, art. 60. " the mean constitutes the single (compound) observation. Colt is a nice point that the expression for co, which has (1 - 1) : The object of the writer's paper on “Methods of Statistics" instead of n for denominator, though not the more probable, may yet in the Jubilee number of the Journ. Stol. Soc: (1885). .. - be the more advantageons (supposing that there were any sensible * See on the use of the inverse method to determine the mode of difference between the two). Cf. Camb. Phil. Trans. (1885), vol. xiv, a group, the present writer's paper on “ Probable Errors" in the pt. ii. p. 165; and " Probable Errors," Journ. Stat. Soc. (June 1908). Journ. Slat. Soc. (Sept. 1908)...." 10 Above, par. 96, note. Above, par. 103. 11 Théorie analytique, and supp. ed. 1847, p. 578. Théorie analytique, - 2nd supp. D. 164. Mécanique céleste, 19 See the matter discussed in Camb. Phil. Trans., lo bk. iii. art. 40; on which see the note in Bowdich's translation. I .. 13 Trans.-Roy. Soc. (1899), A, cxcii. 135. The method may be extended to other percentiles. See Czuber. , 14 Good as tested by a comparison of the mean squares of errors Beobachtungsfehler, $ 58. Cf. Phil. Mag. (1886). p. 375; and Sheppard, in the frequency-constant determined by the compared methods. Above M, fpot known beforehand. may be inferred. as in the simpler case, I positive for the negatwe) deviations of the values for one oroan from a set of observations. Similar statements holding for the or attribute measured by the modulus pertaining to that member, other equations, the probability that the given set of observations and n is the sum of the values of the other member, which are fi. iz. &c., should have resulted from a particular system of values associated with the constituents of E. 
This variety of this method for x, y .. is J exp [(a,x+by-f)/C2+(99x+bay-f.)/+&c.), is certainly much less troublesome, and is perhaps not much less where is a constant determined on the same principle as in the accurate, than the method prescribed by genuine inversion. analogous simpler cases. The condition that P should be a 151. A method of rejecting data analogous to the use of percentiles maximum gives as many linear equations for the determination in one dimension is practised when, given the frequency of observa. of x' y... as there are unknown quantities. tions for each increment of area, e.g. each Ax Ay, we utilize only 145. The solution proper to the case where the observations are the frequency for integral areas. Mr Sheppard has given an elegant known to arrange according to the normal law may be extended to solution of the problem: to find the correlation between two numerous observations ranging under any law, on the principles attributes, given the medians L, and M, of a normal group for each which justify the use of the Method of Lcast Squares in the case of attribute and the distribution of the total group, as thus. a single quaesitum. 146. As in that simple case, the principle of economy will now Below.L Above L, justify the use of the median, e.g. in the case of two quaesita, putting for the true values of x and y that point for which the sum of the perpendiculars let fall from it on each of a set of lines representing Below M P the given equations (properly weighted) is a minimum. 147. The older writers have expressed the error in the determination of one of the variables without reference to the error in the Normal other. But the error of one variable may be regarded of FIG. 12. while x'te. tn... is the real system, the (small) values of Il cos D is put for r, the coefficient of correlation, it is found 5,7. ... which will concur in the long run of systems from which the that D=R/(P+R). 
For example, let the group of statistics given set of observations result are normally correlated. From relating to dice already 'cited from Professor Weldon be arranged this point of view Bravais, in 1846, was led to several theorems in four quadrants by a horizontal and a vertical line, each of which which are applicable to the now more important case of correlation separates the total groups into two halves: lines' of which equain which & and 7 are given (not in general small) deviations from tions prove to be respectively y=6.11 and x=6:156. Sor R we the means of two or more correlated members (organs or attributes) have 1360-5, and for P 687-5 roughly. Whence D=TX0.66; r = forming a normal group. cos 0.66 X = - nearly, as it ought; the negative sign being 148. To determine the frequency-constants of such a group it is required by the circumstance that the lower part of Mr Sheppard's proper to proceed on the analogy of the simple case of one-dimen- diagram shown in fig. 12 corresponds to the upper part of Professor sioned crror. In the case of two dimensions, for instance, the I Weldon's diagram shown in par. 115. probability Pi that a given pair of observations (xi, yı) should 152. Necessity rather than convenience is sometimes the motive have resulted from a normal group of which the means are x' y for resort to percentiles. Professor Pearson has applied the median respectively, the standard deviations and on and the coefficient of method to determine the correlation between husbands and wives correlation r, may be written in respect of the darkness of eye-colour, a character which does not Axiyao, Ag2ar(1/25) V0102(1-) exp-E?, admit of exact graduation: "our numbers merely refer to certain where E? = (x' - x)/0 - 21(x - x - W/0,99 + y - yu/67. groupings, arranged, it is true, in increasing darkness of colour, but A similar statement holds for each other pair of observations in no way corresponding to equal increases in colour-intensity."10 (Xayz), (xays). . 
.; with analogous expressions for Pa. Pa... Whence, From data of this sort, having ascertained the number of husbands as in the simpler case, we have di XP2 X&c. XP/J (a constant) with eye-colours above the median tint who marry wives with eyefor P, the a posteriori probability that the given observations should colour above the median tint, Professor Pearson finds for r the have resulted from an assigned system of the frequency-constants. coefficient of correlation tool. A general method for determining The most probable system is determined by making Pa maximum, the frequency-constants when the data are, or are taken to be, and accordingly equating to zero cach of the following expressions of the integral sort has been given by Professor Pearson." Attention dᏢ dᏢ dᏢ dᏢ dᏢ should also be called to Mr Yule's treatment of the problem by a dx, dy, doi, doz, dr. sort of logical calculus on the lines of Boole and Jevons.12 The values of the arithmetic mean and of tne standard deviation 153. In the cases of correlation which have been so far considered, for each variable are what have been obtained in the simple case it has been presupposed that the things correlated range according of one chimension. The value of r is ('- x)'-)/0102. The to the normal law of error. But now, suppose the law, probable error of the determination is assigned on the assumption of distribution to be no longer normal: for instance, that thar Aboormal that the errors to which it is liable are small.. Such coefficients the dots on the plane of xy, is representing each a pair of work have already been calculated for a great number of interesting cases. 
members, are no longer grouped in elliptic (or circular) rings of For instance, the coefficient of correlation between the human equal frequency, that the locus of the maximum y deviation, stature and femur is 0.8, between the right and left femur is 0.96, corresponding to an assigned i deviation, is no longer a right bet ween the statures of husbands and wives is 0.28.6 line. How is the interdependence of these deviations to be . 149. This application of inverse probability to determine correla formulated? It is submitted that such data may be treated as if tion-coefficients and the error to which the determination is liable they were normal: by an extension of the Method of Least Squares, has been largely employed by Professor Pearson and other recent in two or more dimensions 14 Thus when the amount of pauperism writers. The use of the normal formula to measure the probable together with the amount of outdoor relief is plotted in several unions and improbable errors incident to such determinations is justified there is obtained a distribution far from normal. Nevertheless if by reasoning akin to that which has been employed in the general the average pauperism and average outdoor relief are taken for proof of the law of error. Professor Pearson has pointed out a aggregates--say quintettes or decades of unions taken at random, it circumstance which seems to be of great importance in the theory may be expected that these means will conform to the normal law, of evolution: that the errors incident to the determination of with cocfhcients obtained from the original data, according to the different frequency-cocfficients are apt to be mutually correlated. rule which is proper to the case of the normal law. 15 By obtaining Thus if a random selection be made from a certain population, the averages conforming to the normal law, as by the simple application correlation-coefficient which fits the organs of that set is apt to differ. 
of the method of least squares, we should not indeed have utilized from the coefficient proper to the complete group in the same sense the whole of our data, but we shall put a part of it in a very useful as some other frequency-coefficients. & Trans. Roy. Soc. (1899), A, 192, p. 1412 150. The last remark applies also to the determination of the Above, par. 115. a. I coefficients, in particular those of correlation, by abridged methods, 10 Grammar of Science, p. 432. . on principles explained with reference to the simple case: for instance 11 Trans. Roy. Soc.. A vol. 105. In this connexion reference by the formula s-En/EE, where is the sum of (some or all) the should also be made to Pearson's theory of " Contingency" in his thirteenth contribution to the Mathematica! Theory of Evolution" * Above, par. 130. " (Drapers' Company Research Memoirs). * See Phil. Mag. (1888), “ On a New Method of Reducing Trans. Roy. Soc. (1900), A, 194, p. ; 257; (1901), A, 197, Observation's "; where a comparison in respect of convenience and p. 91. accuracy with the received method is attempted. . 13 Above, par. 127. . * Corresponding to the k/vim of pars. 14, 127 above. 14 Above, par. 116... * Pearson, Trans. Roy. Soc., A.-191, p. 234. 15 If from the given set of n observations (each corresponding to a 5 Pearson, Grammar of Science, 2nd ed. p. 402, 431. point on the plane xy) there is derived a set of ns observations Trans. Roy. Soc. (1899), A, vol. 191; Biometrika, ii. 273. cach obtained by averaging a batch numbering s of the original * Above, par. 107. Compare the proof of the “Subsidiary Law observation; the coefficient of correlation for the derived system is of Error," as the law in this connexion may be called, in the paper the same as that which pertains to the original system. As to on " Probable Errors." Journ. Stal. Soc. (June 1908). I the standard deviation for the new system see note to par. 135. aps and theyed by formula determin 3 akin tos incident ofmula to roof of |
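The abridged test of par. 137 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's own procedure:

- the helper names `mean_and_modulus` and `chance_of_difference` are hypothetical;
- each average is assumed to follow the normal law with the modulus given in par. 137;
- $\Theta$ is computed as the error function `math.erf`;
- the ages at death are invented for the example.

```python
import math


def mean_and_modulus(obs):
    """Return the arithmetic mean of obs and the modulus of that mean,
    sqrt(2 * sum of squared deviations) / n, as in par. 137."""
    n = len(obs)
    mean = sum(obs) / n
    squares = sum((x - mean) ** 2 for x in obs)
    return mean, math.sqrt(2 * squares) / n


def chance_of_difference(xs, ys):
    """Probability 1 - Theta(tau) that two averages differ by at least the
    observed amount by mere accident, treating each average as normal."""
    x_mean, c = mean_and_modulus(xs)
    y_mean, c_prime = mean_and_modulus(ys)
    big_c = math.sqrt(c ** 2 + c_prime ** 2)  # modulus of the difference
    tau = abs(x_mean - y_mean) / big_c
    return 1.0 - math.erf(tau)  # Theta(x) coincides with erf(x)


# Hypothetical ages at death for two small classes (illustrative only).
urban = [60, 62, 64, 66, 68]
rural = [61, 63, 65, 67, 69]
print(chance_of_difference(urban, rural))
```

Here the two invented averages differ by one year against a modulus of about 2.5, so a difference of that size would arise by chance more often than not.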
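The comparison of median and mean in par. 139 can be checked by a small simulation. In this sketch the function `median_to_mean_spread` and the two samplers are hypothetical. The code divides the standard deviation of many sample medians by that of the corresponding sample means.

- For normal observations the ratio should be near $\sqrt{\pi/2}$, about 1.25.
- For a law with abnormally divergent extremities it falls well below 1. That law is assumed here, for illustration, to draw one observation in ten with ten times the spread.

```python
import math
import random
import statistics


def normal(rng):
    """One observation from the normal law (unit standard deviation)."""
    return rng.gauss(0.0, 1.0)


def contaminated(rng):
    """Hypothetical law with divergent extremities: one draw in ten has
    ten times the spread of the others."""
    sd = 10.0 if rng.random() < 0.1 else 1.0
    return rng.gauss(0.0, sd)


def median_to_mean_spread(draw, n=101, trials=2000, seed=3):
    """Standard deviation of sample medians divided by that of sample means."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        sample = [draw(rng) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pstdev(medians) / statistics.pstdev(means)


print(math.sqrt(math.pi / 2))               # 1.2533..., large-sample value
print(median_to_mean_spread(normal))        # near 1.25
print(median_to_mean_spread(contaminated))  # well below 1
```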
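The estimates of the modulus contrasted in par. 140 can be set side by side. This is a minimal sketch, assuming observations of equal weight. The helper `modulus_estimates` is a hypothetical name, and it returns four values:

- the mean;
- the estimate by genuine inversion, $c^2 = 2\Sigma(x - x_i)^2/n$;
- the textbook form with $(n - 1)$ in the denominator;
- the estimate $\sqrt{\pi}\,\Sigma(e)/n$ from the mean absolute deviation.

```python
import math
import random


def modulus_estimates(obs):
    """Mean and three estimates of the modulus c for observations assumed normal."""
    n = len(obs)
    mean = sum(obs) / n
    squares = sum((x - mean) ** 2 for x in obs)
    by_inversion = math.sqrt(2 * squares / n)     # c^2 = 2 * sum / n
    textbook = math.sqrt(2 * squares / (n - 1))   # (n - 1) in the denominator
    from_mean_error = math.sqrt(math.pi) * sum(abs(x - mean) for x in obs) / n
    return mean, by_inversion, textbook, from_mean_error


print(modulus_estimates([1, 2, 3, 4]))

# For many normal observations with unit standard deviation, all three
# estimates should approach the modulus sqrt(2).
rng = random.Random(11)
data = [rng.gauss(0.0, 1.0) for _ in range(20000)]
print(modulus_estimates(data))
```

The $(n - 1)$ form is always slightly the larger. For large $n$ the difference is negligible, as the note to par. 140 remarks.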
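Sheppard's median-quadrant rule of par. 151 reduces to one line once the quadrant frequencies are known. In this sketch `sheppard_r` and `fourfold` are hypothetical names. P and R are taken, as an assumption about fig. 12, to be the frequencies in two kinds of quadrant:

- P: both attributes lie on the same side of their medians;
- R: they lie on opposite sides.

The first call uses the Weldon figures quoted in par. 151. The second checks the rule on simulated normal pairs with correlation 0.6.

```python
import math
import random
import statistics


def sheppard_r(p, r_count):
    """r = cos D with D = pi * R / (P + R), P being the frequency in a 'like'
    quadrant and R that in an 'unlike' quadrant (assumed reading of fig. 12)."""
    return math.cos(math.pi * r_count / (p + r_count))


def fourfold(xs, ys):
    """Split paired observations at the two medians; return (P, R) as the
    average counts in the two like and the two unlike quadrants."""
    mx, my = statistics.median(xs), statistics.median(ys)
    like = sum((x > mx) == (y > my) for x, y in zip(xs, ys))
    return like / 2, (len(xs) - like) / 2


# Weldon's dice figures as quoted in par. 151: P = 687.5, R = 1360.5.
print(sheppard_r(687.5, 1360.5))  # about -0.49

# Check on simulated normal pairs with correlation 0.6 (illustrative).
rng = random.Random(7)
xs, ys = [], []
for _ in range(20000):
    x = rng.gauss(0.0, 1.0)
    xs.append(x)
    ys.append(0.6 * x + 0.8 * rng.gauss(0.0, 1.0))  # 0.8 = sqrt(1 - 0.6**2)
print(sheppard_r(*fourfold(xs, ys)))
```

Only the proportion $R/(P + R)$ matters. The rule is therefore unaffected by halving both counts, as `fourfold` does.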