Translation:Theoria combinationis observationum erroribus minimis obnoxiae



Part One

1.

No matter how careful one is with observations concerning the measurement of physical quantities, they are inevitably subject to errors of varying degrees. These errors, in most cases, are not simple but arise from several distinct sources, which are best divided into two classes.

Some causes of errors depend, for each observation, on variable circumstances independent of the result obtained: the errors arising from these are called "irregular" or "random," and like the circumstances that produce them, their value is not amenable to calculation. Such are the errors that arise from the imperfection of our senses and all those due to irregular external causes, e.g. vibrations of the air that blur our vision. Some of the errors due to the inevitable imperfection of even the best instruments, e.g. the roughness of the inner part of a level, its lack of absolute rigidity, etc., belong to this same category.

On the other hand, there are other causes that produce an identical error in all observations of the same kind, or one whose magnitude depends only on circumstances that can be viewed as essentially connected to the observation. We will call errors of this category "constant" or "regular" errors.

Moreover, one can see that this distinction is to a certain extent relative, and has a broader or narrower sense depending on the meaning one attaches to the idea of observations of the same nature. E.g. if one indefinitely repeats the measurement of the same angle, the errors arising from imperfect division of the instrument belong to the class of constant errors. If, on the other hand, one successively measures several different angles, the errors due to imperfect division will be considered random until a table of errors relative to each division has been formed.

2.

We exclude the consideration of regular errors from our discussion. It is up to the observer to carefully investigate the causes that can produce a constant error, to eliminate them if possible, or at least assess their effect in order to correct it for each observation, which will then give the same result as if the constant cause had not existed. It is quite different for irregular errors: by their nature, they resist any calculation, and they must be tolerated in observations. However, by skillfully combining results, their influence can be minimized as much as possible. The following investigation is devoted to this most important topic.

3.

Errors arising from a simple and determinate cause in observations of the same kind are confined within certain limits that could undoubtedly be assigned if the nature of this cause were perfectly known. In most cases, all errors between these extreme limits must be considered possible. A thorough knowledge of each cause would reveal whether all these errors have equal or unequal likelihood, and in the latter case, what the relative probability of each of them is. The same remark applies to the total error resulting from the combination of several simple errors. This error will also be confined between two limits, one being the sum of the upper limits, the other the sum of the lower limits corresponding to the simple errors. All errors between these limits will be possible, and each can result, in an infinite number of ways, from suitable values attributed to the partial errors. Nevertheless, it is possible to assign a larger or smaller likelihood for each result, from which the law of relative probability can be derived, provided that the laws of each of the simple errors are assumed to be known, and ignoring the analytical difficulties involved in collecting all of the combinations.

Of course, certain sources of error produce errors that cannot vary according to a continuous law, but are instead capable of only a finite number of values, such as errors arising from the imperfect division of instruments (if indeed one wants to classify them among random errors), because the number of divisions in a given instrument is essentially finite. Nevertheless, if it is assumed that not all sources of error are of this type, then it is clear that the complex of all possible total errors will form a series subject to the law of continuity, or, at least, several distinct series, if it so happens that, upon arranging all possible values of the discontinuous errors in order of magnitude, the difference between a pair of consecutive terms is greater than the difference between the extreme limits of the errors subject to the law of continuity. In practice, such a case will almost never occur, unless the instrument is subject to gross defects.

4.

Let φ(x) denote the relative likelihood of an error x: this means, due to the continuity of the errors, that φ(x)dx is the probability that the error lies between the limits x and x + dx. In practice it is hardly possible, or perhaps impossible, to assign a form to the function φ a priori. Nevertheless, several general characteristics that it must necessarily present can be established: φ(x) is obviously a discontinuous function; it vanishes for all values of x not lying between the extreme errors. For any value between these limits, the function is positive (excluding the case indicated at the end of the previous article); in most cases, errors of opposite signs will be equally possible, and we will then have φ(−x) = φ(x). Finally, since small errors are more easily made than large ones, φ(x) will generally have a maximum when x = 0 and will continually decrease as x increases in absolute value.

In general, the integral ∫_{a}^{b} φ(x) dx expresses the probability that the unknown error falls between the limits a and b. It follows that the value of this integral, taken between the extreme limits of the possible errors, will always be = 1. And since φ(x) is zero for values not lying between these limits, it is clear that in all cases

∫_{−∞}^{+∞} φ(x) dx = 1.

5.

Let us consider the integral ∫ x φ(x) dx and denote its value by k. If the sources of error are such that there is no reason for two equal errors of opposite signs to have unequal likelihood, we will have φ(−x) = φ(x), and consequently k = 0. We conclude that if k does not vanish and has e.g. a positive value, then there necessarily exists an error source that produces only positive errors or, at least, produces them more easily than negative errors. This quantity k, which is the average of all possible errors, or the average value of x, can conveniently be referred to as the "constant part of the error". Moreover, it is easily proven that the constant part of the total error is the sum of the constant parts of the simple errors of which it is composed.

If the quantity k is assumed to be known and subtracted from the result of each observation, then, denoting the error of the corrected observation by x′ and the corresponding probability by φ′(x′), we will have x′ = x − k, φ′(x′) = φ(x), and consequently ∫ x′ φ′(x′) dx′ = ∫ x φ(x) dx − k ∫ φ(x) dx = k − k = 0, i.e. the errors of the corrected observations will have no constant part, which is clear in and of itself.

6.

The value of the integral ∫ x φ(x) dx, which is the average value of x, reveals the presence or absence of a constant error, as well as the value of this error. Similarly, the integral ∫ x² φ(x) dx, which is the average value of x², seems very suitable for defining and measuring, in a general manner, the uncertainty of a system of observations. Therefore, between two systems of observations of unequal precision, the one giving a smaller value to the integral ∫ x² φ(x) dx should be considered preferable. If it is argued that this convention is arbitrary and seemingly unnecessary, we readily agree. The question at hand is inherently vague and can only be delimited by a somewhat arbitrary principle. Determining a quantity through observation can be likened, not inaptly, to a game in which there is a loss to be feared and no gain to be expected; each error being likened to a loss incurred, the relative apprehension about such a game should be expressed by the probable loss, i.e., by the sum of the products of the various possible losses by their respective probabilities. But what loss should be likened to a specific error? This is not clear in itself; its determination depends partly on our choice. It is evident, first of all, that the loss should not be regarded as proportional to the error committed; for, on that hypothesis, a positive error representing a loss, a negative error would have to be regarded as a gain: rather, the magnitude of the loss should be evaluated by a function of the error whose value is always positive. Among the infinite number of functions that fulfill this condition, it seems natural to choose the simplest one, which is undoubtedly the square of the error; and thus we are led to the principle proposed above.

Laplace considered the question in a similar manner, but adopted as a measure of loss the error itself, taken positively. This assumption, if we do not deceive ourselves, is no less arbitrary than ours: should we, indeed, consider a double error as more or less regrettable than a simple error repeated twice, and should we, consequently, assign it a double or more than double importance? This is a question that is not clear, and on which mathematical arguments have no bearing; each must resolve it according to their preference. Nevertheless, it cannot be denied that Laplace's assumption deviates from the law of continuity and is therefore less suitable for analytical study; ours, on the other hand, is recommended by the generality and simplicity of its consequences.

7.

Let us define ∫ φ(x) x² dx = m²: we will call m the "mean error to be feared" or simply the "mean error" of the observation whose indefinite errors x have the relative probability φ(x). We do not limit this designation to the immediate result of the observations, but rather extend it to any quantity that can be derived from them in any way. It is important not to confuse this mean error with the arithmetic mean of the errors, which is discussed in art. 5.

When comparing several systems of observations or several quantities resulting from observations that are not given the same precision, we will consider their relative "weight" to be inversely proportional to m2, and their "precision" to be inversely proportional to m. In order to represent the weights by numbers, we should take, as the unit, the weight of a certain arbitrarily chosen system of observations.

8.

If the errors of the observations have a constant part, subtracting it from each obtained result reduces the mean error and increases the weight and precision. Retaining the notation of art. 5, and letting m′ denote the mean error of the corrected observations, we have

m′² = m² − k².

If, instead of the constant part k, another number l were subtracted from each observation, the square of the mean error would become

m′² + (k − l)².

9.

Let λ be a determined coefficient and let μ be the value of the integral

μ = ∫_{−λm}^{+λm} φ(x) dx.

Then μ will be the probability that the error of an observation is less than λm in absolute value, and 1 − μ will be the probability that this error exceeds λm. If, for μ = 1/2, λm has the value ρ, it will be equally likely for the error to be smaller or larger than ρ: thus ρ can be called the probable error. The relationship between λ and μ depends on the nature of the function φ, which is unknown in most cases. However, it is interesting to study this relationship in some particular cases.

I. If the extreme limits of the possible errors are −a and +a, and if, between these limits, all errors are equally probable, the function φ(x) will be constant between these same limits, and, consequently, equal to 1/(2a). Hence, we have m = a/√3, and μ = λ/√3, so long as λ is less than or equal to √3; finally ρ = m·√3/2 = 0.8660254·m, and the probability that the error does not exceed the mean error is 1/√3 = 0.5773503.
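In modern terms this first case can be checked numerically; the following sketch (the variable names and the scale a = 1 are ours, purely for illustration) reproduces the two constants just derived:

```python
import math

a = 1.0  # half-width of the interval [-a, +a] of possible errors (arbitrary scale)

# phi(x) = 1/(2a) on [-a, +a]; the mean error m is the root of the average of x^2
m = a / math.sqrt(3.0)

def mu(lam):
    """Probability that the error stays below lam*m in absolute value."""
    return min(lam / math.sqrt(3.0), 1.0)

# probable error rho: the bound that is exceeded with probability 1/2
rho = (math.sqrt(3.0) / 2.0) * m

print(rho / m)   # 0.8660254...
print(mu(1.0))   # 0.5773502..., the chance of staying below the mean error
```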

II. If, as before, −a and +a are the limits of the possible errors, and if we assume that the probability of these errors decreases from the error 0 onwards like the terms of an arithmetic progression, then we will have

φ(x) = (a − x)/a², for x between 0 and +a;  φ(x) = (a + x)/a², for x between −a and 0.

From this, we deduce that m = a/√6 and μ = λ√(2/3) − λ²/6, so long as λ lies between 0 and √6; λ = √6 − √(6 − 6μ), so long as μ lies between 0 and 1; and finally,

ρ = (√6 − √3)·m = 0.7174389·m.

In this case, the probability that the error remains below the mean error is

√(2/3) − 1/6 = 0.6498299.
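The same kind of numerical check applies to this second case; again the concrete numbers below are our own illustrative choices:

```python
import math

a = 1.0
m = a / math.sqrt(6.0)  # mean error for the triangular density phi(x) = (a - |x|)/a^2

def mu(lam):
    """Probability that the error stays below lam*m, for 0 <= lam <= sqrt(6)."""
    return lam * math.sqrt(2.0 / 3.0) - lam ** 2 / 6.0

# probable error: solving mu(lam) = 1/2 gives lam = sqrt(6) - sqrt(6 - 6*mu)
lam_half = math.sqrt(6.0) - math.sqrt(3.0)
rho = lam_half * m

print(rho / m)   # 0.7174389...
print(mu(1.0))   # 0.6498299..., the chance of staying below the mean error
```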

III. If we assume the function φ(x) to be proportional to e^(−x²/h²), then it must be equal to

φ(x) = e^(−x²/h²) / (h√π),

where π denotes the semiperimeter of a circle of radius 1, from which we deduce

m = h/√2

(see Disquisitiones generales circa seriem infinitam, art. 28). If we let Θz denote the value of the integral

Θz = (2/√π) ∫_{0}^{z} e^(−t²) dt,

then we have

μ = Θ(λ/√2).

The following table gives some values of this quantity:

Template:C
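In modern notation the integral Θ is the error function erf, so this third case can be evaluated with standard library routines; the sketch below (the names, scale, and bisection scheme are ours) recovers the familiar factor linking the probable error to the mean error:

```python
import math

h = 1.0
m = h / math.sqrt(2.0)   # mean error of phi(x) = exp(-x^2/h^2)/(h*sqrt(pi))

def mu(lam):
    """Probability that the error stays below lam*m; Theta(z) = erf(z)."""
    return math.erf(lam / math.sqrt(2.0))

# probable error: bisect erf(lam/sqrt(2)) = 1/2 for lam
lo, hi = 0.0, 3.0
for _ in range(60):
    mid = (lo + hi) / 2.0
    if mu(mid) < 0.5:
        lo = mid
    else:
        hi = mid
rho = lo * m

print(rho / m)   # 0.6744897..., the probable-error factor for this law
```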

10.

Although the relationship between λ and μ depends on the nature of the function φ, some general results can be established that apply to all cases where this function does not increase with the absolute value of the variable x; then we have the following theorems:

λ will not exceed μ√3 whenever μ is less than 2/3;
λ will not exceed 2/(3√(1 − μ)) whenever μ exceeds 2/3;

When μ = 2/3, the two limits coincide, and λ cannot exceed √(4/3).

To prove this remarkable theorem, let y be the value of the integral ∫_{−x}^{+x} φ(z) dz. Then y will be the probability that an error lies between −x and +x. Let us set

x = ψ(y);

then we have ψ(0)=0, and

dx = ψ′(y) dy, that is, ψ′(y) = 1/(2φ(x));

and by hypothesis ψ′(y) is always increasing between y = 0 and y = 1, or at least is not decreasing; equivalently, ψ″(y) is always positive, or at least never negative. Now we have

ψ(y) = ∫_{0}^{y} ψ′(t) dt,

thus,

yψ′(y) − ψ(y) = ∫_{0}^{y} (ψ′(y) − ψ′(t)) dt.

Therefore, yψ′(y) − ψ(y) always has a positive value, or at least this expression is never negative, and therefore

1 − ψ(y)/(yψ′(y))

will always be positive and less than unity. Let f be the value of this difference for y=μ; since ψ(μ)=λm, we have

ψ′(μ) = λm/(μ(1 − f)).

With these preliminaries, let us consider the function

λm·(y − μf)/(μ(1 − f)),

which we set = F(y), and also dF(y) = F′(y) dy. Then it is clear that

F(μ) = ψ(μ) = λm,  F′(y) = ψ′(μ) = λm/(μ(1 − f)).

Since ψ′(y) is continually increasing with y (or at least does not decrease, which should always be understood), while F′(y) is constant, the difference

ψ′(y) − F′(y)

will be positive for all values of y greater than μ, and negative for all values of y smaller than μ. It follows that the difference ψ(y) − F(y) is always positive, or at least never negative, and consequently ψ(y) will certainly be greater than F(y) in absolute value wherever F(y) is positive, i.e. between y = μf and y = 1. The value of the integral

∫_{μf}^{1} F(y)² dy

will therefore be less than that of the integral

∫_{μf}^{1} ψ(y)² dy,

and a fortiori less than

∫_{0}^{1} ψ(y)² dy = ∫ x² φ(x) dx,

i.e., less than m². Now the value of the first of these integrals is found to be

λ²m²(1 − μf)³ / (3μ²(1 − f)²),

and therefore λ² is less than 3μ²(1 − f)²/(1 − μf)³, f being a number between 0 and 1. If we regard f as a variable, then this fraction, whose differential is

−3μ²(1 − f)(2 − 3μ + μf)/(1 − μf)⁴ · df,

will be continually decreasing as f increases from 0 to 1 so long as μ is less than 2/3; its maximum value will therefore be attained for f = 0 and will be = 3μ², so that in this case the coefficient λ will certainly be less, or at least not greater, than μ√3. Q.E.P. On the other hand, when μ is greater than 2/3, the maximum value of the function will be attained when 2 − 3μ + μf = 0, i.e. for f = 3 − 2/μ, and this maximum value will be = 4/(9(1 − μ)), so that in this case the coefficient λ will not be greater than 2/(3√(1 − μ)). Q.E.S.

Thus e.g. for μ = 1/2 it is certain that λ will not exceed √(3/4), which means that the probable error cannot exceed 0.8660254·m, the value it was found to attain in the first example of art. 9. Furthermore, it is easily concluded from our theorem that μ is not less than λ/√3 when λ is less than √(4/3), and, on the other hand, not less than 1 − 4/(9λ²) when λ is greater than √(4/3).
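The theorem of this article can be sketched numerically; the function below (our naming) encodes the two bounds and checks that they meet at μ = 2/3 and that the uniform density of art. 9, I attains the first of them:

```python
import math

def lam_bound(mu):
    """Upper limit on lambda from art. 10, valid when phi does not increase with |x|."""
    if mu <= 2.0 / 3.0:
        return mu * math.sqrt(3.0)
    return 2.0 / (3.0 * math.sqrt(1.0 - mu))

# the two branches meet at mu = 2/3 with the common value sqrt(4/3)
left = (2.0 / 3.0) * math.sqrt(3.0)
right = 2.0 / (3.0 * math.sqrt(1.0 - 2.0 / 3.0))
print(left, right, math.sqrt(4.0 / 3.0))

# the uniform density of art. 9, I attains the first bound: there mu = lam/sqrt(3),
# so the bound evaluated at mu = 1/sqrt(3) returns the lambda = 1 that produced it
print(lam_bound(1.0 / math.sqrt(3.0)))
```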

11.

Since several of the problems discussed below involve the integral ∫ x⁴ φ(x) dx, it will be worthwhile to evaluate it in some special cases. Let us denote the value of the integral ∫ x⁴ φ(x) dx by n⁴. I. When φ(x) = 1/(2a) for values of x between −a and +a, we have n⁴ = a⁴/5 = (9/5)·m⁴.

II. In the second case of art. 9, with x still between −a and +a, we have n⁴ = a⁴/15 = (12/5)·m⁴.

III. In the third case, where φ(x) = e^(−x²/h²)/(h√π), we find, as explained in the commentary cited above, that n⁴ = (3/4)·h⁴ = 3m⁴.

It can also be demonstrated, under the sole assumptions of the previous article, that the ratio n⁴/m⁴ is never less than 9/5.
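The first two ratios can be confirmed by direct quadrature; the following sketch (a crude midpoint rule, with a grid size of our own choosing) recovers 9/5 and 12/5:

```python
def moments(phi, a, n=200000):
    """Midpoint-rule quadrature of x^2*phi(x) and x^4*phi(x) over [-a, +a]."""
    h = 2.0 * a / n
    m2 = n4 = 0.0
    for i in range(n):
        x = -a + (i + 0.5) * h
        w = phi(x) * h
        m2 += x * x * w
        n4 += x ** 4 * w
    return m2, n4

a = 1.0
m2_u, n4_u = moments(lambda x: 1.0 / (2.0 * a), a)        # case I: constant density
m2_t, n4_t = moments(lambda x: (a - abs(x)) / a ** 2, a)  # case II: triangular density

r_uniform = n4_u / m2_u ** 2    # ratio n^4/m^4, approximately 9/5
r_triangle = n4_t / m2_t ** 2   # ratio n^4/m^4, approximately 12/5
print(r_uniform, r_triangle)
```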

12.

Let x, x′, x″, etc., denote the errors made in observations of the same kind, and suppose that these errors are independent of each other. Let φ(x) be the relative probability of the error x, and let y be a rational function of the variables x, x′, x″, etc. Then the multiple integral

(I)  ∫∫∫··· φ(x) φ(x′) φ(x″) ··· dx dx′ dx″ ···,

extended to all values of the variables x, x′, x″, etc., for which the value of y falls between the given limits 0 and η, represents the probability that the value of y lies between 0 and η. This integral is evidently a function of η, whose differential we set = ψ(η)dη, so that the integral in question is equal to ∫_{0}^{η} ψ(η) dη, and therefore ψ(η) represents the relative probability of an arbitrary value of y. Since x can be regarded as a function of the variables y, x′, x″, etc., which we set x = f(y, x′, x″, …), the integral (I) will be ∫∫∫··· φ(f(y, x′, x″, …))·(df/dy)·φ(x′) φ(x″) ··· dy dx′ dx″ ···, where y takes values between y = 0 and y = η, and the other variables take all values for which f(y, x′, x″, …) is real. Hence we have ψ(y) = ∫∫··· φ(f(y, x′, x″, …))·(df/dy)·φ(x′) φ(x″) ··· dx′ dx″ ···, the integration, in which y is to be regarded as a constant, being extended to all values of the variables x′, x″, etc., for which f(y, x′, x″, …) takes a real value.

13.

The previous integration would require knowledge of the function φ, which is unknown in most cases. Even if this function were known, the calculation would often exceed the capabilities of analysis. It will therefore be impossible to obtain the probability of each value of y; but it is different if one asks only for the average value of y, which is given by the integral ∫ y ψ(y) dy, extended over all possible values of y. And since one can set ψ(y) = 0 for all values that y cannot attain, whether by the nature of the function (e.g. for negative values, if y = x² + x′² + x″² + etc.) or because of the limits imposed on x, x′, x″, etc., it is clear that the integration can be extended over all real values of y from −∞ to +∞.

But the integral ∫ y ψ(y) dy, taken between determinate limits η and η′, is equal to the integral ∫∫∫··· y·φ(f)·(df/dy)·φ(x′) φ(x″) ··· dy dx′ dx″ ···, taken from y = η to y = η′ and extended to all values of the variables x′, x″, etc., for which f is real. This integral is therefore equal to the integral ∫∫∫··· y·φ(x) φ(x′) φ(x″) ··· dx dx′ dx″ ···, in which y is expressed as a function of x, x′, x″, etc., and the integration is extended to all values of the variables that leave y between η and η′. Thus, the integral ∫ y ψ(y) dy, extended to all values of y, can be obtained from the integral ∫∫∫··· y·φ(x) φ(x′) φ(x″) ··· dx dx′ dx″ ···, where the integration is extended to all real values of x, x′, x″, etc., that is, from x = −∞ to x = +∞, from x′ = −∞ to x′ = +∞, etc.

14.

If the function y reduces to a sum of terms of the form A·x^α·x′^β·x″^γ···, then the value of the integral ∫ y ψ(y) dy extended to all values of y, or equivalently the average value of y, will be equal to a sum of terms of the form A·∫ x^α φ(x) dx · ∫ x′^β φ(x′) dx′ · ∫ x″^γ φ(x″) dx″ ···; that is, the average value of y is equal to the sum of terms derived from those that make up y by replacing each x^α, x′^β, x″^γ, etc., with its average value. The proof of this important theorem could easily be derived from other considerations.

15.

Let us apply the theorem of the previous article to the case where y = (x² + x′² + x″² + ···)/σ, σ denoting the number of terms in the numerator.

We immediately find that the average value of y is equal to m², the letter m having the same meaning as above. The true value of y may be lower or higher than its average, just as the true value of x² may, in each case, be lower or higher than m²; but the probability that the value of y differs from m² by only a small amount will approach certainty as σ becomes larger. In order to clarify this, since it is not possible to determine this probability exactly, let us investigate the mean error to be feared when we set y = m². It is clear from the principles of art. 6 that this error will be the square root of the average value of the function ((x² + x′² + x″² + ···)/σ − m²)². To find it, it suffices to observe that the average value of a term such as x⁴/σ² is equal to n⁴/σ² (n having the same meaning as in art. 11), and that the average value of a term such as 2x²x′²/σ² is equal to 2m⁴/σ²; therefore, the average value of this function will be (n⁴ − m⁴)/σ.
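In modern notation (ours, not the original's), the computation of this article can be summarized as follows, the independence of the errors justifying the term-by-term averaging:

```latex
% y = (x^2 + x'^2 + \cdots)/\sigma ; \mathrm{E}[\cdot] denotes the average value
\mathrm{E}[y] \;=\; \frac{1}{\sigma}\sum_{i=1}^{\sigma}\mathrm{E}\!\left[x_i^2\right] \;=\; m^2,
\qquad
\mathrm{E}\!\left[(y-m^2)^2\right]
  \;=\; \frac{1}{\sigma^2}\sum_{i=1}^{\sigma}\left(\mathrm{E}\!\left[x_i^4\right]-m^4\right)
  \;=\; \frac{n^4-m^4}{\sigma},
```

so the mean error to be feared in taking y for m² is √((n⁴ − m⁴)/σ), which decreases like 1/√σ.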

Since this last formula contains the quantity n, if we only want to get an idea of the precision of this determination, it will suffice to adopt some hypothesis about the function φ. E.g. if we take the third assumption of arts. 9 and 11, this error will be equal to m²√(2/σ). Alternatively, we can obtain an approximate value of n⁴ by means of the errors themselves, using the formula n⁴ = (x⁴ + x′⁴ + x″⁴ + ···)/σ. In general, it can be stated that a precision twice as great in this determination will require four times as many errors, meaning that the weight of the determination is proportional to the number σ.

Similarly, if the errors of the observations contain a constant part, their arithmetic mean will furnish a value of that constant part, and this value will approach the true one as the number of errors increases. In this determination, the mean error to be feared will be √((m² − k²)/σ), where k denotes the constant part and m the mean error of the observations not corrected for their constant error. It will be simply m′/√σ if m′ represents the mean error of the observations corrected for the constant part (see art. 8).

16.

In arts. 12-15, we assumed that the errors x, x′, x″, etc., belonged to the same type of observation, so that the probability of each of these errors was represented by the same function. However, it is clear that the general principles outlined in arts. 12-14 can be applied with equal ease in the more general case where the probabilities of the errors x, x′, x″, etc., are represented by different functions φ(x), φ′(x′), φ″(x″), etc., i.e. when these errors belong to observations of varying precision or uncertainty. Let x denote the error of an observation whose mean error to be feared is m, and let x′, x″, etc., denote the errors of other observations with mean errors to be feared m′, m″, etc. Then the average value of the sum x² + x′² + x″² + etc. will be m² + m′² + m″² + etc. Now, if it is also known that the quantities m, m′, m″, etc., are respectively proportional to the numbers 1, μ′, μ″, etc., then the average value of the expression (x² + x′²/μ′² + x″²/μ″² + ···)/σ will be = m². However, if we adopt for m² the value that this expression takes when the errors x, x′, x″, etc., are substituted as chance offers them, then the mean error affecting this determination will be, just as in the preceding article, √(n⁴ + n′⁴/μ′⁴ + n″⁴/μ″⁴ + ··· − σm⁴)/σ, where n′, n″, etc., have the same meaning with respect to the second and third observations as n does with respect to the first; and if we may assume the numbers n, n′, n″, etc., proportional to m, m′, m″, etc., this mean error to be feared will be equal to √((n⁴ − m⁴)/σ).

But this method of determining an approximate value of m is not the most advantageous. Consider the more general expression (αx² + α′x′² + α″x″² + ···)/(α + α′μ′² + α″μ″² + ···), whose average value will also be m², regardless of the coefficients α, α′, α″, etc. The mean error to be feared when substituting for m² the value of this expression, as determined by the errors x, x′, x″, etc., that chance offers, will, according to the principles above, be given by the formula √(α²(n⁴ − m⁴) + α′²(n′⁴ − μ′⁴m⁴) + α″²(n″⁴ − μ″⁴m⁴) + ···)/(α + α′μ′² + α″μ″² + ···). To minimize this error, we must take the coefficients α, α′, α″, etc., inversely proportional to the quantities (n⁴ − m⁴), (n′⁴ − μ′⁴m⁴)/μ′², (n″⁴ − μ″⁴m⁴)/μ″², etc. These values cannot be assigned until the exact ratios n/m, n′/m′, etc., are known. In the absence of exact knowledge[1], it is safest to assume these ratios equal to each other (see art. 11), in which case the coefficients α, α′, α″, etc., become proportional to 1, 1/μ′², 1/μ″², etc., i.e. the coefficients α, α′, α″, etc., should be taken equal to the relative weights of the various observations, the weight of the observation corresponding to the error x being taken as the unit. With this assumption, let σ denote, as above, the number of proposed errors. Then the average value of the expression (x² + x′²/μ′² + x″²/μ″² + ···)/σ will be = m², and when we take, for the true value of m², the randomly determined value of this expression, the mean error to be feared will be √(n⁴ + n′⁴/μ′⁴ + n″⁴/μ″⁴ + ··· − σm⁴)/σ; and, finally, if we are allowed to assume the quantities n, n′, n″, etc., proportional to m, m′, m″, etc., this expression reduces to √((n⁴ − m⁴)/σ), which is identical to what we found in the case where all observations were of the same type.

17.

When the value of a quantity, which depends on an unknown magnitude, is determined by an observation whose precision is not absolute, the result of this observation may provide an erroneous value for the unknown, but there is no room for discretion in this determination. But if several functions of the same unknown have been found by imperfect observations, we can obtain the value of the unknown either by any one of these observations, or by a combination of several observations, which can be carried out in infinitely many ways. The result will be subject, in all cases, to a possible error, and depending on the combination chosen, the mean error to be feared may be greater or smaller. The same applies if several observed quantities depend on multiple unknowns. Depending on whether the number of observations equals the number of unknowns, or is smaller or larger than this number, the problem will be determined, undetermined, or more than determined (at least in general), and in this third case, the observations can be combined in infinitely many ways to provide values for the unknowns. Among these combinations, the most advantageous ones must be chosen, i.e., those that provide values for which the mean error to be feared is as small as possible. This problem is certainly the most important one presented by the application of mathematics to natural philosophy.

In Theoria motus corporum coelestium we have shown how to find the most probable values of the unknowns when the probability law of the observational errors is known; and since, in almost all cases, this law remains hypothetical by its nature, we have applied the theory to the highly plausible hypothesis that the probability of the error x is proportional to e^(−h²x²). Hence the method that I have followed, especially in astronomical calculations, and which most calculators now use under the name of the Method of Least Squares.

Laplace later considered the question from another point of view, and showed that this principle is preferable to all others, regardless of the probability law of the errors, provided that the number of observations is very large. But when this number is limited, the question remains open; so that, if our hypothetical law is rejected, the method of least squares would be preferable to others only because it leads to simpler calculations.

We therefore hope to please geometers by demonstrating in this Memoir that the method of least squares provides the most advantageous combination of observations, not only approximately, but also absolutely, regardless of the probability law of errors and regardless of the number of observations, provided that we adopt for the mean error, not Laplace's definition, but the one which we have given in arts. 5 and 6.

It is necessary to warn here that in the following investigations only random errors, reduced by their constant part, will be considered. It is up to the observer to carefully eliminate the causes of constant errors. We reserve the examination of the case where the observations are affected by an unknown constant error for another Memoir.

18.

Problem. — Let U be a given function of the quantities V, V′, V″, etc.; we ask for the mean error M to be feared in determining the value of U when, instead of the true values of V, V′, V″, etc., we take the values derived from independent observations, m, m′, m″, etc., being the mean errors corresponding to these various observations.

Solution. Let e, e′, e″, etc., denote the errors of the observed values V, V′, V″, etc.; the resulting error in the value of the function U can then be expressed by the linear function E = λe + λ′e′ + λ″e″ + ···, where λ, λ′, λ″, etc., represent the derivatives dU/dV, dU/dV′, dU/dV″, etc., when V, V′, V″, etc., are replaced by their true values.

This value of E is legitimate if we assume the observations to be accurate enough that the squares and products of the errors are negligible. It follows that the average value of E is zero, since we assume that the errors of the observations have no constant part. Now the mean error M to be feared in the value of U will be the square root of the average value of E², or equivalently M² will be the average value of the sum λ²e² + λ′²e′² + λ″²e″² + ··· + 2λλ′ee′ + 2λλ″ee″ + ···; but the average value of λ²e² is λ²m², that of λ′²e′² is λ′²m′², etc., and the average values of the products 2λλ′ee′ are all zero. Hence we find M² = λ²m² + λ′²m′² + λ″²m″² + ···.

It is good to add several remarks to this solution.

I. Since we neglect powers of the errors higher than the first, we can, in our formula, take for λ, λ′, λ″, etc., the values of the differential coefficients dU/dV, dU/dV′, dU/dV″, etc., computed from the observed values V, V′, V″, etc. Whenever U is a linear function, this substitution is rigorously exact.

II. If, instead of mean errors, one prefers to introduce the weights p, p′, p″, etc., of the respective observations, the unit of weight being arbitrary, and P denotes the weight of the value of U, then we will have 1/P = λ²/p + λ′²/p′ + λ″²/p″ + ···.

III. Let T be another function of V, V′, V″, etc., and let ϰ, ϰ′, ϰ″, etc., denote the corresponding derivatives dT/dV, dT/dV′, dT/dV″, etc. The error in the determination of T from the observed values V, V′, V″, etc., will be E′ = ϰe + ϰ′e′ + ϰ″e″ + ···, and the mean error to be feared in this determination will be √(ϰ²m² + ϰ′²m′² + ϰ″²m″² + ···). It is obvious that the errors E and E′ will not be independent of each other, and the mean value of the product EE′ will not be = 0 like the mean value of ee′; instead, it will be equal to λϰm² + λ′ϰ′m′² + λ″ϰ″m″² + ···.

IV. The problem includes the case where the values of the quantities V, V′, V″, etc., are not immediately given by observation, but are deduced from any combinations of direct observations. For this extension to be legitimate, the determinations of these quantities must be independent, i.e., they must be provided by different observations. If this condition of independence is not fulfilled, the formula giving the value of M would no longer be accurate. For example, if the same observation were used both in determining V and in determining V′, the errors e and e′ would no longer be independent, and the mean value of the product ee′ would no longer be zero. If, in this case, the relationship between V and V′ and the results of the simple observations from which they derive is known, the mean value of the product ee′ can be calculated, as indicated in remark III, and the formula giving M corrected accordingly.
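The proposition of this article is easy to exercise numerically; the sketch below (the function U and every number in it are hypothetical, chosen only for illustration) propagates mean errors through a linear function:

```python
import math

def propagated_mean_error(grads, mean_errors):
    """M = sqrt(sum(lambda_i^2 * m_i^2)) for independent observations."""
    return math.sqrt(sum((g * m) ** 2 for g, m in zip(grads, mean_errors)))

# hypothetical example: U = V' + 2V'' - V''', observed with mean errors 0.3, 0.1, 0.2;
# the derivatives lambda are then simply 1, 2, -1
M = propagated_mean_error([1.0, 2.0, -1.0], [0.3, 0.1, 0.2])
print(M)  # sqrt(0.09 + 0.04 + 0.04) = sqrt(0.17)
```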

19.

Let V, V′, V″, etc., be functions of the unknowns x, y, z, etc. Let π be the number of these functions and ρ the number of unknowns. Suppose that observations have given, immediately or indirectly, V = L, V′ = L′, V″ = L″, etc., and that these determinations are absolutely independent of one another. If ρ is greater than π, the determination of the unknowns is an indeterminate problem. If ρ is equal to π, each of the unknowns x, y, z, etc., can be expressed as a function of V, V′, V″, etc., so that their values can be deduced from the observed values of the latter, and the previous article allows us to calculate the relative accuracy of these various determinations. If ρ is less than π, each unknown x, y, z, etc., can be expressed in infinitely many ways as a function of V, V′, V″, etc., and, in general, these values will differ; they would coincide if the observations were, contrary to our assumptions, rigorously exact. It is clear, moreover, that the various combinations will provide results whose accuracy will generally be different.

Moreover, if, in the second and third cases, the quantities V, V′, Template:Nobr, are such that π − ρ + 1 of them, or more, can be regarded as functions of the others, the problem is more than determined relative to these latter functions and indeterminate relative to the unknowns x, y, Template:Nobr; and we could not determine these latter unknowns even if the functions V, V′, Template:Nobr, were exactly known: but we exclude this case from our investigations.

If V, V′, Template:Nobr, are not linear functions of the unknowns, we can always give them this form by replacing the primitive unknowns with their differences from approximate values assumed known; the mean errors to be feared in the determinations Template:C being respectively denoted by m, m′, Template:Nobr, and the weights of these determinations by p, p′, Template:Nobr, so that Template:C We will assume that the ratios of the mean errors, and hence the weights, are known, one of the latter being arbitrarily chosen as the unit. Finally, if we set Template:C then things will proceed as if immediate observations, equally precise and with mean error m√p, had given Template:C
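In modern matrix terms, the reduction at the end of this article multiplies each observation equation by the square root of its weight, after which a weighted problem behaves like one of equally precise, unit-weight observations. A minimal Python sketch of this substitution, with hypothetical numbers that are not from the text:

```python
import numpy as np

# Hypothetical example (not from the text): three observation equations
# in two unknowns, with unequal weights p.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
L = np.array([1.02, 1.98, 3.05])
p = np.array([1.0, 4.0, 2.0])

# Multiply each equation by sqrt(p): the scaled system behaves like
# equally precise observations of unit weight.
sp = np.sqrt(p)
x_weighted, *_ = np.linalg.lstsq(A * sp[:, None], L * sp, rcond=None)

# The weighted normal equations A^T P A x = A^T P L give the same values.
P = np.diag(p)
x_normal = np.linalg.solve(A.T @ P @ A, A.T @ P @ L)
assert np.allclose(x_weighted, x_normal)
```

Either route yields the same most plausible values, which is precisely the point of the substitution.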

20.

Problem. — Let v, v′, Template:Nobr, be the following linear functions of the unknowns x, y, Template:Nobr,

Template:Optional style|(1) { v = ax + by + cz + ⋯ + l,  v′ = a′x + b′y + c′z + ⋯ + l′,  v″ = a″x + b″y + c″z + ⋯ + l″,

Among all systems of coefficients ϰ, ϰ′, Template:Nobr, that identically satisfy Template:C k being independent of x, y, Template:Nobr, find the one for which ϰ² + ϰ′² + ϰ″² + ⋯ obtains its minimum value.

Solution. — Let us set

Template:Optional style|(2) { ξ = av + a′v′ + a″v″ + ⋯,  η = bv + b′v′ + b″v″ + ⋯,  ζ = cv + c′v′ + c″v″ + ⋯,

ξ, η, ζ are linear functions of x, y, z, and we have

Template:Optional style|(3) { ξ = xΣa² + yΣab + zΣac + ⋯ + Σal,  η = xΣab + yΣb² + zΣbc + ⋯ + Σbl,  ζ = xΣac + yΣbc + zΣc² + ⋯ + Σcl,

where Σa² denotes the sum a² + a′² + a″² + ⋯, and similarly for the other sums.
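The passage from the definitions (2) to the expanded form (3) can be checked numerically. A Python sketch with a hypothetical two-unknown example (the arrays `a`, `b`, `l` are illustrative, not from the text):

```python
import numpy as np

# Hypothetical two-unknown example: v = a*x + b*y + l for each observation.
a = np.array([1.0, 0.0, 1.0])
b = np.array([0.0, 1.0, 1.0])
l = np.array([-1.02, -1.98, -3.05])

def xi_eta(x, y):
    # Definitions (2): xi = a v + a'v' + ..., eta = b v + b'v' + ...
    v = a * x + b * y + l
    return np.array([a @ v, b @ v])

# Expanded form (3): coefficients are the sums Sum(a^2), Sum(ab), ...,
# with constant parts Sum(al), Sum(bl).
x, y = 0.7, -0.3
expected = np.array([x * (a @ a) + y * (a @ b) + a @ l,
                     x * (a @ b) + y * (b @ b) + b @ l])
assert np.allclose(xi_eta(x, y), expected)
```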

The number of quantities ξ, η, Template:Nobr, is equal to the number of unknowns x, y, Template:Nobr namely ρ. Thus, by elimination, one can obtain an equation of the following form,[2]

Template:C

which will be identically satisfied if we replace ξ, η, ζ with their values from (3). Consequently, if we set

Template:Optional style|(4) { a(αα) + b(αβ) + c(αγ) + ⋯ = α,  a′(αα) + b′(αβ) + c′(αγ) + ⋯ = α′,  a″(αα) + b″(αβ) + c″(αγ) + ⋯ = α″,

then we will have identically

Template:Optional style|(5) αv + α′v′ + α″v″ + ⋯ = x − A.

This equation shows that among the different systems of coefficients ϰ, ϰ, Template:Nobr, we must consider the system

Template:C

Moreover, for any system, we will have identically

Template:C

and this equation, being identical, leads to the following:

Template:C

Adding these equations after multiplying them, respectively, by (αα), (αβ), Template:Nobr, we will have, by virtue of the system (4), Template:C which is the same as Template:C thus, the sum Template:C will have its minimum value when ϰ = α, ϰ′ = α′, Template:Nobr Q.E.I.

Moreover, this minimum value will be obtained as follows. Equation (5) shows that we have Template:C Let us multiply these equations, respectively, by (αα), (αβ), Template:Nobr, and add them; considering the relations (4), we find Template:C

21.

When the observations have provided approximate equations v=0, v=0, Template:Nobr it will be necessary, to determine the unknown x, to choose a combination of the form Template:C such that the unknown x acquires a coefficient equal to 1, and that the other unknowns are eliminated.

According to art. 18, the weight of this determination will be given by Template:C According to the previous article, the most suitable determination will be obtained by taking ϰ = α, ϰ′ = α′, Template:Nobr Then x will have the value A, and it is clear that the same value would be obtained (without knowing the multipliers α, α′, Template:Nobr) by performing elimination on the equations ξ = 0, η = 0, Template:Nobr The weight of this determination will be given = 1/(αα), and the mean error to be feared will be Template:C

A similar approach would lead to the most suitable values of the other unknowns y, Template:Nobr, which would be those obtained by performing elimination on the equations ξ = 0, η = 0, ζ = 0, etc.

If we denote the sum v² + v′² + v″² + ⋯, or equivalently Template:C by Ω, then it is clear that 2ξ, 2η, 2ζ, etc. will be the partial differential quotients of the function Ω, i.e.

Template:C

Therefore, the values of the unknowns that are deduced from the most suitable combination, and which we can call the most plausible values, are precisely those that minimize Ω. Now V − L represents the difference between the calculated value and the observed value. Thus, the most plausible values of the unknowns are those that minimize the sum of the squares of the differences between the calculated and observed values of the quantities V, V′, Template:Nobr, these squares being respectively multiplied by the weights of the observations. I had established this principle long ago through other considerations, in Theoria Motus Corporum Coelestium.
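The least-squares principle stated here can be sketched numerically: the values solving ξ = 0, η = 0 minimize Ω, and no perturbation of them lowers the sum of squares. A Python sketch with hypothetical, unit-weight data:

```python
import numpy as np

# Hypothetical data: v = A_mat @ (x, y) + l, unit weights.
A_mat = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
l = np.array([-1.02, -1.98, -3.05])

Omega = lambda z: np.sum((A_mat @ z + l) ** 2)

# Setting xi = 0, eta = 0 gives the normal equations (A^T A) z = -A^T l.
sol = np.linalg.solve(A_mat.T @ A_mat, -A_mat.T @ l)
M = Omega(sol)

# No perturbation of the solution lowers the sum of squares.
rng = np.random.default_rng(0)
for _ in range(100):
    assert Omega(sol + 0.1 * rng.standard_normal(2)) >= M
```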

If one wants to assign the relative precision of each determination, it is necessary to deduce the values of x, y, Template:Nobr from the equations (3), which gives them in the following form:

Template:Optional style|(7) { x = A + (αα)ξ + (αβ)η + (αγ)ζ + ⋯,  y = B + (βα)ξ + (ββ)η + (βγ)ζ + ⋯,  z = C + (γα)ξ + (γβ)η + (γγ)ζ + ⋯.

Accordingly, the most plausible values of the unknowns x, y, Template:Nobr, will be A, B, Template:Nobr The weights of these determinations will be 1/(αα), 1/(ββ), Template:Nobr and the mean errors to be feared will be

for x, m√p · √(αα) = m′√p′ · √(αα) = ⋯,
for y, m√p · √(ββ) = m′√p′ · √(ββ) = ⋯,
for z, m√p · √(γγ) = m′√p′ · √(γγ) = ⋯,

in agreement with the results obtained in Theoria Motus Corporum Coelestium.
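In modern terms the coefficients (αα), (ββ), … of (7) are the diagonal entries of the inverse of the normal-equation matrix, and the statement about mean errors can be checked by simulation. A Python sketch with a hypothetical design and unit weights (m = p = 1):

```python
import numpy as np

# Hypothetical design with unit weights; N is the matrix of system (3).
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
N = A.T @ A
Ninv = np.linalg.inv(N)

weight_x = 1.0 / Ninv[0, 0]            # 1/(alpha alpha)
weight_y = 1.0 / Ninv[1, 1]            # 1/(beta beta)

# Monte Carlo check: with unit-variance observation errors,
# the errors of the determinations have second moments N^{-1}.
rng = np.random.default_rng(1)
e = rng.standard_normal((20000, 3))
dx = np.linalg.solve(N, A.T @ e.T).T   # error in (x, y) for each trial
assert np.allclose(dx.T @ dx / len(dx), Ninv, atol=0.03)
```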

22.

The case where there is only one unknown is the most frequent and simplest of all. In this case we have V = x, V′ = x, Template:Nobr We will then have a = √p, a′ = √p′, Template:Nobr l = −L√p, l′ = −L′√p′, Template:Nobr and consequently, Template:C Hence Template:C

Therefore, if by several observations that do not have the same precision and whose respective weights are p, p′, Template:Nobr, we have found, for the same quantity, a first value L, a second L′, a third L″, Template:Nobr, then the most plausible value will be Template:C and the weight of this determination will be = p + p′ + p″ + ⋯. If all observations are equally precise, then the most plausible value will be Template:C i.e. the arithmetic mean of the observed values; taking the weight of an individual observation as the unit, the weight of the mean will be π.
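The one-unknown case can be sketched in a few lines: the most plausible value is the weighted mean and its weight is the sum of the individual weights. Hypothetical numbers, not from the text:

```python
import numpy as np

# Hypothetical measurements of one quantity with weights p, p', p''.
L = np.array([10.1, 9.9, 10.3])
p = np.array([1.0, 2.0, 4.0])

most_plausible = np.sum(p * L) / np.sum(p)   # the weighted mean
weight = np.sum(p)                           # weight of this determination

assert np.isclose(most_plausible, (10.1 + 2 * 9.9 + 4 * 10.3) / 7)
assert weight == 7.0
# Equal weights reduce this to the arithmetic mean.
assert np.isclose(np.sum(np.ones(3) * L) / 3, np.mean(L))
```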

Part Two

23.

A number of investigations still remain to be discussed, through which the preceding theory will be clarified and extended.

Let us first investigate whether the elimination used to express the variables x, y, Template:Nobr, in terms of ξ, η, Template:Nobr, is always possible. Since the number of equations is equal to the number of unknowns, we know that this elimination will be possible if ξ, η, Template:Nobr are independent of each other; otherwise, it is impossible.

Suppose, for a moment, that ξ, η, Template:Nobr are not independent, but rather there exists between these quantities an identical equation Template:C We will then have Template:C

Let us set

Template:Optional style|(1) { aF + bG + cH + ⋯ = θ,  a′F + b′G + c′H + ⋯ = θ′,  a″F + b″G + c″H + ⋯ = θ″,

from which it follows that Template:C Multiplying the equations (1) respectively by θ, θ′, Template:Nobr and adding, we obtain Template:C and this equation leads to θ = 0, θ′ = 0, θ″ = 0, etc. From this we conclude, first of all, K = 0. Secondly, the equations (1) show that the functions v, v′, Template:Nobr, are such that their values do not change when the variables x, y, Template:Nobr, increase or decrease proportionally to F, G, Template:Nobr respectively. It is clear that the same holds for the functions V, V′, Template:Nobr: but this can only happen in the case where it would be impossible to determine x, y, Template:Nobr using the values of V, V′, Template:Nobr even if these were exactly known; the problem would then be indeterminate by its nature, and we have excluded this case from our investigations.

24.

If β, β′, Template:Nobr denote multipliers playing the same role relative to the unknown y as the multipliers α, α′, Template:Nobr relative to the unknown x, i.e. so that we have Template:C then we will identically have Template:C Let γ, γ′, Template:Nobr be the analogous multipliers relative to the variable z, so that we have Template:C and consequently, Template:C In the same way as we found in art. 20 that Template:C we will find here Template:C and so on.

We will also have, as in art. 20

Template:C

If we multiply the values α, α′, Template:Nobr (art. 20, (4)), respectively, by β, β′, Template:Nobr, and add, we obtain

Template:C

If we multiply β, β′, Template:Nobr, respectively, by α, α′, Template:Nobr, and add, we will find

Template:C

In the same manner, we find

Template:C

25.

Let λ, λ′, Template:Nobr denote the values taken by the functions v, v′, Template:Nobr, when x, y, Template:Nobr are replaced by their most plausible values A, B, Template:Nobr, i.e. Template:C

If we set Template:C then M is the value of the function Ω corresponding to the most plausible values of the variables, and therefore, as was shown in art. 20, the minimum value of Ω. The value of aλ + a′λ′ + a″λ″ + ⋯ is the value of ξ corresponding to x = A, y = B, Template:Nobr and this value is zero, according to the way A, B, Template:Nobr have been obtained. Thus, we have Template:C and similarly we would obtain Template:C and Template:C Finally, multiplying the values of λ, λ′, Template:Nobr respectively by λ, λ′, λ″, and adding, we get lλ + l′λ′ + l″λ″ + ⋯ = λ² + λ′² + λ″² + ⋯, or Template:C

26.

Replacing x, y, Template:Nobr, with the expressions (7) from art. 21 in the equation v = ax + by + cz + ⋯ + l, we find, through the same reductions as before, Template:C Multiplying either these equations or the equations (1) of art. 20, by λ, λ′, Template:Nobr, and then adding, we obtain the identity Template:C

27.

The function Ω can take several forms, which are worth developing.

Let us square the equations (1) art. 20, and add them. Then we find Template:C this is the first form.

Next let us multiply the same equations by v, v′, Template:Nobr respectively, and add. Then we obtain Ω = ξx + ηy + ζz + ⋯ + lv + l′v′ + l″v″ + ⋯; and replacing v, v′, Template:Nobr, with the values indicated in the previous article, we find that Ω = ξx + ηy + ζz + ⋯ − Aξ − Bη − Cζ − ⋯ + M, or Ω = ξ(x − A) + η(y − B) + ζ(z − C) + ⋯ + M: this is the second form.

Finally, replacing, in this second form, x − A, y − B, Template:Nobr by the expressions (7) art. 21, we obtain the 'third form': Template:C

We can also give a fourth form which results automatically from the third form and the formulas of the previous article: Template:C From this last form we clearly see that M is the minimum value of Ω.
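In modern notation the fourth form says that Ω − M is the quadratic [ξ, η] N⁻¹ [ξ, η]ᵀ, where N is the normal-equation matrix; this can be verified numerically. A Python sketch with hypothetical, unit-weight data:

```python
import numpy as np

# Hypothetical data, unit weights: Omega(z) = sum((A @ z + l)^2).
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
l = np.array([-1.02, -1.98, -3.05])
N = A.T @ A

Omega = lambda z: np.sum((A @ z + l) ** 2)
sol = np.linalg.solve(N, -A.T @ l)     # where xi = eta = 0
M = Omega(sol)

# Fourth form: Omega - M is the quadratic [xi, eta] N^{-1} [xi, eta]^T,
# so M is visibly the minimum value of Omega.
rng = np.random.default_rng(2)
for _ in range(50):
    z = sol + rng.standard_normal(2)
    xi = A.T @ (A @ z + l)             # definitions (2)
    assert np.isclose(Omega(z), M + xi @ np.linalg.solve(N, xi))
```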

28.

Let e, e′, Template:Nobr, be the errors made in the observations that gave V = L, V′ = L′, Template:Nobr Then the true values of the functions V, V′, Template:Nobr, will be L − e, L′ − e′, Template:Nobr respectively, and the true values of v, v′, Template:Nobr, will be −e√p, −e′√p′, −e″√p″, etc. respectively. Therefore, the true value of x will be Template:C and the error made in the most suitable determination of the unknown x, which we will denote by Ex, will be Template:C Similarly, the error made in the most suitable determination of the value of y will be Template:C The average value of the square (Ex)² will be Template:C The average value of (Ey)² will similarly be = m²p(ββ), as shown above. We can also determine the average value of the product Ex·Ey, which will be Template:C These results can be stated more briefly as follows:

The average values of the squares (Ex)², Template:Nobr, are respectively equal to the products of ½m²p with the second-order partial differential quotients Template:C and the average value of a product such as Ex·Ey is the product of ½m²p with d²Ω/(dξ dη), where Ω is regarded as a function of ξ, η, Template:Nobr

29.

Let t be a given linear function of the quantities x, y, Template:Nobr, i.e. Template:C the value of t deduced from the most plausible values of x, y, Template:Nobr, will then be = fA + gB + hC + ⋯ + k, and we denote it by K. The error thus committed will be Template:C which we denote by Et. The average value of this error will obviously be zero, meaning the error contains no constant part; but the average value of (Et)², i.e., the sum Template:C will, according to the preceding article, be equal to the product of m²p with the sum Template:C i.e., the product of m²p with the value taken by the function Ω − M when we substitute ξ = f, η = g, ζ = h, ….

If we let ω denote this value of Ω − M, then the mean error to be feared when we take t = K will be m√p · √ω, and the weight of this determination will be 1/ω.

Since we have identically Template:C ω will be equal to the value of the expression (x − A)f + (y − B)g + (z − C)h + ⋯, i.e. the value taken by t − K, when we substitute for x, y, Template:Nobr the values corresponding to ξ = f, η = g, Template:Nobr.

Finally, observing that t, expressed as a function of the quantities ξ, η, Template:Nobr, will have K as its constant part, if we suppose that Template:C then we will have Template:C
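In matrix terms, the quantity ω of this article is f N⁻¹ fᵀ for the coefficient row f of t, and the claim about the mean error of t can be checked by simulation. A Python sketch with a hypothetical design, unit weights (m = p = 1), and illustrative coefficients:

```python
import numpy as np

# Hypothetical design, unit weights, and a linear function
# t = f.x + g.y + k with illustrative coefficients f = (1, -2).
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
N = A.T @ A
f = np.array([1.0, -2.0])

omega = f @ np.linalg.solve(N, f)      # omega = [f, g] N^{-1} [f, g]^T

# Monte Carlo: the error of the determination t = K has mean zero and
# mean square omega (times m^2 p = 1).
rng = np.random.default_rng(3)
e = rng.standard_normal((40000, 3))
dt = np.linalg.solve(N, A.T @ e.T).T @ f
assert np.isclose(dt.mean(), 0.0, atol=0.05)
assert np.isclose(dt.var(), omega, rtol=0.05)
```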

30.

We have seen that the function Ω attains its absolute minimum M when we substitute x = A, y = B, Template:Nobr or, equivalently, ξ = 0, η = 0, Template:Nobr If we assign another value to one of the unknowns, e.g. x = A + Δ, while the other unknowns remain variable, Ω may acquire a relative minimum value, which can be obtained from the equations Template:C Therefore, we must have η = 0, Template:Nobr and since Template:C we have Template:C Likewise, we have Template:C and the relative minimum value of Ω will be Template:C Reciprocally, we conclude that if Ω is not to exceed M + μ², then the value of x must necessarily lie between the limits A − μ√(αα) and A + μ√(αα). It is important to note that μ√(αα) becomes equal to the mean error to be feared in the most plausible value of x if we set μ = m√p; i.e., if μ is the mean error of observations whose weight is = 1.

More generally, let us find the smallest value of the function Ω that can correspond to a given value of t, where t denotes, as in the previous article, a linear expression fx + gy + hz + ⋯ + k whose most plausible value is K. Let us denote the prescribed value of t by K + κ. According to the theory of maxima and minima, the solution to the problem will be given by the equations Template:C or ξ = θf, η = θg, ζ = θh, etc., where θ denotes an as yet undetermined multiplier. If, as in the previous article, we set identically Template:C then we will have Template:C or Template:C where ω has the same meaning as in the previous article.

Since Ω − M is a homogeneous function of the second degree with respect to the variables ξ, η, ζ, etc., its value when ξ = θf, η = θg, ζ = θh, etc. will evidently be = θ²ω, and thus the minimum value of Ω, when t = K + κ, will be = M + θ²ω = M + κ²/ω. Reciprocally, if Ω must remain below a given value M + μ², the value of t will necessarily lie between the limits K − μ√ω and K + μ√ω, and μ√ω will be the mean error to be feared in the most plausible value of t, if μ represents the mean error of observations whose weight is = 1.
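The constrained-minimum statement of this article can be verified numerically: the smallest Ω compatible with t = K + κ is M + κ²/ω, attained at the Lagrange point ξ = θf, η = θg with θ = κ/ω. A Python sketch with hypothetical numbers (t = f·(x, y) with k = 0):

```python
import numpy as np

# Hypothetical data; t = f . (x, y) with k = 0, prescribed t = K + kappa.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
l = np.array([-1.02, -1.98, -3.05])
N = A.T @ A
f = np.array([1.0, -2.0])

Omega = lambda z: np.sum((A @ z + l) ** 2)
sol = np.linalg.solve(N, -A.T @ l)
M = Omega(sol)
omega = f @ np.linalg.solve(N, f)

# Lagrange condition xi = theta f, eta = theta g with theta = kappa / omega.
kappa = 0.5
z_star = sol + (kappa / omega) * np.linalg.solve(N, f)
assert np.isclose(f @ z_star, f @ sol + kappa)          # t = K + kappa
assert np.isclose(Omega(z_star), M + kappa ** 2 / omega)

# Every other point satisfying the constraint gives a larger Omega.
rng = np.random.default_rng(4)
for _ in range(50):
    d = rng.standard_normal(2)
    d -= (f @ d) / (f @ f) * f          # keep t fixed
    assert Omega(z_star + d) >= Omega(z_star) - 1e-9
```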

31.

When the number of unknowns x, y, Template:Nobr is quite large, the determination of the numerical values of A, B, C, etc. by ordinary elimination is quite tedious. For this reason we have indicated, in Theoria Motus Corporum Coelestium art. 182, and later developed, in Disquisitione de elementis ellipticis Palladis (Comm. recent. Soc. Gotting. Vol. I), a method that simplifies this work as much as possible. Namely, the function Ω must be reduced to the following form: Template:C where the divisors 𝔄⁰, 𝔅′, ℭ″, 𝔇‴, etc., are determined quantities; u⁰, u′, u″, etc., are linear functions of x, y, z, etc., such that the second, u′, does not contain x, the third, u″, contains neither x nor y, the fourth contains neither x, nor y, nor z, and so on, so that the last, u^(ρ−1), contains only the last of the unknowns x, y, z, etc.; and finally, the coefficients of x, y, z, etc., in u⁰, u′, u″, etc., are respectively equal to 𝔄⁰, 𝔅′, ℭ″, etc. Then we set u⁰ = 0, u′ = 0, u″ = 0, u‴ = 0, etc., and we will easily obtain the values of x, y, Template:Nobr by solving these equations, starting with the last one. I do not believe it necessary to repeat here the algorithm that leads to this transformation of the function Ω.
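In modern terms, this successive reduction of Ω to a sum of squares with triangular linear forms u⁰, u′, … corresponds to a Cholesky factorization of the normal matrix, and the back-substitution starting with the last unknown is exactly triangular solving. A Python sketch with hypothetical numbers:

```python
import numpy as np

# Hypothetical normal system N z = rhs; N = A^T A for a small design.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
L_obs = np.array([1.02, 1.98, 3.05])
N = A.T @ A
rhs = A.T @ L_obs

# Gauss's reduction of Omega to a sum of squares corresponds to a
# triangular (Cholesky) factorization N = R^T R.
R = np.linalg.cholesky(N).T            # upper triangular

# Setting the u's to zero and solving from the last unknown upward:
y = np.linalg.solve(R.T, rhs)          # forward substitution
sol = np.linalg.solve(R, y)            # back substitution
assert np.allclose(sol, np.linalg.solve(N, rhs))
```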

However, the elimination required to find the weights of these determinations demands even longer calculations. We have shown in the Theoria Motus Corporum Coelestium that the weight of the last unknown (which appears by itself in u^(ρ−1)) is equal to the last term in the series of divisors 𝔄⁰, 𝔅′, ℭ″, etc. This one is easily found; hence, several calculators, wanting to avoid the cumbersome elimination, have had the idea, in the absence of another method, of repeating the indicated transformation while successively treating each unknown as the last one. Therefore, I hope that geometers will appreciate my indicating here a new method for calculating the weights of the determinations, which seems to leave nothing more to be desired on this point.

32.

Setting

Template:Optional style|(1) { u⁰ = 𝔄⁰x + 𝔅⁰y + ℭ⁰z + ⋯ + 𝔏⁰,  u′ = 𝔅′y + ℭ′z + ⋯ + 𝔏′,  u″ = ℭ″z + ⋯ + 𝔏″,

we have identically Template:C and from this we deduce:

Template:Optional style|(2) { ξ = u⁰,  η = (𝔅⁰/𝔄⁰)u⁰ + u′,  ζ = (ℭ⁰/𝔄⁰)u⁰ + (ℭ′/𝔅′)u′ + u″,

The values of u⁰, u′, Template:Nobr deduced from these equations will be presented in the following form:

Template:Optional style|(3) { u⁰ = ξ,  u′ = A′ξ + η,  u″ = A″ξ + B″η + ζ,

By taking the complete differential of the equation Template:C we obtain Template:C and thus Template:C This expression must be equivalent to the one obtained from the equations (3), Template:C and therefore we have

Template:Optional style|(4) { x = u⁰/𝔄⁰ + A′u′/𝔅′ + A″u″/ℭ″ + ⋯ + A,  y = u′/𝔅′ + B″u″/ℭ″ + ⋯ + B,  z = u″/ℭ″ + ⋯ + C,

By substituting in these expressions the values of u⁰, u′, Template:Nobr obtained from the equations (3), we will have performed the elimination. For the determination of the weights, we have

Template:Optional style|(5) { (αα) = 1/𝔄⁰ + A′²/𝔅′ + A″²/ℭ″ + A‴²/𝔇‴ + ⋯,  (ββ) = 1/𝔅′ + B″²/ℭ″ + B‴²/𝔇‴ + ⋯,  (γγ) = 1/ℭ″ + C‴²/𝔇‴ + ⋯.

The simplicity of these formulas leaves nothing to be desired. Equally simple formulas could be found to express the other coefficients (αβ), (αγ), Template:Nobr; however, as their use is less frequent, we refrain from presenting them.
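In matrix language, the formulas (5) express the diagonal coefficients (αα), (ββ), … as sums of squares along the rows of the inverse triangular factor, so the full inverse of the normal matrix is never needed. A Python sketch with a hypothetical normal matrix:

```python
import numpy as np

# Hypothetical normal matrix N = A^T A from a small design.
N = np.array([[2.0, 1.0], [1.0, 2.0]])
R = np.linalg.cholesky(N).T            # N = R^T R, R upper triangular
Rinv = np.linalg.inv(R)

# The diagonal coefficients (alpha alpha), (beta beta), ... are sums of
# squares along the rows of R^{-1}, in the spirit of the formulas (5):
# no full inverse of N is required.
diag_from_factor = np.sum(Rinv ** 2, axis=1)
assert np.allclose(diag_from_factor, np.diag(np.linalg.inv(N)))
```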

33.

The importance of the subject has prompted us to prepare everything for the calculation and to form explicit expressions for the coefficients A′, A″, Template:Nobr B″, Template:Nobr This calculation can be carried out in two ways. The first involves substituting the values of u⁰, u′, and so forth, deduced from the equations (3), into the equations (2); the second involves substituting the values of ξ, η, ζ, from the equations (2) into the equations (3). The first method leads to the following formulas: Template:C These formulas will determine A′, A″, and so on.

We will then have, Template:C which will determine B', and so forth; then Template:C which will determine C' etc., and so on.

The second method yields the following system: Template:C from which we deduce A', Template:C from which we deduce B' and A', Template:C from which we deduce C', B', A', and so on.

Both systems of formulas offer nearly equal advantages when seeking the weights of the determinations of all unknowns x, y, and so forth; however, if only one of the quantities (αα), (ββ), and so forth is required, the first system is much preferable.

Moreover, the combination of equations (1) and (4) yields the same formulas and provides, in addition, a second way to obtain the most plausible values A, B, and so forth, namely Template:C The rest of the calculation is identical to the ordinary calculation, in which one sets u⁰ = 0, u′ = 0, Template:Nobr

34.

The results obtained in art. 32 are only particular cases of a more general theorem which can be stated as follows:

Theorem. — If t represents the following linear function of the unknowns x, y, Template:Nobr, Template:C whose expression in terms of the variables u⁰, u′, Template:Nobr, is Template:C then K will be the most plausible value of t, and the weight of this determination will be Template:C

Proof. — The first part of the theorem is obvious, since the most plausible value of t must correspond to the values u⁰ = 0, u′ = 0, Template:Nobr

To demonstrate the second part, let us note that we have Template:C and consequently, when Template:C we have Template:C whatever the differentials dx, dy, Template:Nobr Hence, assuming always ξ = f, η = g, ζ = h, …, we obtain Template:C Now it is easily seen that if the differentials dx, dy, Template:Nobr are independent of each other, so will be du⁰, du′, Template:Nobr; therefore, we will have Template:C Hence, the value of Ω corresponding to the same assumptions will be Template:C which, by art. 29, demonstrates the truth of our theorem.

Moreover, if we wish to perform the transformation of the function t without resorting to the formulas (4) of art. 32, we immediately have the relations Template:C which will allow us to determine k⁰, k′, Template:Nobr, and we will finally have Template:C

35.

We will particularly address the following problem, both because of its practical utility and the simplicity of the solution:

Find the changes that the most plausible values of the unknowns undergo by adding a new equation, and assign the weights of these new determinations.

Let us keep the previous notations. The primitive equations, reduced to unit weight, will be v = 0, v′ = 0, v″ = 0, …; we will have Ω = v² + v′² + v″² + ⋯, and ξ, η, Template:Nobr, will be the partial derivatives Template:C Finally, by elimination, we will have

Template:Optional style|(1) { x = A + (αα)ξ + (αβ)η + (αγ)ζ + ⋯,  y = B + (αβ)ξ + (ββ)η + (βγ)ζ + ⋯,  z = C + (αγ)ξ + (βγ)η + (γγ)ζ + ⋯,

Now suppose we are given a new approximate equation v = 0 (which we assume to have unit weight), and let us seek the changes undergone by the most plausible values A, B, Template:Nobr, and by the coefficients (αα), Template:Nobr.

Let us set Template:C and let Template:C be the result of the elimination. Finally, let Template:C which, taking into account the equations (1), becomes Template:C and let Template:C

It is clear that K will be the most plausible value of the function v resulting from the primitive equations, without considering the value 0 provided by the new observation, and that 1/ω will be the weight of this determination.

Now we have Template:C and consequently, Template:C or Template:C Furthermore, Template:C From this, we deduce, Template:C which will be the most plausible value of x, deduced from all observations.

We will also have Template:C thus Template:C will be the weight of this determination.

Similarly, for the most plausible value of y, deduced from all observations, we find Template:C the weight of this determination will be Template:C and so on. Q.E.I.
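In modern terms the solution of this problem is a rank-one (Sherman-Morrison) update of the most plausible values and of the coefficients (αα), (αβ), …: no fresh elimination is needed when one equation is added. A Python sketch with hypothetical numbers (the new equation and its coefficients are illustrative, not from the text):

```python
import numpy as np

# Hypothetical primitive equations (unit weights) and one new equation
# v* = f . z + l_new = 0; all names are illustrative.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
L = np.array([1.02, 1.98, 3.05])
f = np.array([2.0, -1.0])
l_new = -0.4

N = A.T @ A
Ninv = np.linalg.inv(N)
sol = np.linalg.solve(N, A.T @ L)

K = f @ sol + l_new          # most plausible value of v* from the old data
omega = f @ Ninv @ f         # its weight is 1/omega

# Rank-one (Sherman-Morrison) update of the solution and of the
# coefficients (alpha alpha), (alpha beta), ...
sol_new = sol - Ninv @ f * K / (1.0 + omega)
Ninv_new = Ninv - np.outer(Ninv @ f, Ninv @ f) / (1.0 + omega)

# Same result as recomputing with the new equation appended.
A_full = np.vstack([A, f])
L_full = np.append(L, -l_new)
assert np.allclose(sol_new, np.linalg.lstsq(A_full, L_full, rcond=None)[0])
assert np.allclose(Ninv_new, np.linalg.inv(A_full.T @ A_full))
```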

Let us add some remarks.

I. After substituting the new values A, B, Template:Nobr, the function v will obtain the most plausible value Template:C and since we have, identically, Template:C the weight of this determination, according to art. 29, will be Template:C These results could be deduced immediately from the rules explained at the end of art. 21. The original equations had, indeed, provided the determination v = K, whose weight was 1/ω. A new observation gives another determination v = 0, independent of the first, whose weight is = 1, and their combination produces the determination v = K/(1 + ω) with a weight of 1/ω + 1.

II. It follows from the above that, for x=A, y=B, Template:Nobr we must have ξ=0, η=0, Template:Nobr and consequently, Template:C Furthermore, since Template:C we must have Template:C and Template:C

III. Comparing these results with those of art. 30, we see that here the function Ω takes the smallest value it can obtain subject to the condition v = K/(1 + ω).

36.

We will give here the solution to the following problem, which is analogous to the previous one, but we will refrain from indicating the demonstration, which can be easily found, as in the previous article.

Find the changes in the most plausible values of the unknowns and the weights of the new determinations when changing the weight of one of the primitive observations.

Suppose that after completing the calculation, one notices that the weight assigned to one of the observations, e.g. the first, which gave V = L, is too strong or too weak, and that it would be more accurate to assign it the weight p* instead of the weight p. It is not necessary, then, to restart the whole calculation; it is more convenient to form the corrections by means of the following formulas.

The most plausible values of the unknowns will be corrected as follows: Template:C and the weights of these determinations will be found by dividing unity by Template:C respectively.

This solution applies in the case where, after completing the calculation, it becomes necessary to reject one of the observations completely, since this amounts to making p* = 0; similarly, p* = ∞ will suit the case where the equation V = L, which in the calculation had been regarded as approximate, is in fact absolutely precise.
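The special case of complete rejection is easy to verify directly: giving an observation the weight zero yields exactly the solution computed from the remaining observations. A Python sketch with hypothetical numbers:

```python
import numpy as np

# Hypothetical observations; rejecting the first observation is the
# special case p* = 0 of the weight change described above.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
L = np.array([1.02, 1.98, 3.05])

def solve(weights):
    P = np.diag(weights)
    return np.linalg.solve(A.T @ P @ A, A.T @ P @ L)

sol_rejected = solve(np.array([0.0, 1.0, 1.0]))          # p* = 0
sol_subset = np.linalg.lstsq(A[1:], L[1:], rcond=None)[0]
assert np.allclose(sol_rejected, sol_subset)
```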

If, after completing the calculation, several new equations were to be added to those proposed, or if the weights assigned to several of them were incorrect, the calculation of the corrections becomes too complicated, and it is preferable to start over.

37.

In the arts. 15 and 16, we have given a method to approximate the accuracy of a system of observations; but this method assumes that the real errors encountered in a large number of observations are known exactly; however, this condition is rarely fulfilled, if ever.

If the quantities for which observation provides approximate values depend on one or more unknowns according to a given law, then the method of least squares allows us to find the most plausible values of these unknowns. If we then calculate the corresponding values of the observed quantities, they can be regarded as differing little from the true values, so that their differences from the observed values will represent the errors committed, with a certainty that increases with the number of observations. This is the procedure followed in practice by calculators who have attempted, in complicated cases, to evaluate retrospectively the precision of the observations. Although sufficient in many cases, this method is theoretically inaccurate and can sometimes lead to serious errors; it is therefore very important to treat the question with more care.

In the following discussion, we retain the notation used in art. 19. The method in question consists of treating A, B, Template:Nobr, as the true values of the unknowns x, y, Template:Nobr, and λ, λ′, Template:Nobr, as those of the functions v, v′, Template:Nobr If all observations have equal precision and their common weight p = p′ = p″ = ⋯ is taken as the unit, these same quantities, with changed signs, represent, under this assumption, the errors of the observations. Consequently, according to art. 15, Template:C will be the mean error of the observations. If the observations do not have the same precision, then λ, λ′, Template:Nobr, represent the errors of the observations respectively multiplied by the square roots of the weights, and the rules of art. 16 lead to the same formula, Template:C which now expresses the mean error of observations whose weight is = 1. However, it is clear that an exact calculation would require replacing λ, λ′, Template:Nobr with the values of v, v′, Template:Nobr, deduced from the true values of the unknowns x, y, Template:Nobr, and replacing the quantity M by the corresponding value of Ω. Although we cannot assign this latter value, we are nonetheless certain that it is greater than M (which is its minimum possible value), and it would only reach this limit in the infinitely unlikely case where the true values of the unknowns coincide with the most plausible ones. We can therefore affirm, in general, that the mean error calculated by the ordinary practice is smaller than the exact mean error, and consequently, that too much precision is attributed to the observations. Let us now see what a rigorous theory yields.

38.

First of all, we need to determine how the quantity M depends on the true errors of the observations. As in art. 28, let us denote these errors by e, e′, Template:Nobr, and let us set, for simplicity, Template:C and Template:C Let A − x⁰, B − y⁰, Template:Nobr, be the true values of the unknowns x, y, Template:Nobr, for which ξ, η, Template:Nobr, take, respectively, the values ξ⁰, η⁰, Template:Nobr The corresponding values of v, v′, Template:Nobr, will obviously be −ε, −ε′, −ε″, …; so that we will have Template:C Finally, Template:C will be the value of the function Ω corresponding to the true values of x, y, Template:Nobr Since we also have identically Template:C we will also have Template:C From this, it is clear that M is a homogeneous function of the second degree of the errors e, e′, Template:Nobr; for various values of the errors this function may become greater or smaller. However, the magnitudes of the errors remain unknown to us, so it is appropriate to examine the function M carefully, and first to calculate its average value according to the elementary calculus of probability. We will obtain this average value by replacing the squares e², Template:Nobr with m², Template:Nobr, and omitting the terms in ee′, Template:Nobr, whose average value is zero; or, equivalently, by replacing each square ε², ε′², Template:Nobr, by μ², and neglecting εε′, Template:Nobr Accordingly, the term Ω⁰ will provide πμ²; the term x⁰ξ⁰ will produce Template:C each of the other terms will likewise give −μ², so that the total average value will be = (π − ρ)μ², where π denotes the number of observations and ρ the number of unknowns. Owing to the errors offered by chance, the true value of M may be greater or smaller than this average value, but the difference decreases as the number of observations increases, so that Template:C can be regarded as an approximate value of μ. Consequently, the value of μ provided by the erroneous method discussed in the previous article must be increased in the ratio of √(π − ρ) to √π.
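The central claim of this article, that the average value of M is (π − ρ)μ², is easy to confirm by simulation: M is a quadratic function of the errors alone, through the residual projector. A Monte Carlo sketch in Python with a hypothetical design, unit weights, and Gaussian errors of mean error μ = 1:

```python
import numpy as np

# Monte Carlo sketch with hypothetical design, unit weights, Gaussian
# errors of mean error mu = 1: pi = 4 observations, rho = 2 unknowns.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
n_obs, n_unk = A.shape

# M is a quadratic function of the errors e alone: M = |(I - H) e|^2,
# where H projects onto the fitted values.
H = A @ np.linalg.solve(A.T @ A, A.T)
rng = np.random.default_rng(5)
e = rng.standard_normal((30000, n_obs))
r = e - e @ H                          # H is symmetric
M = np.sum(r ** 2, axis=1)

# Average value (pi - rho) mu^2, so mu^2 is estimated by M / (pi - rho).
assert np.isclose(M.mean(), n_obs - n_unk, rtol=0.05)
```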

39.

To clearly understand the extent to which it is permissible to consider the value of M provided by the observations as equal to its exact value, we must seek the mean error to be feared when we take M/(π − ρ) = μ². This mean error is the square root of the average value of the quantity Template:C which we will write as: Template:C and since the average value of the second term is evidently zero, the question reduces to finding the average value of the function Template:C If we denote this average value by N, then the mean error we seek will be √(N − (π − ρ)²μ⁴).

Expanding the function Ψ, we see that it is a homogeneous function of the fourth degree of the errors e, e′, Template:Nobr, or, equivalently, of the quantities ε, ε′, Template:Nobr; we will therefore find its average value by:

1. Replacing the fourth powers e⁴, e′⁴, Template:Nobr, by their average values;

2. Replacing the products e²e′², Template:Nobr, by their average values, that is, by m²m′², Template:Nobr;

3. Neglecting products such as e³e′, Template:Nobr. We will assume (see art. 16) that the average values of e⁴, e′⁴, Template:Nobr, are proportional to m⁴, m′⁴, Template:Nobr, their common ratio being ν⁴/μ⁴, where ν⁴ denotes the average value of the fourth powers of the errors for observations whose weight is = 1. Thus the previous rules can also be expressed as follows: Replace each fourth power ε⁴, ε′⁴, Template:Nobr, by ν⁴; each product ε²ε′², Template:Nobr, by μ⁴; and neglect all terms such as ε³ε′ or ε²ε′ε″, εε′ε″ε‴.

These principles being understood, it is easy to see that:

I. The average value of Ω⁰² is Template:C

II. The average value of the product ε²x⁰ξ⁰ is Template:C because Template:C Similarly, the average value of ε′²x⁰ξ⁰ is Template:C the average value of ε″²x⁰ξ⁰ is Template:C and so on. Thus the average value of the product Template:C will be Template:C

The products Ω⁰y⁰η⁰, Template:Nobr, will have the same average value. Thus the product Template:C will have an average value of Template:C

III. To shorten the following developments, we will adopt the following notation. We give the character Σ a more extended meaning than before, making it designate the sum of similar but not identical terms arising from all permutations of the observations. According to this notation, we will have Template:C Calculating the average value of x⁰²ξ⁰² term by term, we first have, for the average value of the product α²ε²ξ⁰², Template:C Similarly, the average value of the product α′²ε′²ξ⁰² is Template:C and so on. Therefore, the average value of the product Template:C is Template:C Now the average value of αα′εε′ξ⁰² is Template:C The average value of αα″εε″ξ⁰² is Template:C and so on. Hence, we easily conclude that the average value of the product Template:C is Template:C Thus, for the average value of the product x⁰²ξ⁰², we have Template:C

IV. Similarly, for the average value of the product x₀y₀ξ₀η₀, we find Template:C Now, we have Template:C so this average value will be Template:C

V. By a similar calculation, we find that the average value of x₀z₀ξ₀ζ₀ is Template:C and so on. Adding up, we obtain the average value of the product Template:C this value is Template:C

VI. Similarly, we find that Template:C is the average value of the product Template:C and Template:C is the average value of the product Template:C and so on.

Hence by addition we find the average value of the square Template:C which is Template:C

VII. Finally, from all these preliminaries, we conclude that Template:C Therefore, the mean error to be feared when Template:C will be Template:C

40.

The quantity Template:C which occurs in the expression above, generally cannot be reduced to a simpler form. However, we can assign two limits between which its value must necessarily lie. First, it is easily deduced from the previous relations that Template:C from which we conclude that Template:C is a positive quantity smaller than unity, or at least not larger. The same will be true for the quantity Template:C which is equal to the sum Template:C Similarly, Template:C will be smaller than unity; and so on. Therefore, Template:C must be smaller than π. Second, we have Template:C since Template:C from which it is easily deduced that Template:C is greater, or at least not smaller, than ρ²/π. Therefore, the term Template:C must necessarily lie between the limits Template:C or, between the broader limits Template:C Thus, the square of the mean error to be feared for the value Template:C lies between the limits Template:C so that a degree of precision as great as desired can be achieved, provided the number of observations is sufficiently large.
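The two limits of this article can be illustrated numerically. In modern terms (my gloss, not Gauss's), the π quantities that are each at most unity are the diagonal entries of the orthogonal-projection ("hat") matrix of the least-squares problem; their sum is exactly ρ, so by the Cauchy–Schwarz inequality the sum of their squares lies between ρ²/π and ρ, hence below π. A sketch with an arbitrary coefficient matrix of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
pi_, rho = 12, 3                 # π observations, ρ unknowns
A = rng.normal(size=(pi_, rho))  # coefficients of the ρ unknowns in the π equations

# Orthogonal projection onto the column space of A.
H = A @ np.linalg.inv(A.T @ A) @ A.T
h = np.diag(H)

print(h.min(), h.max())          # each diagonal entry lies between 0 and 1
print(h.sum())                   # their sum equals ρ exactly
S = np.sum(h**2)
print(rho**2 / pi_, S, rho)      # ρ²/π ≤ S ≤ ρ, and a fortiori S < π
```

Any full-rank choice of A gives the same bounds, which is why the limits hold independently of the particular system of equations.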

It is very remarkable that in hypothesis III of art. 9, on which we had formerly relied to establish the theory of least squares, the second term of the square of the average error completely disappears (since ν⁴ − 3μ⁴ = 0); and because, to find the approximate value μ of the average error of the observations, it is always necessary to treat the sum Template:C as if it were equal to the sum of the squares of π − ρ random errors, it follows that, in this hypothesis, the precision of this determination becomes equal to that which we found, in art. 15, for the determination from π − ρ true errors.
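The claim that the residual sum of squares behaves like the sum of squares of π − ρ true errors can be checked by simulation: with π equations in ρ unknowns and errors of mean error μ, the average of the residual sum of squares over many trials is (π − ρ)μ². A minimal sketch assuming Gaussian errors (hypothesis III); the dimensions and trial count are my own:

```python
import numpy as np

rng = np.random.default_rng(2)
pi_, rho, mu = 10, 3, 2.0
A = rng.normal(size=(pi_, rho))   # fixed coefficients of the ρ unknowns
x_true = rng.normal(size=rho)     # true values of the unknowns

trials = 20_000
total = 0.0
for _ in range(trials):
    e = rng.normal(0, mu, pi_)                      # true errors of the π observations
    b = A @ x_true + e                              # observed values
    x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)   # least-squares values of the unknowns
    v = b - A @ x_hat                               # residuals
    total += v @ v

# Average residual sum of squares vs. (π − ρ)μ².
print(total / trials, (pi_ - rho) * mu**2)
```

The two printed numbers agree to within simulation noise, which is why dividing the residual sum of squares by π − ρ, rather than by π, gives an unbiased estimate of μ².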

  1. The exact determination of μ, Template:Nobr, is conceivable only in the case where, by the nature of the matter, the errors x, x′, etc., proportional to 1, μ, Template:Nobr, are considered equally probable, or rather in the case where Template:C
  2. We will later explain the reasoning that led us to denote the coefficients of this formula by the notation (αα), Template:Nobr.