Translation:Theoria combinationis observationum erroribus minimis obnoxiae
Part One
1.
No matter how careful one is with observations concerning the measurement of physical quantities, they are inevitably subject to errors of varying degrees. These errors, in most cases, are not simple but arise from several distinct sources that it is best to distinguish into two classes.
Some causes of errors depend, for each observation, on variable circumstances independent of the result obtained: the errors arising from these are called "irregular" or "random," and like the circumstances that produce them, their value is not amenable to calculation. Such are the errors that arise from the imperfection of our senses and all those due to irregular external causes, e.g. vibrations of the air that blur our vision. Some of the errors due to the inevitable imperfection of even the best instruments, e.g. the roughness of the inner part of a level, its lack of absolute rigidity, etc., belong to this same category.
On the other hand, there are other causes that produce an identical error in all observations of the same kind, or one whose magnitude depends only on circumstances that can be viewed as essentially connected to the observation. We will call errors of this category "constant" or "regular" errors.
Moreover, one can see that this distinction is to a certain extent relative, and has a broader or narrower sense depending on the meaning one attaches to the idea of observations of the same nature. E.g. if one indefinitely repeats the measurement of the same angle, the errors arising from imperfect division of the instrument belong to the class of constant errors. If, on the other hand, one successively measures several different angles, the errors due to imperfect division will be considered random until a table of errors relative to each division has been formed.
2.
We exclude the consideration of regular errors from our discussion. It is up to the observer to carefully investigate the causes that can produce a constant error, to eliminate them if possible, or at least assess their effect in order to correct it for each observation, which will then give the same result as if the constant cause had not existed. It is quite different for irregular errors: by their nature, they resist any calculation, and they must be tolerated in observations. However, by skillfully combining results, their influence can be minimized as much as possible. The following investigation is devoted to this most important topic.
3.
Errors arising from a simple and determinate cause in observations of the same kind are confined within certain limits that could undoubtedly be assigned if the nature of this cause were perfectly known. In most cases, all errors between these extreme limits must be considered possible. A thorough knowledge of each cause would reveal whether all these errors have equal or unequal likelihood, and in the latter case, what the relative probability of each of them is. The same remark applies to the total error resulting from the combination of several simple errors. This error will also be confined between two limits, one being the sum of the upper limits, the other the sum of the lower limits corresponding to the simple errors. All errors between these limits will be possible, and each can result, in an infinite number of ways, from suitable values attributed to the partial errors. Nevertheless, it is possible to assign a larger or smaller likelihood for each result, from which the law of relative probability can be derived, provided that the laws of each of the simple errors are assumed to be known, and ignoring the analytical difficulties involved in collecting all of the combinations.
Of course, certain sources of error produce errors that cannot vary according to a continuous law, but are instead capable of a finite number of values, such as errors arising from the imperfect division of instruments (if indeed one wants to classify them among random errors), because the number of divisions in a given instrument is essentially finite. Nevertheless, if it is assumed that not all sources of error are of this type, then it is clear that the complex of all possible total errors will form a series subject to the law of continuity, or, at least, several distinct series, if it so happens that, upon arranging all possible values of the discontinuous errors in order of magnitude, the difference between a pair of consecutive terms is greater than the difference between the extreme limits of the errors subject to the law of continuity. In practice, such a case will almost never occur, unless the instrument is subject to gross defects.
4.
Let denote the relative likelihood of an error this means, due to the continuity of the errors, that is the probability that the error lies between the limits and In practice it is hardly possible, or perhaps impossible, to assign a form to the function a priori. Nevertheless, several general characteristics that it must necessarily present can be established: is obviously a discontinuous function; it vanishes for all values of not between the extreme errors. For any value between these limits, the function is positive (excluding the case indicated at the end of the previous article); in most cases, errors of opposite signs will be equally possible, and thus we will have: Finally, since small errors are more easily made than large ones, will generally have a maximum when and will continually decrease as increases.
In general, the integral expresses the probability that the unknown error falls between the limits and It follows that the value of this integral taken between the extreme limits of the possible errors will always be And since is zero for values not between these limits, it is clear that in all cases
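The conditions just listed can be collected compactly. In the sketch below, the symbol for the density and the limits of integration are assumptions of this restatement, not taken from the text.

```latex
% Assumed notation: \varphi(x) is the relative likelihood of an error x (art. 4).
\[
  \varphi(x)\,dx = P\{\,x < \text{error} < x + dx\,\}, \qquad
  \int_{a}^{b} \varphi(x)\,dx = P\{\,a < \text{error} < b\,\}, \qquad
  \int_{-\infty}^{+\infty} \varphi(x)\,dx = 1,
\]
\[
  \varphi(-x) = \varphi(x) \ \text{(when opposite errors are equally possible)}, \qquad
  \varphi(0) \ \text{a maximum}, \quad \varphi \ \text{non-increasing in } |x|.
\]
```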
5.
Let us consider the integral and denote its value by If the sources of error are such that there is no reason for two equal errors of opposite signs to have unequal likelihood, we will have and consequently, We conclude that if does not vanish and has e.g. a positive value, then there necessarily exists an error source that produces only positive errors or, at least, produces them more easily than negative errors. This quantity which is the average of all possible errors, or the average value of can conveniently be referred to as the "constant part of the error". Moreover, it is easily proven that the constant part of the total error is the sum of the constant parts of the simple errors of which it is composed.
If the quantity is assumed to be known and subtracted from the result of each observation, then, denoting the error of the corrected observation by and the corresponding probability by we will have and consequently, i.e. the errors of the corrected observations will have no constant part, which is clear in and of itself.
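Under the same assumed notation, the constant part of the error and the effect of removing it can be sketched as follows (the symbols k and φ' are my own).

```latex
% Assumed notation: k = constant part of the error, x' = x - k the corrected error.
\[
  k = \int_{-\infty}^{+\infty} x\,\varphi(x)\,dx, \qquad
  \varphi'(x') = \varphi(x' + k) \ \Longrightarrow\
  \int_{-\infty}^{+\infty} x'\,\varphi'(x')\,dx' = 0 .
\]
```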
6.
The value of the integral which is the average value of reveals the presence or absence of a constant error, as well as the value of this error. Similarly, the integral which is the average value of seems very suitable for defining and measuring, in a general manner, the uncertainty of a system of observations. Therefore, between two systems of observations of unequal precision, the one giving a smaller value to the integral should be considered preferable. If it is argued that this convention is arbitrary and seemingly unnecessary, then we readily agree. The question at hand is inherently vague and can only be delimited by a somewhat arbitrary principle. Determining a quantity through observation can be likened, somewhat accurately, to a game in which there is a loss to be feared and no gain to be expected; each error being likened to a loss incurred, the relative apprehension about such a game should be expressed by the probable loss, i.e., by the sum of the products of the various possible losses by their respective probabilities. But what loss should be likened to a specific error? This is not clear in itself; its determination depends partly on our whim. It is evident, first of all, that the loss should not be regarded as proportional to the error committed; for, under that hypothesis, a positive error counting as a loss, a negative error would have to be counted as a gain: on the contrary, the magnitude of the loss should be evaluated by a function of the error whose value is always positive. Among the infinite number of functions that fulfill this condition, it seems natural to choose the simplest one, which is undoubtedly the square of the error, and thus we are led to the principle proposed above.
Laplace considered the question in a similar manner, but adopted as a measure of loss the error itself, taken positively. This assumption, if we do not deceive ourselves, is no less arbitrary than ours: should we, indeed, consider a double error as more or less regrettable than a simple error repeated twice, and should we, consequently, assign it a double or more than double importance? This is a question that is not clear, and on which mathematical arguments have no bearing; each must resolve it according to their preference. Nevertheless, it cannot be denied that Laplace's assumption deviates from the law of continuity and is therefore less suitable for analytical study; ours, on the other hand, is recommended by the generality and simplicity of its consequences.
7.
Let us define we will call the "mean error to be feared" or simply the "mean error" of the observation whose indefinite errors have a relative probability of We do not limit this designation to the immediate result of the observations, but rather extend it to any quantity that can be derived from them in any way. It is important not to confuse this mean error with the arithmetic mean of the errors, which is discussed in art. 5.
When comparing several systems of observations or several quantities resulting from observations that are not given the same precision, we will consider their relative "weight" to be inversely proportional to and their "precision" to be inversely proportional to In order to represent the weights by numbers, we should take, as the unit, the weight of a certain arbitrarily chosen system of observations.
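In the same assumed notation, the definitions of arts. 6 and 7 amount to the following (m and p are my own symbols; the unit of weight is the arbitrarily chosen reference system of observations).

```latex
% Assumed notation: m = mean error, p = weight of an observation.
\[
  m = \sqrt{\int_{-\infty}^{+\infty} x^{2}\,\varphi(x)\,dx}, \qquad
  p \propto \frac{1}{m^{2}}, \qquad \text{precision} \propto \frac{1}{m}.
\]
```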
8.
If the errors of the observations have a constant part, subtracting it from each obtained result reduces the mean error, increases the weight and precision. Retaining the notation of art. 5, and letting denote the mean error of the corrected observations, we have
If, instead of the constant part another number were subtracted from each observation, the square of the mean error would become
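A sketch of the two relations of this article, in the assumed notation (m for the raw mean error, m' for the corrected one, k for the constant part, l for the other number subtracted):

```latex
\[
  m^{2} = m'^{2} + k^{2}, \qquad
  \int_{-\infty}^{+\infty} (x - l)^{2}\,\varphi(x)\,dx = m'^{2} + (k - l)^{2}.
\]
```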
9.
Let be a determined coefficient and let the value of the integral
Then will be the probability that the error of a certain observation is less than in absolute value, and will be the probability that this error exceeds If, for has the value it will be equally likely for the error to be smaller or larger than thus, can be called the probable error. The relationship between and depends on the nature of the function which is unknown in most cases. However, it is interesting to study this relationship in some particular cases.
I. If the extreme limits of the possible errors are and and if, between these limits, all errors are equally probable, the function will be constant between these same limits, and, consequently, equal to Hence, we have and so long as is less than or equal to finally and the probability that the error does not exceed the mean error is
II. If as before and are the limits of possible errors, and if we assume that the probability of these errors decreases from the error onwards like the terms of an arithmetic progression, then we will have
From this, we deduce that and as long as is between 0 and as long as is between 0 and 1; and finally,
In this case, the probability that the error remains below the mean error is
III. If we assume the function to be proportional to then it must be equal to
where denotes the semiperimeter of a circle of radius from which we deduce
(see Disquisitiones generales circa seriem infinitam, art. 28). If we let denote the value of the integral
then we have
The following table gives some values of this quantity:
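The three hypotheses of this article can be checked numerically. In the sketch below, the function names, the unit half-width a = 1, and the normalization h = 1 are assumptions made for illustration; it integrates each density to recover the ratio of the probable error r to the mean error m.

```python
import numpy as np
from scipy import integrate, optimize

def probable_over_mean(phi, support):
    """Ratio r/m for an error density phi supported on the given interval."""
    a, b = support
    m2, _ = integrate.quad(lambda x: x**2 * phi(x), a, b)   # mean square error m^2
    m = np.sqrt(m2)
    def prob_within(t):                                     # probability that |error| <= t
        val, _ = integrate.quad(phi, -t, t)
        return val
    r = optimize.brentq(lambda t: prob_within(t) - 0.5, 0.0, b)  # probable error r
    return r / m

print(probable_over_mean(lambda x: 0.5, (-1.0, 1.0)))            # I.  uniform:    ~0.8660
print(probable_over_mean(lambda x: 1.0 - abs(x), (-1.0, 1.0)))   # II. triangular: ~0.7174
gauss = lambda x: np.exp(-x**2) / np.sqrt(np.pi)                  # III. with h = 1
print(probable_over_mean(gauss, (-20.0, 20.0)))                  #     ~0.6745
```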
10.
Although the relationship between and depends on the nature of the function some general results can be established that apply to all cases where this function does not increase with the absolute value of the variable; in that case we have the following theorems:
- will not exceed whenever is less than
- will not exceed whenever exceeds
When the two limits coincide and cannot exceed
To prove this remarkable theorem, let be the value of the integral Then will be the probability that an error is between and Let us set
then we have and
and by hypothesis is always increasing between and or at least is not decreasing, or equivalently is always positive, or at least not negative. Now we have
thus,
Therefore, always has positive value, or at least this expression is never negative, and therefore
will always be positive and less than unity. Let be the value of this difference for since we have
This being prepared, let's consider the function
which we set and also Then it is clear that
Since is continually increasing with (or at least does not decrease, which should always be understood), and at the same time is constant, the difference
will be positive for all values of greater than and negative for all values of smaller than It follows that the difference is always positive, and consequently, will certainly be greater than in absolute value, as long as the function is positive, i.e. between and The value of the integral
will therefore be less than that of the integral
and a fortiori less than
i.e., less than Now the value of the first of these integrals is found to be
and therefore is less than with being a number between and If we consider as a variable, then this fraction, whose differential is
will be continually decreasing as increases from to so long as is less than and therefore its maximum value will be found when and will be so that in this case, the coefficient will certainly be less, or at least not greater than Q.E.P. On the other hand, when is greater than the maximum value of the function will be found when i.e. for and this maximum value will be so in this case, the coefficient will not be greater than Q.E.S.
Thus e.g. for it is certain that will not exceed which means that the probable error cannot exceed to which it was found to be equal in the first example in art. 9. Furthermore, it is easily concluded from our theorem that is not less than when is less than and on the other hand, it is not less than when is greater than
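In modern notation, the bounds established in this article are usually stated as the following inequality for any density that does not increase with the absolute value of the error; the symbols m, λ, μ in this restatement are my own, not taken from the text.

```latex
% mu = probability that an error lies between -lambda*m and +lambda*m,
% m  = mean error of the observation.
\[
  \mu \;\ge\; \frac{\lambda}{\sqrt{3}} \ \left(\lambda \le \tfrac{2}{\sqrt{3}}\right),
  \qquad
  \mu \;\ge\; 1 - \frac{4}{9\lambda^{2}} \ \left(\lambda \ge \tfrac{2}{\sqrt{3}}\right);
\]
% both bounds give mu >= 2/3 at lambda = 2/sqrt(3), and taking mu = 1/2 shows that
% the probable error cannot exceed m*sqrt(3)/2 (about 0.8660 m), the value attained
% by the uniform density of example I in art. 9.
```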
11.
Since several of the problems discussed below involve the integral it will be worthwhile for us to evaluate it in some special cases. Let us denote the value of the integral Template:C by I. When for values of between and we have
II. In the second case of art. 9, with still between and we have
III. In the third case, where Template:C we find, as explained in the commentary cited above, that
It can also be demonstrated, with only the assumptions of the previous article, that the ratio is never less than
12.
Let etc. denote the errors made in observations of the same kind, and suppose that these errors are independent of each other. Let be the relative probability of error and let be a rational function of variables Template:Nobr Then the multiple integral
(I)
extended to all values of the variables Template:Nobr for which the value of falls between the given limits and represents the probability that the value of is between and This integral is evidently a function of whose differential we set so that the integral in question is equal to and therefore, represents the relative probability of an arbitrary value of Since can be regarded as a function of the variables Template:Nobr, which we set Template:C the integral (I) will be Template:C where takes values between and and the other variables take all values for which is real. Hence we have Template:C the integration, where is to be regarded as a constant, being extended to all values of the variables Template:Nobr for which takes a real value.
13.
The previous integration would require knowledge of the function which is unknown in most cases. Even if this function were known, the calculation would often exceed the capabilities of analysis. Therefore, it will be impossible to obtain the probability of each value of but it is different if one asks only for the average value of which will be given by the integral extended to all possible values of And since it is evident that for all values which cannot attain, either due to the nature of the function (e.g. for negative values, if etc.), or because of the limits imposed on Template:Nobr, one can assume that it is clear that the integration can be extended to all real values of from to
14.

But the integral taken between determinate limits and is equal to the integral Template:C, taken from to and extended to all values of the variables Template:Nobr for which is real. This integral is therefore equal to the integral Template:C in which is expressed as a function of Template:Nobr, and the integration is extended to all values of the variables that leave between and Thus, the integral Template:C can be obtained from the integral Template:C where the integration is extended to all real values of that is, from to to Template:Nobr
If the function reduces to a sum of terms of the form Template:C then the value of the integral Template:C extended to all values of or equivalently the average value of will be equal to a sum of terms of the form Template:C that is, the average value of is equal to a sum of terms derived from those that make up by replacing Template:Nobr with their average values. The proof of this important theorem could easily be derived from other considerations.
15.
Let us apply the theorem of the previous article to the case where Template:C and denotes the number of terms in the numerator.
We immediately find that the average value of is equal to the letter having the same meaning as above. The true value of may be lower or higher than its average, just as the true value of may, in each case, be lower or higher than but the probability that by chance, the value of differs by a small amount from will approach certainty as becomes larger. In order to clarify this, since it is not possible to determine this probability exactly, let us investigate the mean error to be feared when It is clear from the principles of art. 6 that this error will be the square root of the average value of the function Template:C To find it, it suffices to observe that the average value of a term such as is equal to ( having the same meaning as in art. 11), and that the average value of a term such as is equal to therefore, the average value of this function will be Template:C
Since this last formula contains the quantity if we only want to get an idea of the precision of this determination, it will suffice to adopt a certain hypothesis about the function E.g. if we take the third assumption of arts. 9 and 11, this error will be equal to Alternatively, we can obtain an approximate value of by means of the errors themselves, using the formula Template:C In general, it can be stated that a precision twice as great in this determination will require a quadruple number of errors, meaning that the weight of the determination is proportional to the number
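A small simulation may make the last statement concrete. Everything in the sketch below is an illustration under the third (Gaussian) hypothesis with a unit mean error; it only shows that the estimate of the mean error formed from n true errors tightens roughly like 1/sqrt(n), so that doubling the precision of this determination requires about four times as many errors.

```python
import numpy as np

rng = np.random.default_rng(0)
m_true = 1.0                                             # assumed mean error of the observations
for n in (25, 100, 400):
    errors = rng.normal(0.0, m_true, size=(20000, n))    # 20000 batches of n true errors
    m_hat = np.sqrt(np.mean(errors**2, axis=1))          # estimate sqrt((e1^2 + ... + en^2)/n)
    print(n, round(m_hat.std(), 4))                      # spread is roughly m/sqrt(2n)
```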
Similarly, if the errors of the observations contain a constant part, we will deduce from their arithmetic mean an approximate value of the constant part, which approaches the true value as the number of errors increases. In this determination, the mean error to be feared will be represented by where denotes the constant part, and denotes the mean error of the observations uncorrected for their constant error. It will be simply represented by if represents the mean error of the observations corrected for the constant part (see art. 8).
16.
In the arts. 12-15, we assumed that the errors Template:Nobr belonged to the same type of observation, so that the probability of each of these errors was represented by the same function. However, it is clear that the general principles outlined in arts. 12-14 can be applied with equal ease in the more general case where the probabilities of the errors Template:Nobr, are represented by different functions etc., i.e. when these errors belong to observations of varying precision or uncertainty. Let denote the error of an observation with a mean error to be feared of and let Template:Nobr denote the errors of other observations with mean errors to be feared of Template:Nobr Then the average value of the sum etc. will be etc. Now, if it is also known that the quantities Template:Nobr are respectively proportional to the numbers Template:Nobr, then the average value of the expression Template:C will be However, if we adopt for the value that this expression will take, by substituting the errors Template:Nobr, as chance offers them, then the mean error affecting this determination will become, just as in the preceding article, Template:C where Template:Nobr, have the same meaning with respect to the second and third observation, as does with respect to the first; and if we can assume the numbers Template:Nobr, proportional to Template:Nobr, this mean error to be feared will be equal to Template:C
But this method of determining an approximate value for is not the most advantageous. Consider the more general expression Template:C whose average value will also be regardless of the coefficients Template:Nobr The mean error to be feared when substituting the value for a value of as determined by the likelihoods of Template:Nobr, will, according to the principles above, be given by the formula Template:C To minimize this error, we must set Template:C These values cannot be evaluated until the exact ratios Template:Nobr are known. In the absence of exact knowledge[1], it is safest to assume them equal to each other (see art. 11), in which case Template:C i.e. the coefficients Template:Nobr, should be assumed equal to the relative weights of the various observations, taking the weight of the one corresponding to the error as the unit. With this assumption, let denote, as above, the number of proposed errors. Then the average value of the expression Template:C will be and when we take, for the true value of the randomly determined value of this expression, the mean error to be feared will be Template:C and, finally, if we are allowed to assume that the quantities Template:Nobr, are proportional to Template:Nobr, this expression reduces to Template:C which is identical to what we found in the case where all observations were of the same type.
17.
When the value of a quantity, which depends on an unknown magnitude, is determined by an observation whose precision is not absolute, the result of this observation may provide an erroneous value for the unknown, but there is no room for discretion in this determination. But if several functions of the same unknown have been found by imperfect observations, we can obtain the value of the unknown either by any one of these observations, or by a combination of several observations, which can be carried out in infinitely many ways. The result will be subject, in all cases, to a possible error, and depending on the combination chosen, the mean error to be feared may be greater or smaller. The same applies if several observed quantities depend on multiple unknowns. Depending on whether the number of observations equals the number of unknowns, or is smaller or larger than this number, the problem will be determined, undetermined, or more than determined (at least in general), and in this third case, the observations can be combined in infinitely many ways to provide values for the unknowns. Among these combinations, the most advantageous ones must be chosen, i.e., those that provide values for which the mean error to be feared is as small as possible. This problem is certainly the most important one presented by the application of mathematics to natural philosophy.
In Theoria motus corporum coelestium we have shown how to find the most probable values of unknowns when the probability law of the observational errors is known, and since, in almost all cases, this law remains hypothetical by its nature, we have applied this theory to the highly plausible hypothesis that the probability of error is proportional to Hence arose the method that I have followed, especially in astronomical calculations, and which most calculators now use under the name of the Method of Least Squares.
Laplace later considered the question from another point of view, and showed that this principle is preferable to all others, regardless of the probability law of the errors, provided that the number of observations is very large. But when this number is limited, the question remains open; so that, if we reject our hypothetical law, the method of least squares would be preferable to others, for the sole reason that it leads to simpler calculations.
We therefore hope to please geometers by demonstrating in this Memoir that the method of least squares provides the most advantageous combination of observations, not only approximately, but also absolutely, regardless of the probability law of errors and regardless of the number of observations, provided that we adopt for the mean error, not Laplace's definition, but the one which we have given in arts. 5 and 6.
It is necessary to warn here that in the following investigations, only random errors reduced by their constant part will be considered. It is up to the observer to carefully eliminate the causes of constant errors. We reserve for another Memoir the examination of the case where the observations are affected by an unknown constant error.
18.
Problem. Let be a given function of the unknowns Template:Nobr; we ask for the mean error to be feared in determining the value of when, instead of the true values of Template:Nobr, we take the values derived from independent observations; Template:Nobr, being the mean errors corresponding to these various observations.
Solution. Let Template:Nobr denote the errors of the observed values Template:Nobr; the resulting error for the value of the function can be expressed by the linear function Template:C where Template:Nobr, represent the derivatives Template:Nobr, when Template:Nobr, are replaced by their true values.
This value of is evident if we assume the observations to be accurate enough so that the squares and products of the errors are negligible. It follows that the average value of is zero, since we assume that the errors of the observations have no constant part. Now the mean error to be feared in the value of will be the square root of the average value of or equivalently will be the average value of the sum Template:C but the average value of is that of is Template:Nobr, etc., and finally the average values of the products are all zero. Hence we find that Template:C
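A small numerical sketch of the propagation rule just derived; the partial derivatives and mean errors below are invented for illustration, and the cross terms vanish because the observations are independent and free of constant error.

```python
import numpy as np

def mean_error_of_function(partials, mean_errors):
    """M = sqrt(sum of (partial derivative)^2 * (mean error)^2)."""
    partials = np.asarray(partials, dtype=float)
    mean_errors = np.asarray(mean_errors, dtype=float)
    return float(np.sqrt(np.sum(partials**2 * mean_errors**2)))

# Example: a function with partial derivatives 1, 2, -0.5 with respect to three
# independently observed quantities whose mean errors are 0.3, 0.1, 0.2.
print(mean_error_of_function([1.0, 2.0, -0.5], [0.3, 0.1, 0.2]))   # ~0.374
```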
It is good to add several remarks to this solution.
I. Since we neglect powers of errors higher than the first, we can, in our formula, take for Template:Nobr, the values of the differential coefficients Template:Nobr, derived from the observed values Template:Nobr Whenever is a linear function, this substitution is rigorously exact.
II. If instead of mean errors, one prefers to introduce weights Template:Nobr for the respective observations, with the unit being arbitrary, and being the weight of the value of Then we will have Template:C
III. Let be another function of Template:Nobr and let Template:C The error in the determination of from the observed values Template:Nobr will be Template:C and the mean error to be feared in this determination will be Template:C It is obvious that the errors and will not be independent of each other, and the mean value of the product will not be like the mean value of but instead it will be equal to Template:C
IV. The problem includes the case where the values of the quantities Template:Nobr, are not immediately given by observation, but are deduced from any combinations of direct observations. For this extension to be legitimate, the determinations of these quantities must be independent, i.e., they must be provided by different observations. If this condition of independence is not fulfilled, the formula giving the value of would no longer be accurate. For example, if the same observation were used both in determining and in determining the errors and would no longer be independent, and the mean value of the product would no longer be zero. If, in this case, the relationship between and and the results of the simple observations from which they derive is known, we can calculate the mean value of the product as indicated in remark III, and consequently correct the formula which gives
19.
Let Template:Nobr, be functions of the unknowns Template:Nobr Let be the number of these functions, and let be the number of unknowns. Suppose that observations have given, immediately or indirectly, Template:Nobr, and that these determinations are absolutely independent of each other. If is greater than then the determination of the unknowns is an indeterminate problem. If is equal to then each of the unknowns Template:Nobr, can be reduced to a function of Template:Nobr so that the values of the former can be deduced from the observed values of the latter, and the previous article will allow us to calculate the relative accuracy of these various determinations. If is less than then each unknown Template:Nobr, can be expressed in infinitely many ways as a function of Template:Nobr, and, in general, these values will be different; they should coincide if the observations were, contrary to our assumptions, rigorously accurate. It is clear, moreover, that the various combinations will provide results whose accuracy will generally be different.
Moreover, if, in the second and third cases, the quantities Template:Nobr, are such that of them, or more, can be regarded as functions of the others, the problem is more than determined relative to these latter functions and indeterminate relative to the unknowns Template:Nobr; and we could not even determine these latter unknowns, even if the functions Template:Nobr, were exactly known: but we exclude this case from our investigations.
If Template:Nobr, are not linear functions of the unknowns, we can always assign them this form, by replacing the primitive unknowns with their difference from their approximate values, which we assume known; the mean errors to be feared in the determinations Template:C being respectively denoted by Template:Nobr, and the weights of these determinations by Template:Nobr, so that Template:C We will assume that both the ratios of the mean errors and the weights are known, one of which will be arbitrarily chosen. Finally, if we set Template:C then things will proceed as if immediate observations, equally precise and with mean error had given Template:C
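In matrix terms (my own reading of the reduction just described, with invented numbers), multiplying each approximate equation and its right-hand side by the square root of its weight puts all equations on the footing of equally precise observations.

```python
import numpy as np

def reduce_to_unit_weight(A, rhs, weights):
    """Scale each equation by sqrt(weight) so all rows have equal precision."""
    s = np.sqrt(np.asarray(weights, dtype=float))
    return A * s[:, None], rhs * s

A = np.array([[1.0, 2.0], [1.0, -1.0], [2.0, 1.0]])   # coefficients of the unknowns
rhs = np.array([3.9, 0.1, 5.2])                        # observed values
p = [1.0, 4.0, 2.0]                                    # weights of the observations
A_eq, rhs_eq = reduce_to_unit_weight(A, rhs, p)
```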
20.
Problem. Let Template:Nobr, be the following linear functions of the unknowns Template:Nobr,
(1)
Among all systems of coefficients Template:Nobr, that identically satisfy Template:C being independent of Template:Nobr, find the one for which obtains its minimum value.
Solution. — Let us set
(2)
are linear functions of and we have
(3)
where denotes the sum and similarly for the other sums.
The number of quantities Template:Nobr, is equal to the number of unknowns Template:Nobr namely . Thus, by elimination, one can obtain an equation of the following form,[2]
which will be identically satisfied if we replace with their values from (3). Consequently, if we set
(4)
then we will have identically
(5)
This equation shows that among the different systems of coefficients Template:Nobr, we must consider the system
Moreover, for any system, we will have identically
and this equation, being identical, leads to the following:
Adding these equations after multiplying them, respectively, by Template:Nobr, we will have, by virtue of the system (4), Template:C which is the same as Template:C thus, the sum Template:C will have its minimum value when Template:Nobr Q.E.I.
Moreover, this minimum value will be obtained as follows. Equation (5) shows that we have Template:C Let's multiply these equations, respectively, by Template:Nobr, and add them; considering the relations (4), we find Template:C
21.
When the observations have provided approximate equations Template:Nobr it will be necessary, in order to determine the unknown, to choose a combination of the form Template:C such that the unknown acquires a coefficient equal to , and the other unknowns are eliminated.
According to art. 18, the weight of this determination will be given by Template:C According to the previous article, the most suitable determination will be obtained by taking Template:Nobr Then will have the value and it is clear that the same value would be obtained (without knowing the multipliers Template:Nobr), by performing elimination on the equations Template:Nobr The weight of this determination will be given and the mean error to be feared will be Template:C
A similar approach would lead to the most suitable values of the other unknowns Template:Nobr, which would be those obtained by performing elimination on the equations etc.
If we denote the sum or equivalently Template:C by , then it is clear that etc. will be the partial differential quotients of the function i.e.
Therefore, the values of the unknowns that are deduced from the most suitable combination, and which we can call the most plausible values, are precisely those that minimize . Now represents the difference between the observed value and the computed value. Thus, the most plausible values of the unknowns are those that minimize the sum of the squares of the differences between the calculated and observed values of the quantities Template:Nobr, these squares being respectively multiplied by the weight of the observations. I had established this principle a long time ago through other considerations, in Theoria Motus Corporum Coelestium.
If one wants to assign the relative precision of each determination, it is necessary to deduce the values of Template:Nobr from the equations (3), which gives them in the following form:
(7)
Accordingly, the most plausible values of the unknowns Template:Nobr, will be Template:Nobr The weights of these determinations will be Template:Nobr and the mean errors to be feared will be, for each unknown in turn,

in agreement with the results obtained in Theoria Motus Corporum Coelestium.
22.
The case where there is only one unknown is the most frequent and simplest of all. In this case we have Template:Nobr We will then have Template:Nobr Template:Nobr and consequently, Template:C Hence Template:C
Therefore, if by several observations that do not have the same precision and whose respective weights are Template:Nobr, we have found, for the same quantity, a first value a second a third Template:Nobr, then the most plausible value will be Template:C and the weight of this determination will be If all observations are equally plausible, then the most probable value will be Template:C i.e. the arithmetic mean of the observed values; taking the weight of an individual observation as the unit, the weight of the average will be
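For the single-unknown case just described, the rule reduces to a weighted arithmetic mean; the numbers below are invented for illustration.

```python
import numpy as np

values = np.array([10.02, 9.97, 10.05])   # repeated observations of one quantity
p = np.array([1.0, 2.0, 1.0])             # their weights

x_hat = float(np.sum(p * values) / np.sum(p))   # most plausible value
weight = float(np.sum(p))                        # weight of this determination
print(x_hat, weight)
```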
Part Two
23.
A number of investigations still remain to be discussed, through which the preceding theory will be clarified and extended.
Let us first investigate whether the elimination used to express the variables Template:Nobr, in terms of Template:Nobr, is always possible. Since the number of equations is equal to the number of unknowns, we know that this elimination will be possible if Template:Nobr are independent of each other; otherwise, it is impossible.
Suppose, for a moment, that Template:Nobr are not independent, but rather there exists between these quantities an identical equation Template:C We will then have Template:C
Let us set
(1)
from which it follows that Template:C Multiplying the equations (1) resp. by Template:Nobr and adding, we obtain Template:C and this equation leads to etc. From this we conclude, first of all, Secondly, the equations (1) show that the functions Template:Nobr, are such that their values do not change when the variables Template:Nobr, increase or decrease proportionally to Template:Nobr respectively. It is clear that the same holds for the functions Template:Nobr: but this can only happen in the case where it would be impossible to determine Template:Nobr etc. using the values of Template:Nobr even if these were exactly known; but then the problem would be indeterminate by its nature, and we will exclude this case from our investigations.
24.
If Template:Nobr denote multipliers playing the same role relative to the unknown as the multipliers Template:Nobr relative to the unknown i.e. so that we have Template:C then we will identically have Template:C Let Template:Nobr be the analogous multipliers relative to the variable so that we have: Template:C and consequently, Template:C In the same way as we found in art. 20 that Template:C we will find here Template:C and so on.
We will also have, as in art. 20
If we multiply the values Template:Nobr (art. 20. (4)), respectively, by Template:Nobr, and add; we obtain
If we multiply Template:Nobr, respectively, by Template:Nobr, and add, we will find
In the same manner, we find
25.
Let Template:Nobr denote the values taken by the functions Template:Nobr, when Template:Nobr are replaced by their most plausible values, Template:Nobr, i.e. Template:C
If we set Template:C so that is the value of the function corresponding to the most plausible values of the variables, and therefore, as was shown in art. 20, the minimum value of Then the value of will be corresponding to Template:Nobr, and this value is zero, according to the way Template:Nobr have been obtained. Thus, we have Template:C and similarly we would obtain Template:C and Template:C Finally, multiplying the values of Template:Nobr respectively by and adding, we get or Template:C
26.
Replacing Template:Nobr, with the expressions (7) from art. 21 in the equation we find, through the same reductions as before, Template:C Multiplying either these equations or the equations (1) of art. 20, by Template:Nobr, and then adding, we obtain the identity Template:C
27.
The function can take several forms, which are worth developing.
Let us square the equations (1) art. 20, and add them. Then we find Template:C this is the first form.
Next let us multiply the same equations by Template:Nobr respectively, and add. Then we obtain and replacing Template:Nobr, with the values indicated in the previous article, we find that or this is the second form.
Finally, replacing, in this second form, Template:Nobr by the expressions (7) of art. 21, we obtain the third form: Template:C
We can also give a fourth form which results automatically from the third form and the formulas of the previous article: Template:C From this last form we clearly see that is the minimum value of
28.
Let Template:Nobr, be the errors made in the observations that gave Template:Nobr Then the true values of the functions Template:Nobr, will be Template:Nobr respectively, and the true values of Template:Nobr, will be etc., respectively. Therefore, the true value of will be Template:C and the error made in the most suitable determination of the unknown which we will denote by will be Template:C Similarly, the error made in the most suitable determination of the value of will be Template:C The average value of the square will be Template:C The average value of will similarly be as shown above. We can also determine the average value of the product which will be Template:C These results can be stated more briefly as follows:
The average values of the squares Template:Nobr, are respectively equal to the products of with the second-order partial differential quotients Template:C and the average value of a product such as is the product of with where is regarded as a function of Template:Nobr
29.
Let be a given linear function of the quantities Template:Nobr, i.e. Template:C the value of deduced from the most plausible values of Template:Nobr, will then be and we denote this by The error thus committed will be Template:C which we denote by The average value of this error will obviously be zero, meaning the error will not contain a constant part, but the average value of i.e., the sum Template:C will, according to the preceding article, be equal to the product of with the sum Template:C i.e., the product of with the value produced by the function when we substitute
If we let denote this value of then the mean error to be feared when we take will be and the weight of this determination will be .
Since we have identically Template:C will be equal to the value of the expression or the value produced by when we substitute for Template:Nobr the values corresponding to Template:Nobr.
Finally, observing that expressed as a function of the quantities Template:Nobr, will have as its constant part, if we suppose that Template:C then we will have Template:C
30.
We have seen that the function attains its absolute minimum when we substitute Template:Nobr or, equivalently, Template:Nobr If we assign another value to one of the unknowns, e.g. while the other unknowns remain variable, may acquire a relative minimum value, which can be obtained from the equations Template:C Therefore, we must have Template:Nobr and since Template:C we have Template:C Likewise, we have Template:C and the relative minimum value of will be Template:C Reciprocally, we conclude that if is not to exceed then the value of must necessarily be between the limits and It is important to note that becomes equal to the mean error to be feared in the most plausible value of if we set i.e., if is the mean error of observations whose weights are .
More generally, let us find the smallest value of the function that can correspond to a given value of where denotes, as in the previous article, a linear expression whose most plausible value is . Let us denote by the prescribed value of by According to the theory of maxima and minima, the solution to the problem will be given by the equations Template:C or etc., where denotes an as yet undetermined multiplier. If, as in the previous article, we identically set, Template:C then we will have Template:C or Template:C where has the same meaning as in the previous article.
Since is a homogeneous function of the second degree with respect to the variables etc., its value when etc. will evidently be and thus the minimum value of when will be Reciprocally, if must remain less than a given value the value of will necessarily be between the limits and will be the mean error to be feared in the most plausible value of if represents the mean error of observations whose weights are .
31.
When the number of unknowns Template:Nobr is quite large, the determination of the numerical values of etc. by ordinary elimination is quite tedious. For this reason we have indicated, in Theoria Motus Corporum Coelestium art. 182, and later developed, in Disquisitione de elementis ellipticis Palladis (Comm. recent. Soc. Gotting Vol. I), a method that simplifies this work as much as possible. Namely, the function must be reduced to the following form: Template:C where the divisors etc., are determined quantities; etc., are linear functions of etc., such that the second does not contain the third contains neither nor the fourth contains neither nor nor and so on, so that the last contains only the last of the unknowns etc.; and finally, the coefficients of etc., in etc., are respectively equal to etc. Then we set etc. and we will easily obtain the values of Template:Nobr by solving these equations, starting with the last one. I do not believe it necessary to repeat the algorithm that leads to the transformation of the function .
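In matrix language (my own identification, not the text's), the successive reduction of the function to a sum of squares with divisors corresponds to a triangular, Cholesky-type factorization of the normal-equation matrix, after which the unknowns are found by back-substitution starting from the last one. A sketch with assumed numbers:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

N = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])        # normal-equation matrix (assumed already formed)
b = np.array([1.0, 2.0, 0.5])          # right-hand side of the normal equations

R = cholesky(N, lower=False)           # N = R^T R, with R upper triangular
z = solve_triangular(R, b, trans='T')  # forward substitution
x_hat = solve_triangular(R, z)         # back substitution, last unknown first
```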
However, the elimination needed to find the weights of these determinations requires even longer calculations. We have shown in the Theoria Motus Corporum Coelestium that the weight of the last unknown (which appears by itself in ) is equal to the last term in the series of divisors etc. This is easily found; hence, several calculators, wanting to avoid cumbersome elimination, have had the idea, in the absence of another method, to repeat the indicated transformation by successively considering each unknown as the last one. Therefore, I hope that geometers will appreciate my indication of a new method for calculating the weights of determinations, which seems to leave nothing more to be desired on this point.
32.

Setting
(1)
we have identically Template:C and from this we deduce:
(2)
The values of Template:Nobr deduced from these equations will be presented in the following form:
(3)
By taking the complete differential of the equation Template:C we obtain Template:C and thus Template:C This expression must be equivalent to the one obtained from the equations (3), Template:C and therefore we have
(4)
By substituting in these expressions the values of and Template:Nobr obtained from the equations (3), we will have performed the elimination. For the determination of the weights, we have
(5)
The simplicity of these formulas leaves nothing to be desired. Equally simple formulas could be found to express the other coefficients and Template:Nobr; however, as their use is less frequent, we will refrain from presenting them.
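Under the usual matrix reading of these formulas (an identification of mine, not stated in the text), the weight of the determination of each unknown is the reciprocal of the corresponding diagonal entry of the inverse of the normal-equation matrix, so a single inversion yields all the weights at once.

```python
import numpy as np

N = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])    # normal-equation matrix (assumed)
Q = np.linalg.inv(N)               # array of auxiliary coefficients
weights = 1.0 / np.diag(Q)         # weights of the determinations of the unknowns
print(weights)
```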
33.
The importance of the subject has prompted us to prepare everything for the calculation and to form explicit expressions for the coefficients Template:Nobr Template:Nobr This calculation can be approached in two ways. The first involves substituting the values of and so forth, deduced from the equations (3) into the equations (2), and the second involves substituting the values from the equations (2) into the equations (3). The first method leads to the following formulas: Template:C These formulas will determine and so on.
We will then have, Template:C which will determine and so forth; then Template:C which will determine etc., and so on.
The second method yields the following system: Template:C from which we deduce Template:C from which we deduce and Template:C from which we deduce and so on.
Both systems of formulas offer nearly equal advantages when seeking the weights of the determinations of all unknowns and so forth; however, if only one of the quantities and so forth is required, the first system is much preferable.
Moreover, the combination of equations (1) and (4) yields the same formulas, and provides, in addition, a second way to obtain the most plausible values and so forth, which are Template:C The other calculation is identical to the ordinary calculation in which it is assumed Template:Nobr
34.
The results obtained in art. 32 are only particular cases of a more general theorem which can be stated as follows:
Theorem. If represents the following linear function of the unknowns Template:Nobr, Template:C whose expression in terms of the variables Template:Nobr, is Template:C then will be the most plausible value of and the weight of this determination will be Template:C
Proof. The first part of the theorem is obvious, since the most plausible value of must correspond to the values Template:Nobr
To demonstrate the second part, let's note that we have Template:C and consequently, when Template:C we have Template:C whatever the differentials Template:Nobr Hence, assuming always, we obtain Template:C Now it is easily seen that if the differentials Template:Nobr are independent of each other, so will be Template:Nobr, therefore, we will have, Template:C Hence, the value of corresponding to the same assumptions, will be Template:C which, by art. 29, demonstrates the truth of our theorem.
Moreover, if we wish to perform the transformation of the function without resorting to formulas (4) of art. 32, we immediately have the relations Template:C which will allow us to determine Template:Nobr, and we will finally have Template:C
35.
We will particularly address the following problem, both because of its practical utility and the simplicity of the solution:
Find the changes that the most plausible values of the unknowns undergo by adding a new equation, and assign the weights of these new determinations.
Let us keep the previous notations. The primitive equations, reduced to have a weight of unity, will be we will have and Template:Nobr, will be the partial derivatives Template:C Finally, by elimination, we will have
(1)
Now suppose we have a new approximate equation (which we assume to have a weight equal to unity), and we seek the changes undergone by the most plausible values of Template:Nobr, and of the coefficients Template:Nobr.
Let us set Template:C and let Template:C be the result of the elimination. Finally, let Template:C which, taking into account the equations (1), becomes Template:C and let Template:C
It is clear that will be the most plausible value of the function as resulting from the primitive equations, without considering the value provided by the new observation, and will be the weight of this determination.
Now we have Template:C and consequently, Template:C or Template:C Furthermore, Template:C From this, we deduce, Template:C which will be the most plausible value of deduced from all observations.
We will also have Template:C thus Template:C will be the weight of this determination.
Similarly, for the most plausible value of deduced from all observations, we find Template:C the weight of this determination will be Template:C and so on. Q.E.I.
Let us add some remarks.
I. After substituting the new values Template:Nobr, the function will obtain the most plausible value Template:C and since we have, identically, Template:C the weight of this determination, according to art. 29, will be Template:C These results could be deduced immediately from the rules explained at the end of art. 21. The original equations had, indeed, provided the determination whose weight was A new observation gives another determination independent of the first, whose weight is and their combination produces the determination with a weight of
II. It follows from the above that, for Template:Nobr we must have Template:Nobr and consequently, Template:C Furthermore, since Template:C we must have Template:C and Template:C
III. Comparing these results with those of the art. 30, we see that here the function has the smallest value it can obtain when subjected to the condition
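A matrix sketch of the updating rule of this article, written in the now-standard recursive least-squares form; the identification of these quantities with the text's is my reading, and the new equation is taken with unit weight as in the statement above.

```python
import numpy as np

def add_observation(x, Q, a, y0):
    """Fold one new unit-weight equation a.x ~ y0 into an existing solution.

    x -- previous most plausible values
    Q -- previous inverse normal matrix (reciprocal weights on its diagonal)
    """
    a = np.asarray(a, dtype=float)
    Qa = Q @ a
    w = 1.0 + float(a @ Qa)
    x_new = x + Qa * (y0 - float(a @ x)) / w
    Q_new = Q - np.outer(Qa, Qa) / w
    return x_new, Q_new

# Rejecting an equation already used (art. 36 with its weight set to zero) uses the
# same formulas with 1 + a.Qa replaced by 1 - a.Qa and both corrections negated.
```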
36.
We will give here the solution to the following problem, which is analogous to the previous one, but we will refrain from indicating the demonstration, which can be easily found, as in the previous article.
Find the changes in the most plausible values of the unknowns and the weights of the new determinations when changing the weight of one of the primitive observations.
Suppose that, after completing the calculation, it is noticed that the weight which has been assigned to an observation is too strong or too weak, e.g. the first one, which gave and that it would be more accurate to assign it the weight instead of the weight It is not then necessary to restart the calculation; instead, it is convenient to form the corrections using the following formulas.
The most plausible values of the unknowns will be corrected as follows: Template:C and the weights of these determinations will be found upon dividing unity by Template:C respectively.
This solution applies in the case where, after completing the calculation, it is necessary to completely reject one of the observations, since this amounts to making ; similarly, will be suitable for the case where the equation which in the calculation had been regarded as approximate, is in fact absolutely precise.
If, after completing the calculation, several new equations were to be added to those proposed, or if the weights assigned to several of them were incorrect, the calculation of the corrections becomes too complicated, and it is preferable to start over.
37.
In the arts. 15 and 16, we have given a method to approximate the accuracy of a system of observations; but this method assumes that the real errors encountered in a large number of observations are known exactly; however, this condition is rarely fulfilled, if ever.
If the quantities for which the observation provides approximate values depend on one or more unknowns, according to a given law, then the method of least squares allows us to find the most plausible values of these unknowns. If we then calculate the corresponding values of the observed quantities, they can be regarded as differing little from the true values, so that their differences with the observed values will represent the errors committed, with a certainty that will increase with the number of observations. This is the procedure followed in practice by calculators, who have attempted, in complicated cases, to retrospectively evaluate the precision of the observations. Although sufficient in many cases, this method is theoretically inaccurate and can sometimes lead to serious errors; therefore, it is very important to treat the issue with more care.
In the following discussion, we retain the notation used in art. 19. The method in question consists of considering Template:Nobr, as the true values of the unknowns Template:Nobr, and Template:Nobr, as those of the functions Template:Nobr If all observations have equal precision and their common weight is taken to be unity, these same quantities, changed in sign, represent, under this assumption, the errors of the observations. Consequently, according to art. 15, Template:C will be the mean error of the observations. If the observations do not have the same precision, then Template:Nobr, represent the errors of the observations, respectively multiplied by the square roots of the weights, and the rules of art. 16 lead to the same formula, Template:C which already expresses the mean error of these observations, when their weight is . However, it is clear that an exact calculation would require replacing Template:Nobr with the values of Template:Nobr, deduced from the true values of the unknowns Template:Nobr, and replacing the quantity by the corresponding value of Although we cannot assign this latter value, we are nonetheless certain that it is greater than (which is its minimum possible value), and it would only reach this limit in the infinitely unlikely case where the true values of the unknowns coincide with the most plausible ones. We can therefore affirm, in general, that the mean error calculated by ordinary practice is smaller than the exact mean error, and consequently, that too much precision is attributed to the observations. Now let us see what a rigorous theory yields.
38.
First of all, we need to determine how the quantity depends on the true errors of the observations. As in art. 28, let us denote these errors by Template:Nobr, and let us set, for simplicity, Template:C and Template:C Let Template:Nobr, be the true values of the unknowns Template:Nobr, for which Template:Nobr, are, respectively, Template:Nobr The corresponding values of Template:Nobr, will obviously be so that we will have Template:C Finally, Template:C will be the value of the function corresponding to the true values of the Template:Nobr Since we also have identically Template:C we will also have Template:C From this, it is clear that is a homogeneous function of the second degree of the errors Template:Nobr; for various values of the errors this function may become greater or smaller. However, the extent of the errors remains unknown to us, so it is good to carefully examine the function , and to first calculate its average value according to the elementary calculus of probability. We will obtain this average value by replacing the squares Template:Nobr with Template:Nobr, and omitting the terms in Template:Nobr, whose average value is zero; or equivalently, by replacing each square Template:Nobr, by and neglecting Template:Nobr. Accordingly, the term will provide ; the term will produce Template:C each of the other terms will also give so that the total average value will be where denotes the number of observations, and denotes the number of unknowns. Depending on the errors offered by chance, the true value of may be greater or smaller than this average value, but the difference decreases as the number of observations increases, so that Template:C can be regarded as an approximate value of Consequently, the value of provided by the erroneous method we discussed in the previous article, must be increased by the ratio of to
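A sketch of the corrected rule derived here, with invented numbers: the mean error of an observation of unit weight is estimated by dividing the minimized weighted sum of squared residuals by the number of observations minus the number of unknowns, rather than by the number of observations alone as in the ordinary practice of art. 37.

```python
import numpy as np

A = np.array([[1.0, 2.0], [1.0, -1.0], [2.0, 1.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([3.9, 0.1, 5.2, 1.1, 2.9])
P = np.diag([1.0, 4.0, 2.0, 1.0, 1.0])

x_hat = np.linalg.solve(A.T @ P @ A, A.T @ P @ y)
v = y - A @ x_hat
M = float(v @ P @ v)                              # minimum of the weighted sum of squares
n_obs, n_unknowns = A.shape
m_corrected = np.sqrt(M / (n_obs - n_unknowns))   # estimate advocated here
m_naive = np.sqrt(M / n_obs)                      # attributes too much precision
```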
39.
To clearly understand the extent to which it is permissible to consider the value of provided by the observations as equal to the exact value, we must seek the mean error to be feared when This mean error is the square root of the average value of the quantity Template:C which we will write as: Template:C and since the average value of the second term is evidently zero, the question reduces to finding the average value of the function Template:C If we denote this average value by then the mean error we seek will be
Expanding the function we see that it is a homogeneous function of the errors Template:Nobr, or equivalently, of the quantities Template:Nobr; therefore, we will find the average value by:
1. Replacing the fourth powers Template:Nobr, by their average values;
2. Replacing the products Template:Nobr, by their average values, that is, by Template:Nobr;
3. Neglecting products such as Template:Nobr. We will assume (see art. 16) that the average values of Template:Nobr, are proportional to Template:Nobr, so that the ratios of one to another are where denotes the average value of the fourth powers of the errors for observations whose weight is . Thus the previous rules could also be expressed as follows: Replace each fourth power Template:Nobr, by each product Template:Nobr, by and neglect all terms such as or
These principles being understood, it is easy to see that:
I. The average value of is Template:C
II. The average value of the product is Template:C because Template:C Similarly, the average value of is Template:C the average value of is Template:C and so on. Thus the average value of the product Template:C will be Template:C
The products or Template:Nobr, will have the same average value. Thus the product Template:C will have an average value of Template:C
III. To shorten the following developments, we will adopt the following notation. We give the character a more extended meaning than we have done so far, by making it designate the sum of similar but not identical terms arising from all permutations of the observations. According to this notation, we will have Template:C Calculating the average value term by term, we first have, for the average value of the product Template:C Similarly, the average value of the product is Template:C and so on. Therefore, the average value of the product Template:C is Template:C Now the average value of is Template:C The average value of is Template:C and so on. Hence, we easily conclude that the average value of the product Template:C is Template:C Thus, for the average value of the product we have Template:C
IV. Similarly, for the average value of the product we find Template:C Now, we have Template:C so this average value will be Template:C
V. By a similar calculation, we find that the average value of is Template:C and so on. Adding up, we obtain the average value of the product Template:C this value is Template:C
VI. Similarly, we find that Template:C is the average value of the product Template:C and Template:C is the average value of the product Template:C and so on.
Hence by addition we find the average value of the square Template:C which is Template:C
VII. Finally, from all these preliminaries, we conclude that Template:C Therefore, the mean error to be feared when Template:C will be Template:C
40.
The quantity Template:C which occurs in the expression above, generally cannot be reduced to a simpler form. However, we can assign two limits between which its value must necessarily lie. First, it is easily deduced from the previous relations that Template:C from which we conclude that Template:C is a positive quantity smaller than unity, or at least not larger. The same will be true for the quantity Template:C which is equal to the sum Template:C Similarly, Template:C will be smaller than unity; and so on. Therefore, Template:C must be smaller than Second, we have Template:C since Template:C from which it is easily deduced that Template:C is greater, or at least not smaller, than Therefore, the term Template:C must necessarily lie between the limits Template:C or, between the broader limits Template:C Thus, the square of the mean error to be feared for the value Template:C lies between the limits Template:C so that a degree of precision as great as desired can be achieved, provided the number of observations is sufficiently large.
It is very remarkable that in hypothesis III of art. 9, on which we had formerly relied to establish the theory of least squares, the second term of the square of the average error completely disappears (since ); and because, to find the approximate value of the average error of the observations, it is always necessary to treat the sum Template:C as if it were equal to the sum of the squares of random errors, it follows that, in this hypothesis, the precision of this determination becomes equal to that which we found, in art. 15, for the determination from true errors.
1. The exact determination of Template:Nobr, is conceivable only in the case where, by the nature of the matter, the errors Template:Nobr proportional to Template:Nobr, are considered equally probable, or rather in the case where Template:C
2. We will later explain the reasoning that led us to denote the coefficients of this formula by the notation Template:Nobr.