# Simon Newcomb and the Legacy of Meta Analysis in Determining Fundamental Constants

The diversity in the adopted values of the elements and constants of astronomy is productive of inconvenience to all who are engaged in investigations based upon these quantities, and injurious to the precision and symmetry of much of our astronomical work 1

Few works have survived the scrutiny of decades of scientific progress as well as Simon Newcomb’s Elements of the Four Inner Planets. Compiled over roughly twenty years of labor, his tables of the positions of the inner celestial bodies stood as the gold standard in planetary measurement for over eighty years after their publication in 1895.

In creating his tables, Newcomb’s stated goal was to synchronize the field of positional astronomy according to the clock of constants of nature, which at the time had yet to be pinned down to definitive values. Part of Newcomb’s motivation for producing the Elements of the Four Inner Planets came from his own frustration upon preparing the 1877 American Ephemeris using discordant measurements of constants like planetary masses and diameters. Another implicit motivation of Newcomb’s endeavor was to marshal enough confidence in the values of the fundamental constants in order to provide evidence for physical phenomena that could not yet be explained by Newtonian gravitational theory.

Specifically, Newcomb achieved a level of precision in his tables that precluded any possibility that the 43 arcsecond discrepancy in the perihelion of Mercury could be due to measurement error alone. His painstaking process of identifying sources of error and his weighted integration of multiple sources of observational data – which may be justified retrospectively from the perspective of modern statistical theory – left little doubt that some modification of gravitational theory was necessary to explain what astronomers had observed over more than a century.

The durability of Newcomb’s calculations was due in large part to the fact that Newcomb was so meticulous in the process of collating observational data. He was endlessly transparent about any aspects in which the data were modified to suit the author’s judgment, and though much of Newcomb’s data aggregation had a subjective component, he carefully pointed out where he made certain decisions and why he did so.

Years of observation
Greenwich 1750-1892
Palermo 1791-1813
Paris 1801-1889
Konigsberg 1814-1845
Dorpat 1823-1838
Cambridge 1828-1844
Berlin 1838-1842
Pulkowa 1842-1875
Washington 1846-1891
Leiden 1863-1871
Strassburg 1884-1887
Cape of Good Hope 1884-1890

Table 1: Newcomb’s table of observatories

In particular, Newcomb took care to ascribe as much deviance to measurement error as he could where plausible. It should be noted that due to the amount of data he used in his calculations, this was no easy task. Newcomb gathered over 62,000 meridian observations from thirteen different observatories to produce his tables, compared with roughly 9,000 observations used by the existing gold standard tables produced by Le Verrier. For each source of data, and for each type of calculation, Newcomb detailed his reasons for how the observations were incorporated into his calculations. For instance, to produce the values in his tables of the Sun’s Right Ascension, he omitted observations from the observatory at Palermo:

It will be seen that Piazzi’s results are thrown out entirely. The wide range of his values of $$c$$ led to the inquiry whether more consistent results would be obtained by taking shorter periods, but it was found that the values of $$c$$ varied from time to time in such an irregular way that his instrument must have been affected by some extraordinary cause of error, unless some mistake has been made in interpreting or treating the observations 1

Speaking of the observations made at the Paris observatory, he wrote:

The value of $$c$$ for Paris, 1866-‘70, has received a much reduced weight, solely on account of its excessive value. It seems that the work of one observer who made many observations during this period was affected by an unusual systematic error 1

Newcomb was careful to fully state and justify each omission and adjustment in his process. Laying bare all steps taken in his research served the purpose of putting the results of that research in the proper light. When readers were able to follow his every move from beginning to end, they were able to ascribe a level of confidence to the final product commensurate with the quality of the moves. For Newcomb, transparency served the specific purpose of inspiring enough confidence to make deviations of the numbers in his tables from Newtonian theory credible evidence of unaccounted physical variation.

To achieve the high level of precision in his measurements, Newcomb adopted practices that resemble those dictated by modern statistical theory. However, since Newcomb was performing his calculations nearly half a century before the development of modern statistics by the likes of Fisher, Neyman and Pearson, he was almost certainly guided by helpful intuitions rather than rigorous details of statistical inference.

Newcomb’s method was characterized by two such pivotal intuitions: first was the idea that collecting more data would lead to more precise estimates of the fundamental constants. More formally, this is the notion that gathering more observations of an unknown quantity that can only be measured up to some symmetric, random error should reduce the variance of the estimator of that quantity.

Practitioners of positional astronomy before Newcomb simply removed erroneous measurements or chose only the “best” measurements to use in final calculations. 2 Of course, Newcomb also removed some data, but he did so only when he was sufficiently convinced that the observed values did not correspond to the underlying quantity at all. Indeed, this practice was acceptable given that the relationship between sample size and variance depends critically on how one believes observations relate to the quantity of interest.

According to the theory of statistical inference, if one believes that observed values were not derived from the desired quantity modulo measurement error, then they should be removed from consideration. However, more data leads to smaller variance in estimating the quantity of interest irrespective of the quality of the observations, so long as the data contain some small amount of signal from the quantity of interest and the quality of the observation itself may be estimated with some precision. It is this last concept that Newcomb implicitly sanctioned by going to great lengths to assemble tens of thousands of meridian observations and to carefully weight these observations when he combined them as follows:

The result of each separate observation of each body was compared with the tabular result thus derived. The residuals were then taken and divided into groups. The interval between the extreme dates of each group was always taken so short that it could be presumed that the mean of all the residuals would be the correction for the mean of all the dates. The general rule was that the interval should not exceed four or five days in the case of Mercury, or six or eight days in that of Venus, and that not more than six or eight observations should be included in a single group. In taking these means, weights were assigned to the results of each observatory founded on the discordance of its residuals. Then to each mean a weight was again assigned equal to the sum of the weights of the individual residuals when these were few in number, but not allowed to exceed a certain limit, how great soever might be the sum of the individual weights 1

Newcomb’s estimates for the positions of planets may roughly be translated as inverse-variance weighted averages, which are guaranteed to have smaller variance than a simple sample average in the presence of heterogeneous measurement quality. Without having the statistical theory to back his methods at the time, we may recognize today that he was operating under the proper statistical framework to reduce the variance of his estimates.

Newcomb’s second key insight was the idea that including data from more observatories should reduce the bias of estimators of fundamental constants. The important point is that he believed that not merely more data, but more diverse data ought to be used. As Newcomb observed in the case of Palermo, it was conceivable that observations from a particular observatory were subject to a measurement bias specific to that observatory. Under such an assumption, no matter how many observations were taken from that observatory, the estimate of true planetary position from that observatory would not become any less biased. In modern terms, Newcomb was implicitly positing a random effects model commonly employed in statistical meta analyses.

The goal of a meta analysis is the same as that of a generic statistical inference problem, but the means of achieving the goal are specialized. Newcomb’s meta analysis sought the value of planetary position at a given time $$\mu_t$$, but instead of observing data directly from the position modulo error, he observed 13 estimators of the position

$\hat{\theta_t}_1, \dots, \hat{\theta_t}_{13}$

one from each observatory. He also obtained the (estimated) variance of those estimators

$\hat{\sigma_t}^2_1, \dots, \hat{\sigma_t}^2_{13}$

by his calculation of residuals with respect to Le Verrier’s tables. Crucially, these estimators arose from the 13 observatories, each of which had underlying quantities

${\theta_t}_1, \dots, {\theta_t}_{13}$

which were drawn identically from the true position $$\mu_t$$ modulo some symmetric error. Assuming Gaussian errors, Newcomb’s model looked something like

\begin{aligned} \hat{\theta_t}_s &= {\theta_t}_s + {\epsilon_t}_s \\ {\theta_t}_s &= {\mu_t} + {\eta_t}_s \end{aligned}

where $${\epsilon_t}_s \sim \text{Normal}(0, \hat{\sigma_t}_s^2)$$ and $${\eta_t}_s \overset{iid}\sim \text{Normal}(0, \tau^2)$$. Here $$\tau^2$$ represents the heterogeneity between the fixed but unknown observatory effects $${\theta_t}_s$$, and $$\text{Normal}(\cdot, \cdot)$$ denotes the Gaussian distribution.

Under this model, Newcomb’s weighted averaging based upon the estimated quality of each observatory’s measurement was precisely the inference procedure dictated by statistical theory developed over a century later. In 1983, Larry V. Hedges showed that the minimum variance unbiased estimator of the position $$\mu$$ in the above random effects model is

$\hat{\mu_t} = \frac{\sum_i w_i \hat{\theta_t}_i}{\sum_i w_i}$

with $$w_i = 1 / (\tau^2 + \hat{\sigma}_i^2)$$ where $$\hat{\sigma}_i^2$$ is the sampling variance within observatories and $$\tau^2$$ is the between-observatory variance. 3

Hence the high standard set by Newcomb’s tables may hardly be ascribed to chance. The level of precision he achieved is in retrospect due in large part to the size and diversity of his data sources and the statistical properties of the weighted estimators he employed in his analysis. The care he took to clean and document his data translated directly into the high level of trust that the field of positional astronomy was to put in his values over eight decades. Finally, and most crucially for the future of the physical theory of gravitation, the precision of Newcomb’s tables provided the most compelling evidence to date for the physical significance of the discrepancy of 43 seconds of arc in the perihelion of Mercury, “setting the stage for Einstein” 4 to make his advances in the early 20th century.

# References

1 S. Newcomb, The Elements of the Four Inner Planets and the Fundamental Constants of Astronomy. United States. Nautical Almanac Office. American ephemeris 1897. Supplement, U.S. Government Printing Office, 1895.

2 A. Hald, A history of mathematical statistics from 1750 to 1930. Wiley series in probability and statistics, New York: Wiley, 1998.

3 L. V. Hedges, Meta-Analysis. Journal of Educational Statistics, vol. 17, no. 4, p. 279, 1992.

4 G. E. Smith, “Lecture notes and slides: Simon Newcomb and the Underlying Logic of Celestial Mechanics.”