There's a lot here about the proton radius saga which isn't captured in the article. The energy levels of hydrogen can be calculated from a closed-form analytic formula, which is a function of the proton radius r_p. Thus by performing spectroscopy on different transitions of hydrogen, one can (by inverting the formula) repeatedly measure r_p and compare the results. One can similarly do this for muonic hydrogen. This plot (from a review by Pohl et al.) captures the essence of the puzzle:

The blue dots are independent measurements of r_p (with 2 sigma error bars) taken since the 1990s by using different transition frequencies of hydrogen. The vertical blue line is the average of all these past measurements. The vertical red line is the very precise measurement from muonic hydrogen (which started the whole puzzle). As you can see, the muonic hydrogen value doesn't deeply disagree with any of the blue values if you take them one by one, as the blue values have large uncertainties. It's only when you take the blue values together and compare them to muonic hydrogen that you get the huge discrepancy. Is it statistically legitimate to average the blue values together in that way? This was never clear to me (or to several others that I talked to).
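To make the "one by one versus all together" point concrete, here is a minimal sketch of an inverse-variance weighted average. The numbers are made up for illustration and are not the values from the plot; the function names are my own. It assumes independent Gaussian errors, which is exactly the assumption in question.

```python
import math

def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its 1-sigma uncertainty."""
    weights = [1.0 / s**2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)

def tension(a, sigma_a, b, sigma_b):
    """Difference between two results in units of their combined sigma."""
    return abs(a - b) / math.sqrt(sigma_a**2 + sigma_b**2)

# Hypothetical measurements (illustrative only, NOT the published data).
values = [0.88, 0.87, 0.89, 0.86, 0.90, 0.88, 0.87, 0.89]
sigmas = [0.03] * len(values)
precise, precise_sigma = 0.84, 0.001  # hypothetical high-precision result

mean, sigma = weighted_mean(values, sigmas)
individual = [tension(v, s, precise, precise_sigma)
              for v, s in zip(values, sigmas)]
combined = tension(mean, sigma, precise, precise_sigma)

print(f"largest single-measurement tension: {max(individual):.2f} sigma")
print(f"tension of the combined average:    {combined:.2f} sigma")
```

With these toy numbers no single measurement is more than about 2 sigma away, but the average is, because its uncertainty shrinks like 1/sqrt(N). That shrinkage only holds if the errors really are independent; a systematic error shared by all the measurements would not average down.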

The proton radius puzzle continued with a different experiment of Pohl's group (2016), this time using muonic deuterium. The measured value of r_p was closer to the newer, smaller muonic hydrogen value. Even more surprising was a measurement (2017) by the Haensch-Udem group using the "traditional" methods of plain old hydrogen spectroscopy - the same method as the blue dots in the graph above. The result found was again consistent with the smaller value. The latest result in the article from York University is a measurement of r_p using yet a different method, and it is found to agree again with the newer, smaller value.

Despite the agreement on the smaller value for r_p using multiple methods published in the last 5 years, it's unclear whether one can really say that the proton radius puzzle has been "resolved" other than sociologically. To this day no one knows what was wrong with the older hydrogen spectroscopy measurements which gave the larger values - perhaps some mysterious systematic error? Still, people seem more comfortable trusting recent results that use state-of-the-art methods and modern standards of characterizing and reporting systematic errors. There is some sort of "recency bias" at work. This might be why the news article talks about the proton radius being "resolved" with the new Lamb shift result.

Still, there is *one* contemporary (2019) measurement of r_p which agrees with the older, larger value, performed by a different group in France. As far as I know, nobody has been able to explain why this result, which disagrees with the two Pohl results, the Udem result, and the recent Hessels result, is wrong either.

To me, the whole episode serves to show that even in the very meticulous field of precision measurement, there can be puzzling discrepancies in results, even if the measurements were all performed by highly reputable, experienced groups. There are limitations in our abilities to characterize and look for systematic errors. Thus, one aberrant measurement (or even a few) is insufficient to convince the field that there is some new revolutionary physics going on. This is important to keep in mind especially in threads where people like @stcordova have questioned special relativity on the basis of a few aberrant experimental results.

It is legitimate, but it embeds certain assumptions, which may not be valid. A simple average presumes equal variance (false), equal importance (false?), uncorrelated errors (false?), and unskewed errors (false?).

Looking at this particular dataset, I make a few observations.

1. Higher values tend to have higher variances, which means that a variance-weighted average would be lower than the simple average.

2. There is a clustering around lower values, but then some outliers that are higher. This can be a sign that the data would be better behaved on a log scale (though not always).

This suggests the data might be better modeled as a normal distribution in log-transformed space. Averaging in that space, while taking the differences in variance into account, might produce a much better fit to the red line.
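As a rough sketch of what that comparison could look like: the snippet below computes a simple mean, an inverse-variance weighted mean, and a weighted mean taken in log space. The data are invented to mimic the pattern described above (higher values with larger error bars) and are not the measurements from the plot. The error bars are carried into log space with the first-order approximation sigma_log = sigma / value, which is an assumption of this sketch.

```python
import math

def inverse_variance_mean(values, sigmas):
    """Mean weighted by 1/sigma^2."""
    weights = [1.0 / s**2 for s in sigmas]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def log_space_mean(values, sigmas):
    """Inverse-variance mean of log(values), mapped back with exp().

    Uses the first-order (delta-method) approximation
    sigma_log = sigma / value for the error bars in log space.
    """
    logs = [math.log(v) for v in values]
    log_sigmas = [s / v for v, s in zip(values, sigmas)]
    return math.exp(inverse_variance_mean(logs, log_sigmas))

# Hypothetical data: higher values carry larger uncertainties.
values = [0.85, 0.86, 0.87, 0.92, 0.95]
sigmas = [0.01, 0.01, 0.02, 0.04, 0.05]

simple = sum(values) / len(values)
weighted = inverse_variance_mean(values, sigmas)
log_weighted = log_space_mean(values, sigmas)

print(f"simple mean:            {simple:.4f}")
print(f"variance-weighted mean: {weighted:.4f}")
print(f"log-space weighted:     {log_weighted:.4f}")
```

On toy data of this shape, both weighted estimates fall below the simple mean, because the high outliers are down-weighted by their large error bars. Whether that is enough to reach the red line would have to be checked on the real numbers.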

To verify some of this:

Getting a look at this formula and the error distribution of the observations would be really helpful. Can you show me the formula that tells us how to convert from energy gap to proton size? Do you know the error distribution on the observations? Is it normally distributed or something else?

Maybe it could be cleared up with some sharper statistics. Would that be interesting?

Most certainly these are so far aberrant experimental results, and it is my hope more experiments will be done. I've tried to do 2 of them myself: one experiment I could not reconstruct because the necessary parts were discontinued by the manufacturer, and the other gave inconclusive results as I couldn't stabilize the interferometer signal. So, for now I have to view the claims with deep skepticism...

On a related note, all the experiments purported to measure absolute motion most certainly disagree with the Galilean transformation, and hence it is understandable that Michelson-Morley and other experiments reported a null result. It is null with respect to the Galilean transformation, and even the neo-Lorentzians will agree the Galilean transformation doesn't apply to light.

However, the irony is that Special Relativity uses the Lorentz transformation, and I was always curious why the special theory of relativity was associated with Einstein since the transformation equation was named after Lorentz!

Anyway, there were several experiments that indicated a possible variation from Einstein's SRT. The variations were most certainly too small to support the Galilean transformation, but large enough to support a neo-Lorentzian view.

In any case it would be good to finally resolve the cause of the discrepancies. Dayton Miller was certainly a top experimentalist and he was not willing to confirm the Einsteinian view of SRT. This was a description of Miller's apparatus:

The small shifts observed by Miller have been seen in other experiments since, and when not viewed through the lens of Galilean relativity, but rather neo-Lorentzian relativity, the inferred velocities of Earth's movement appear consistent with the predicted orbital speed of the Earth around the sun and the Earth's rotation about its axis if the transformation of velocities is done via a neo-Lorentzian approach.

Finally, it just seems personally elegant to me for there to exist an absolute reference frame for all physics, hence the appeal of alternative formulations of relativity to that effect.

I'm pretty sure that the blue vertical line is a weighted average. I think many of these old hydrogen spectroscopy results were done by the same lab with similar procedures (possibly different apparatus). It is possible that there is an unknown systematic error that skews them towards a larger value for r_p. But no one has been able to pinpoint what that was.

The basic form of the formula is quoted in the 2017 Udem paper I referenced above:

Here R_{\infty} is the Rydberg constant (measured to very high precision by a different experiment), n, l, j are the principal, orbital, and total angular momentum quantum numbers, and C_{NS} is a constant. f_{nlj} is an incredibly complicated function involving multiple corrections from SR and QED (which are listed in detail in section 4 of this review). Unpacking this function results in a formula that is incredibly complicated and runs to several pages. I haven't been able to find a rendering in its full glory online. Because of its complexity, I don't think you can easily predict its character.
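The equation itself did not come through in the post. A reconstruction from the symbols defined above, which I believe matches the basic form quoted in the 2017 paper, is given below. Treat the exact arrangement as an assumption to be checked against the paper, in particular the Kronecker delta \delta_{l0} (which restricts the finite-size term to S-states) and the arguments of f_{nlj}:

```latex
E_{nlj} = R_{\infty} \left( -\frac{1}{n^2}
        + f_{nlj}\!\left(\alpha, \frac{m_e}{m_p}, \ldots\right)
        + \delta_{l0}\, \frac{C_{NS}}{n^3}\, r_p^2 \right)
```

Under that form, "inverting the formula" for an S-state (l = 0) is a one-line rearrangement:

```latex
r_p^2 = \frac{n^3}{C_{NS}} \left( \frac{E_{n0j}}{R_{\infty}}
      + \frac{1}{n^2} - f_{n0j}\!\left(\alpha, \frac{m_e}{m_p}, \ldots\right) \right)
```

Here \alpha is the fine-structure constant and m_e/m_p the electron-to-proton mass ratio. In practice transition frequencies (energy differences) are measured rather than single levels, and R_{\infty} and r_p have to be determined together, so this is only the schematic dependence.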

The error bar is a sum (in quadrature) of the systematic and statistical uncertainties. The statistical uncertainty is usually assumed to be Gaussian. Determination of the systematic uncertainty usually involves coming up with models to characterize the various systematic shifts to the measured value of the transition frequency, some of which may not be Gaussian models. However, the uncertainties in these systematic shifts are also usually assumed to be Gaussian.
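For anyone unfamiliar with the term, a sum in quadrature is just the square root of the sum of squares. A minimal sketch, with made-up numbers and a hypothetical function name, assuming the contributions are independent:

```python
import math

def total_uncertainty(statistical, systematics):
    """Combine independent 1-sigma uncertainties in quadrature."""
    return math.sqrt(statistical**2 + sum(s**2 for s in systematics))

# Hypothetical budget (arbitrary units, illustrative only).
statistical = 3.0
systematics = [4.0, 12.0]  # e.g. two separately modeled systematic shifts

total = total_uncertainty(statistical, systematics)
print(total)  # 13.0, since sqrt(9 + 16 + 144) = sqrt(169)
```

Note that the largest contribution dominates: dropping the two smaller terms here would only change the total from 13 to 12. Correlated contributions would need covariance terms that this sketch leaves out.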

That's basically what the recent measurements are supposed to do. They are more precise and point to the smaller value, indicating that we've been wrong about r_p (by about 5%) for the last several decades. This doesn't explain what was wrong in those experiments, though.

In all likelihood, the variations were several orders of magnitude too small to get us from a universe created 13.8bya to a universe that is 7kya.

@stcordova I invite you to present your detailed work if you care to disagree with my characterization. Without the detailed work, your constant refrain that there is some small amount of variation, possibly it is not due to measurement error, therefore the universe could be 7kya seems like an enormous and completely unjustified leap.

Best,

Chris

The blue band is slightly larger on the high side, which indicates to me that some form of log-transform or link function mentioned by @swamidass may have been used here.

If I had this sort of data in hand, I would apply some methods from meta-analysis to look for bias, and use a mixed-effects model to obtain the average.
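As one possible starting point, here is a minimal sketch of a random-effects average using the DerSimonian-Laird estimator, a common choice in meta-analysis. It is my pick for illustration and not necessarily the model intended above. The data are invented, and the function name is hypothetical.

```python
import math

def random_effects_mean(values, sigmas):
    """DerSimonian-Laird random-effects estimate.

    Returns (mean, sigma_of_mean, tau2), where tau2 is the estimated
    between-measurement variance beyond the quoted error bars.
    """
    w = [1.0 / s**2 for s in sigmas]
    fixed = sum(wi * v for wi, v in zip(w, values)) / sum(w)
    # Cochran's Q: weighted scatter about the fixed-effect mean.
    q = sum(wi * (v - fixed) ** 2 for wi, v in zip(w, values))
    dof = len(values) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - dof) / c)
    # Re-weight with the extra variance added to every measurement.
    w_re = [1.0 / (s**2 + tau2) for s in sigmas]
    mean = sum(wi * v for wi, v in zip(w_re, values)) / sum(w_re)
    return mean, math.sqrt(1.0 / sum(w_re)), tau2

# Hypothetical measurements (illustrative only, NOT the published data).
values = [0.85, 0.86, 0.87, 0.92, 0.95]
sigmas = [0.01, 0.01, 0.02, 0.04, 0.05]

mean, sigma, tau2 = random_effects_mean(values, sigmas)
fixed_sigma = math.sqrt(1.0 / sum(1.0 / s**2 for s in sigmas))
print(f"random-effects mean: {mean:.4f} +/- {sigma:.4f} (tau^2 = {tau2:.2e})")
print(f"fixed-effect sigma:  {fixed_sigma:.4f}")
```

When the measurements scatter more than their error bars allow, tau2 comes out positive and the uncertainty on the average grows relative to the plain weighted mean. That is the sense in which such a model is more cautious about averaging results that may share unrecognized problems.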