Scientists rise up against statistical significance

The latest issue of Nature has an interesting opinion piece on our mistaken tendency to treat statistically significant and non-significant results as two mutually exclusive categories.

Scientists rise up against statistical significance

Not sure if this is paywalled or not. Here’s an excerpt:

Let’s be clear about what must stop: we should never conclude there is ‘no difference’ or ‘no association’ just because a P value is larger than a threshold such as 0.05 or, equivalently, because a confidence interval includes zero. Neither should we conclude that two studies conflict because one had a statistically significant result and the other did not. These errors waste research efforts and misinform policy decisions.
For example, consider a series of analyses of unintended effects of anti-inflammatory drugs [2]. Because their results were statistically non-significant, one set of researchers concluded that exposure to the drugs was “not associated” with new-onset atrial fibrillation (the most common disturbance to heart rhythm) and that the results stood in contrast to those from an earlier study with a statistically significant outcome.

Now, let’s look at the actual data. The researchers describing their statistically non-significant results found a risk ratio of 1.2 (that is, a 20% greater risk in exposed patients relative to unexposed ones). They also found a 95% confidence interval that spanned everything from a trifling risk decrease of 3% to a considerable risk increase of 48% (P = 0.091; our calculation). The researchers from the earlier, statistically significant, study found the exact same risk ratio of 1.2. That study was simply more precise, with an interval spanning from 9% to 33% greater risk (P = 0.0003; our calculation).

It is ludicrous to conclude that the statistically non-significant results showed “no association”, when the interval estimate included serious risk increases; it is equally absurd to claim these results were in contrast with the earlier results showing an identical observed effect. Yet these common practices show how reliance on thresholds of statistical significance can mislead us (see ‘Beware false conclusions’).
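
For anyone who wants to check the "our calculation" P values in the excerpt, here is a rough sketch of one way to get them back from the reported intervals. It assumes the usual normal approximation on the log risk-ratio scale, which the excerpt does not actually state, and the function name `p_value_from_ci` is my own. The interval bounds are read off the quoted percentages (3% decrease to 48% increase gives 0.97 to 1.48; 9% to 33% increase gives 1.09 to 1.33).

```python
import math
from statistics import NormalDist

# Critical value for a two-sided 95% interval (about 1.96).
Z95 = NormalDist().inv_cdf(0.975)


def p_value_from_ci(ratio, lower, upper):
    """Approximate two-sided P value for a risk ratio from its 95% CI.

    Assumption (not stated in the article): the interval is symmetric
    on the log scale, i.e. log(ratio) +/- Z95 * SE.
    """
    # Recover the standard error of log(ratio) from the interval width.
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    # Wald z statistic against the null of ratio = 1 (log ratio = 0).
    z = math.log(ratio) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2))


# Study with the "non-significant" result: RR 1.2, CI 0.97 to 1.48.
p_wide = p_value_from_ci(1.2, 0.97, 1.48)
# Earlier, more precise study: RR 1.2, CI 1.09 to 1.33.
p_narrow = p_value_from_ci(1.2, 1.09, 1.33)

print(f"wide interval:   P = {p_wide:.3f}")
print(f"narrow interval: P = {p_narrow:.4f}")
```

Under that assumption both values come out close to the ones quoted (0.091 and 0.0003). The point estimate is the same in both calls; only the interval width, and so the standard error, differs, which is exactly why the two P values land on opposite sides of 0.05.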


Thanks for the link @davecarlson. This would be an excellent example to use in my analytical chemistry course. We do a couple of weeks on confidence intervals, hypothesis testing, and quality control. Often students calculate “things” (confidence intervals, etc.) without actually stopping to think about what they mean.



It’s good to see this being discussed.

The problem arises because the people who do statistical testing often treat it as a mechanical procedure to follow. It would be better if they tried to understand the underlying mathematical probability theory.