There have been calls to report statistical significance in terms of confidence intervals rather than the traditional p-value.
This blog post by Brandon Rohrer explains how the use of p-values increases the likelihood of rejecting a change that should be kept.
We are interested in whether Algo A or Algo B is likely to perform better. p-values don't actually answer this. They focus on whether B performs better than A with high confidence.
Basically, a p-value represents the probability of observing an outcome at least as extreme as the one we saw, assuming the null hypothesis is true. It is used in null hypothesis significance testing.
For example, suppose we are comparing drug A and drug B. Our hypothesis could be that A is better than B at, say, improving sleep, measured by a sleep-quality survey question: "How well did you sleep last night after taking the drug, on a scale of 1 to 10?"
If the final p-value comes in at 0.08, which is higher than the 0.05 threshold conventionally used for statistical significance, we fail to reject the null hypothesis that A and B are the same. In the case of the drugs, we can't say A was different from (or better than) B.
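To make the idea concrete, here is a minimal sketch of computing a p-value with a permutation test: we ask how often a mean difference at least as large as the observed one would appear if drug labels were assigned at random (i.e. if the null hypothesis of "no difference" were true). The sleep-score data below are hypothetical, invented purely for illustration.

```python
import random

random.seed(0)

# Hypothetical sleep-quality scores (1-10) for the two drugs
scores_a = [8, 9, 8, 9, 8, 9, 8, 8, 9, 8]
scores_b = [7, 8, 7, 8, 7, 8, 7, 7, 8, 7]

def permutation_p_value(a, b, n_permutations=10_000):
    """Two-sided permutation test: the fraction of random relabelings
    that produce a mean difference at least as extreme as the observed
    one, i.e. the p-value under the null of 'no difference'."""
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    extreme = 0
    for _ in range(n_permutations):
        random.shuffle(pooled)  # randomly reassign labels
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations

p = permutation_p_value(scores_a, scores_b)
print(f"p-value: {p:.4f}")
```

If the printed p-value lands below 0.05 we would reject the null; above it, we "fail to reject", exactly the situation described in the text.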
The takeaway is that p-values focus on whether A performs better than B with high confidence. The trap is that A may indeed be better, yet we had to discard that conclusion because of the conventional 5% alpha.
According to Brandon Rohrer's post, confidence intervals can help us avoid falling into this trap.
The video below explains the idea of a confidence interval.
Typically, we look at the 95% confidence interval. With it we can get a better sense of how widely spread each distribution is. Say the participants who took drug A gave a range of scores from 1 to 10, but the 95% of them closest to each other fall between 8 and 9. Roughly speaking, we can say the 95% confidence interval for drug A is 8 to 9.
Suppose the confidence interval for drug B is 7 to 8. Though the two confidence intervals overlap, we can see that even the lower bound of A (i.e. 8) is well above that of B (i.e. 7). This information can lead us to adopt A when we are forced to choose one over the other, as A is clearly the better choice.
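Here is a small sketch of computing a 95% confidence interval for the mean score of each drug. It uses the standard normal approximation (mean ± 1.96 standard errors), which is one common way to form such an interval; the scores are the same hypothetical data as before, not real measurements.

```python
from math import sqrt
from statistics import mean, stdev

def ci95(scores):
    """Approximate 95% confidence interval for the mean,
    using the normal approximation: mean +/- 1.96 * standard error."""
    m = mean(scores)
    se = stdev(scores) / sqrt(len(scores))
    return (m - 1.96 * se, m + 1.96 * se)

# Hypothetical sleep-quality scores for the two drugs
scores_a = [8, 9, 8, 9, 8, 9, 8, 8, 9, 8]
scores_b = [7, 8, 7, 8, 7, 8, 7, 7, 8, 7]

lo_a, hi_a = ci95(scores_a)
lo_b, hi_b = ci95(scores_b)
print(f"Drug A 95% CI: [{lo_a:.2f}, {hi_a:.2f}]")
print(f"Drug B 95% CI: [{lo_b:.2f}, {hi_b:.2f}]")
```

Comparing the two printed intervals directly (do they overlap? how far apart are the bounds?) gives the kind of at-a-glance judgment the post argues for, which a single p-value hides.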
Never mind the p-value.
For a better explanation, definitely read the blog post by Brandon Rohrer.
He also discusses why p-values are so sticky.
As he puts it: "When analysis is being used to inform decisions, our safest path is to set p-values on a high shelf, shove them to the back of the closet, and focus on less misleading analysis tools like confidence intervals."
Hope you learned something about statistics today!