Significance testing of sample data is widely used to determine whether a hypothesis is valid and applicable to a larger population. However, as Valentin Amrhein discusses in an article posted on the PeerJ Blog, inferential statistics of this kind can be seriously flawed, largely because of the incorrect use of the p value.
A small p value is often reported as showing that the null hypothesis is false, and it may indicate this, but it might also arise from an incorrect mathematical model or from the selective choice of particular analyses. Further, because p values are calculated from sample data, they vary from one sample to the next, just as the data do. In fact, it has been argued that this variability is a major cause of irreproducible scientific results. This does not mean that the p value itself is unreliable; it simply reflects the variation in the sample data.
Amrhein also explained how many researchers misuse confidence intervals as significance tests. Confidence intervals should instead be used to give an idea of how large the uncertainty could be in the ideal scenario in which all assumptions are correct, and Amrhein suggests that ‘uncertainty intervals’ or ‘compatibility intervals’ would be more appropriate terms.
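To make the point about sample-to-sample variation concrete, here is a minimal Python sketch. It is our own illustration, not code from Amrhein's article: it simulates 20 hypothetical studies that all sample the same population, with a true effect of 0.5 standard deviations. The function names, the sample size of 20 per group, the effect size and the use of a simple two-sample z-test with a known standard deviation are all assumptions chosen to keep the example short. Each simulated study reports its estimate, a two-sided p value and a 95% compatibility interval.

```python
import random
from statistics import NormalDist, mean


def z_test_with_interval(a, b, sigma=1.0, level=0.95):
    """Two-sample z-test, ASSUMING a known, shared standard deviation sigma.

    Returns (estimated difference, two-sided p value, (lower, upper)),
    where (lower, upper) is the compatibility interval at the given level.
    """
    diff = mean(b) - mean(a)
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5  # standard error of diff
    z = diff / se
    std_normal = NormalDist()
    p = 2 * std_normal.cdf(-abs(z))  # two-sided p value
    z_crit = std_normal.inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return diff, p, (diff - z_crit * se, diff + z_crit * se)


def replicate(n_studies=20, n_per_group=20, true_effect=0.5, seed=1):
    """Run hypothetical identical studies on the same simulated population."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    results = []
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        results.append(z_test_with_interval(control, treated))
    return results


if __name__ == "__main__":
    for diff, p, (lower, upper) in replicate():
        print(f"estimate={diff:+.2f}  p={p:.3f}  "
              f"95% interval=({lower:+.2f}, {upper:+.2f})")
```

Running it should typically show p values scattered over a wide range, some well below 0.05 and some far above it, even though every simulated study has exactly the same true effect; the intervals shift along with the estimates. Note that in this setup an interval excludes zero exactly when p < 0.05, which is why reading an interval only as "does it cross zero?" simply turns it back into a significance test.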
Amrhein is not suggesting that statistics should be abandoned; on the contrary, he states that “in many fields of research, there is almost no science without statistics”. However, he stresses that personal judgement must be used alongside statistics, and he advises against the isolated use of significance thresholds to decide whether a hypothesis is true or false and whether data are worthy of publication. Basing such decisions on p values alone, Amrhein warns, could be “extremely harmful”, as discussed further in the full research paper.