In the final chapter of his famous book How to Lie with Statistics, Darrell Huff tells us that “anything smacking of the medical profession” or published by scientific laboratories and universities is worthy of our trust – not unconditional trust, but certainly more trust than we’d afford the media or shifty politicians. After all, Huff filled an entire book with the misleading statistical trickery used in politics and the media, but few people complain about statistics done by trained professional scientists. Scientists seek understanding, not ammunition to use against political opponents.
Statistical data analysis is fundamental to science. Open a random page in your favorite medical journal and you’ll be deluged with statistics: t tests, p values, proportional hazards models, risk ratios, logistic regressions, least-squares fits, and confidence intervals. Statisticians have provided scientists with tools of enormous power to find order and meaning in the most complex of datasets, and scientists have embraced them with glee.
They have not, however, embraced statistics education, and many undergraduate programs in the sciences require no statistical training whatsoever.
Since the 1980s, researchers have described numerous statistical fallacies and misconceptions in the popular peer-reviewed scientific literature, and have found that many scientific papers – perhaps more than half – fall prey to these errors. Inadequate statistical power renders many studies incapable of finding what they’re looking for; multiple comparisons and misinterpreted p values cause numerous false positives; flexible data analysis makes it easy to find a correlation where none exists. The problem isn’t fraud but poor statistical education – poor enough that some scientists conclude that most published research findings are probably false [31].
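To make the multiple-comparisons point concrete, here is a minimal sketch in Python. It assumes the tests are independent and that every null hypothesis is true, conditions real studies rarely meet exactly. The function name and the 0.05 threshold are illustrative choices, not taken from any particular paper.

```python
def familywise_error_rate(alpha, num_tests):
    """Chance of at least one false positive among independent tests
    when every null hypothesis is actually true (an assumption)."""
    return 1 - (1 - alpha) ** num_tests


# How quickly spurious "discoveries" pile up as more comparisons are made.
for num_tests in (1, 5, 20, 100):
    rate = familywise_error_rate(0.05, num_tests)
    print(f"{num_tests:>3} tests: {rate:.0%} chance of at least one false positive")
```

Under these assumptions, twenty comparisons at the conventional 0.05 threshold give roughly a 64 percent chance of at least one spurious "significant" result, even though nothing real is there to find.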
What follows is a list of the more egregious statistical fallacies regularly committed in the name of science. It assumes no knowledge of statistical methods, since many scientists receive no formal statistical training. And be warned: once you learn the fallacies, you will see them everywhere. Don’t be alarmed. This isn’t an excuse to reject all modern science and return to bloodletting and leeches – it’s a call to improve the science we rely on.
Updated January 2013 with a relevant example of the base-rate fallacy: survey estimates of gun usage.
Updated April 2013 with more details on the interaction of truth inflation and early stopping rules, researcher freedom in neuroscience, poor statistical power in neuroscience, how to control the false discovery rate, publication bias and poor reporting, underpowered studies and right turn on red, the misuses of confidence intervals, the impact of all these errors, what can be done to save statistics, and additional references and details in many other places.
I’ve tried my best, but inevitably this guide will contain errors and omissions. If you spot an error, have a question, or know a common fallacy I’ve missed, email me at alex at refsmmat dot com.
Thanks to Dr. James Scott, whose statistics course gave me the background necessary to write this; to Matthew Watson and CharonY, who gave invaluable feedback and suggestions as I wrote my drafts; to my parents, who gave suggestions and feedback; to Dr. Brent Iverson, whose seminar first motivated me to learn about statistical abuse; and to all the scientists and statisticians who have broken the rules and given me a reason to write.
Any errors in explanations are my own.
This work is licensed under a Creative Commons Attribution 3.0 Unported License. You’re free to print it, copy it, translate it, rewrite it, set it to music, slice it, dice it, or whatever, so long as you attribute the original to me, Alex Reinhart, and provide a link back to this site. (If you do translate it, please let me know! I’d happily provide a link to your translation.) Hit the link to the license for more details.
The xkcd cartoon used inside is available under the Creative Commons Attribution-NonCommercial 2.5 License, and may not be used commercially without permission from the author.