How Selective Reporting Corrupts Scientific Results at Two Levels
We called the fishing trip a hypothesis test.
The replication crisis stems not from bad actors but from a predictable collision between two layers of selective reporting — choosing which studies to publish and which analyses to run — compounded by scientists never being taught the distinction between exploration and confirmation.
The Translation
The Replication Crisis is most productively understood as the predictable consequence of Selective Reporting operating at two distinct levels. Study-level selection — the file-drawer problem — is well recognized: researchers conduct many studies but disproportionately publish those yielding positive results, inflating the literature's apparent effect sizes. The second level, analysis-level selection, is subtler and arguably more corrosive. Within a single dataset, the data themselves begin to shape which comparisons get tested, which covariates get included, and which subgroups get examined. This iterative, data-informed process is cognitively natural and genuinely useful for hypothesis generation, but it is catastrophic for inferential validity when its outputs are presented as confirmatory. High-dimensional datasets contain spurious correlations by mathematical necessity; unconstrained analytic flexibility guarantees their discovery.
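The mathematical necessity of spurious correlations can be seen directly in a toy simulation. The sketch below (illustrative only; the dataset and threshold are invented for this example) correlates a pure-noise outcome against 200 pure-noise candidate predictors; by construction no real relationship exists, yet several predictors still cross the nominal .05 significance threshold. Reporting only those "hits" as confirmatory findings is exactly the analysis-level selection described above.

```python
import random
import math

random.seed(0)  # fixed seed so the run is reproducible

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

n, k = 50, 200  # 50 samples, 200 candidate predictors -- ALL pure noise
outcome = [random.gauss(0, 1) for _ in range(n)]
predictors = [[random.gauss(0, 1) for _ in range(n)] for _ in range(k)]

# For n = 50, the two-sided critical |r| at p < .05 is roughly 0.28
# (from t = 2.01 with 48 degrees of freedom).
significant = sum(abs(pearson_r(p, outcome)) > 0.28 for p in predictors)
print(f"{significant} of {k} noise predictors cross the nominal .05 threshold")
```

Around 5% of the noise predictors (roughly ten here) will clear the threshold on any given run, which is precisely what unconstrained analytic flexibility guarantees the analyst will find.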
Pre-Registration addresses this problem not by privileging confirmatory over exploratory work, but by making the boundary between them legible. It is a labeling device, not a hierarchy. Readers can then calibrate their evidential confidence accordingly — treating pre-registered confirmations as stronger evidence and exploratory findings as promising leads requiring independent replication.
What makes this situation genuinely tragic is that the exploration-confirmation distinction is elementary — it belongs in any introductory course on experimental design. Yet training programs across the life sciences routinely lack formal coursework in methodology and statistical reasoning. Scientists absorb their inferential habits through apprenticeship in labs that may themselves perpetuate deep misunderstandings about what p-values represent, what the .05 threshold actually controls for, and how researcher degrees of freedom interact with nominal error rates.
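The interaction between researcher degrees of freedom and nominal error rates is itself elementary arithmetic. A single pre-specified test at alpha = .05 has a 5% false-positive rate, but each additional independent look at the data compounds that risk: the probability of at least one false positive across k tests is 1 - (1 - alpha)^k. A minimal calculation:

```python
# The nominal alpha controls the error rate of ONE pre-specified test.
# With k independent analyses, the family-wise false-positive rate is
# 1 - (1 - alpha)**k, which rapidly dwarfs the nominal .05.
alpha = 0.05
for k in (1, 5, 20, 60):
    fwer = 1 - (1 - alpha) ** k
    print(f"{k:3d} tests -> P(at least one false positive) = {fwer:.2f}")
# prints 0.05, 0.23, 0.64, 0.95 respectively
```

Twenty unreported covariate choices, subgroup splits, or outcome measures are enough to make a spurious "discovery" more likely than not, which is why the .05 threshold only controls what it claims to control when the analysis was fixed in advance.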