ISSR-CSSI Panel Explores Social Roots of the "Replication Crisis" in Science [1]

The “replication crisis” that is raising questions about the reliability of scientific research has been widely discussed in the fields of psychology and medicine, but has important ramifications for all scientists, social and natural. At a jam-packed April 8 seminar co-hosted by ISSR [2] and the Computational Social Sciences Institute [3], ISSR Assistant Director Henry Renski [4] (Landscape Architecture and Regional Planning) moderated a panel of five scholars from across the Colleges of Information and Computer Sciences [5], Natural Sciences [6], and Social & Behavioral Sciences [7], as they explored the key issues, implications, and attempted remedies that the replication debate has raised. The lively discussion that ensued points to a hunger to respond to the epistemological, methodological and institutional questions that underlie the replication debate.

At its core, David Jensen (Computer Science) argued, the replication debate goes beyond the skill or sincerity of individual researchers, and reflects issues in the structure of the contemporary research and publishing enterprise. Citing a foundational 2005 essay by Ioannidis [8] and his own 2000 paper with Cohen [9], Jensen pointed out that, given the variance inherent in any estimate of an effect, an enterprise that rewards maximum estimated effects will tend to publish findings with inflated effect sizes. With all eyes seeking “the next big thing” in research and little reward for publishing non-findings (or findings that the null hypothesis remains a better explanation for the available data), current systems reward this bias towards “beating the data until it confesses.” Adrian Staub [10] (Psychological & Brain Sciences) narrated an epic back-and-forth debate that did make it to the pages of a highly cited journal, following his team’s failure to replicate a surprising and significant finding in linguistics. However, he ruefully noted, the original (flawed) paper and the authors’ defense against Staub et al.’s critique [11] were both cited more frequently than the critique itself. “Psychologists,” he quipped, “want to believe new and interesting things, even if a more highly-powered replication has repudiated those findings.” The experience of Thomas Herndon and his colleagues in the UMass Economics Department – in which a course-based exercise in replicating Rogoff & Reinhart’s agenda-setting paper on public debt and GDP growth [12] turned the economic policy world upside down – showed how important it can be to produce “successful failures to replicate.” And yet the challenges to replication (including access to data, and the low rewards of “non-findings”) are formidable.
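Jensen's point about selection on noisy estimates can be illustrated with a small simulation. The sketch below is not drawn from the panel or from the cited papers; it assumes, purely for illustration, that every study measures the same true effect with normally distributed sampling error, and that a study is "published" only if its estimate clears a fixed threshold. The function name and all parameter values (true_effect, standard_error, threshold, and so on) are hypothetical.

```python
import random
import statistics


def simulate_published_effects(true_effect=0.2, standard_error=0.2,
                               n_studies=2000, threshold=0.4, seed=0):
    """Simulate many studies of one true effect and a crude publication filter.

    All parameter values are illustrative assumptions, not figures from the panel.
    """
    rng = random.Random(seed)
    # Each study's estimate is the true effect plus sampling noise.
    estimates = [rng.gauss(true_effect, standard_error) for _ in range(n_studies)]
    # Hypothetical filter: only estimates above the threshold get published.
    published = [e for e in estimates if e > threshold]
    return estimates, published


if __name__ == "__main__":
    estimates, published = simulate_published_effects()
    print("true effect:            ", 0.2)
    print("mean of all estimates:  ", round(statistics.mean(estimates), 3))
    print("mean of published only: ", round(statistics.mean(published), 3))
    print("share published:        ", round(len(published) / len(estimates), 3))
```

Under these assumptions the average across all studies sits close to the true effect, while the average of the "published" subset is necessarily larger than the threshold and therefore larger than the true effect. No individual study needs to be careless or dishonest for the published record to overstate the effect; the filter alone is enough.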

As prestigious journals try to secure the reputation of the work they publish by requiring authors to post their data and models for public scrutiny, participants in the workshop raised several important concerns that the scientific enterprise must address. First is the risk that this transparency may pose to human subjects whose anonymity may be compromised. This challenge is particularly acute for qualitative scholars, whose “data” are the rich and contextually grounded stories of intimate lives, not easily separable from names and places. A second challenge surrounds the need of early-career researchers to plumb the original data sets they have invested heavily to develop, in order to publish the multiple papers that will secure their professional credentials and careers.

All panelists agreed that the heart of the challenge lies in restoring a broader view of what it means to advance the field – restoring the legitimacy of a null hypothesis as one that requires considerably stronger findings to be overturned, valuing efforts to share and test the merits of new knowledge claims, and reversing the trend toward the narrowest publishable findings possible by valorizing the nuances in understanding that we gain by virtue of failures to replicate “first movers’” findings.

Among the potential remedies, Jensen signaled the merits of “test-of-time” awards [13] for long-standing results, and a cultural shift that rewards rather than penalizes the rapid and transparent replication and revision of scientific literature. Herndon called for academic journals and departments to offer incentives for students and early-career faculty to publish original data – such that their career prospects are helped and not hindered by their contributions to science. Emery Berger [14] (Computer Science) shared an approach being tested in the field of Programming Languages and Software Engineering: a voluntary review of research artifacts (software, models, datasets, etc.) before publication, by an Artifact Evaluation Committee [15] whose findings of completeness and usability would earn the study a merit badge. Caren Rotello [16] (Psychological & Brain Sciences) discussed the limits of various efforts to encourage reproducibility – from the inconclusiveness of the “many labs” approach to the selection bias towards easily replicable studies that mars the Reproducibility Project [17]. While she acknowledged that her recommendation of increasing the power of initial studies carries its own risks of abuse, she emphasized that any such measure would only work to the degree that the fields commit to new guidance on reasonable ways of producing results.

Check out the speakers' presentation slides here [18], and stay tuned for further conversation on this important topic!
