February 12, 2026 1:00 pm - 2:00 pm ET
Statistics and Data Science Seminar Series
LGRT 1685

Title: Data thinning (and beyond) to avoid double dipping

Abstract: While classical statistical methods are designed for testing hypotheses about
pre-specified models, the reality of modern science is that analysts often explore their
data before coming up with models and hypotheses of interest. We refer to the practice
of using the same data to generate and then test a hypothesis, or to fit and then
evaluate a model, as double dipping. Problems arise when standard statistical
procedures are applied in settings that involve double dipping. Often, we avoid double
dipping by splitting our observations into a training set and a test set. While this sample
splitting approach is straightforward and easy to understand, it is generally inapplicable
in unsupervised settings. Motivated by unsupervised problems that arise in the analysis
of single-cell RNA sequencing data, we propose data thinning, an alternative to sample
splitting that splits each observation in a dataset into two independent pieces. We show
that this method provides an elegant solution to our motivating problems under
distributional assumptions, and discuss extensions that can be used when those
assumptions are not met.
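
Illustration: the sketch below shows the simplest instance of splitting one observation into two independent pieces, assuming Poisson-distributed counts. If X ~ Poisson(lambda) and X_train | X ~ Binomial(X, eps), then X_train and X_test = X - X_train are independent Poisson variables with means eps*lambda and (1 - eps)*lambda. The function name poisson_thin, the parameter eps, and the simulated data are hypothetical choices made for this example and are not taken from the talk or from any package.

```python
import numpy as np


def poisson_thin(counts, eps=0.5, seed=0):
    """Split each count into two pieces via binomial thinning.

    Assumes the counts are Poisson; under that assumption the two
    returned arrays are independent and sum to the original counts.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    # Each unit of count goes to the training piece with probability eps.
    train = rng.binomial(counts, eps)
    test = counts - train
    return train, test


# Simulated Poisson data (hypothetical; stands in for a count matrix).
sim_rng = np.random.default_rng(1)
x = sim_rng.poisson(lam=5.0, size=100_000)

x_train, x_test = poisson_thin(x, eps=0.5)

# The pieces sum to the original data, and their sample correlation
# should be close to zero when the Poisson assumption holds.
corr = np.corrcoef(x_train, x_test)[0, 1]
print(x_train.mean(), x_test.mean(), corr)
```

One piece can then be used to generate a hypothesis or fit a model and the other to test or evaluate it. When the data are not Poisson, this particular split no longer yields independent pieces.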


Most relevant papers: The two most relevant papers for this talk are
https://academic.oup.com/biostatistics/article/25/1/270/6893953 (focuses more on the
application) and https://www.jmlr.org/papers/volume25/23-0446/23-0446.pdf (a
general introduction to data thinning), but I will also touch on material from other recent
or forthcoming papers (e.g.
https://www.tandfonline.com/doi/full/10.1080/01621459.2024.2421998).


Short bio: Anna Neufeld is an Assistant Professor of Statistics at Williams College, where her
research focuses on selective inference and the analysis of genomic data. She received
her PhD in Statistics from the University of Washington in 2023 and subsequently
completed postdoctoral training at the Fred Hutchinson Cancer Center before joining
the faculty at Williams in 2024.