July 23, 2018
Assistant Professor of Biostatistics Laura Balzer describes a machine learning method with implications for epidemiological research in a recent publication titled “Stacked generalization: an introduction to super learning.” Co-authored with Ashley Naimi of the University of Pittsburgh, the article provides a practical introduction to ensemble methods that allow researchers to combine several different prediction algorithms into one.
“The approach has tremendous potential to improve our ability to identify individuals or groups as high risk as well as improve the robustness of our associational and effect analyses,” says Balzer, who plans to teach these methods in both graduate and undergraduate courses this fall.
Although ensemble methods have existed since the early 1990s, their use in public health has been limited by a lack of teaching materials that demonstrate how to implement them. The authors provide step-by-step instructions through two worked examples, with accompanying R code, to illustrate the concepts and address common questions, such as how the predictions from the individual algorithms are actually combined.
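As a rough illustration of that combination step, the sketch below shows one simple form of stacking in Python. It is not the authors' R code, and every name in it (`fit_mean`, `fit_line`, `cv_predictions`, `convex_weight`, `super_learner`) is hypothetical. It assumes just two candidate algorithms, a single numeric predictor, squared-error loss, and a convex (non-negative, sum-to-one) weighting. Each candidate is scored on cross-validated predictions, and the weights are chosen to minimize the cross-validated error.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Candidate 1 (illustrative): always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate 2 (illustrative): simple least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def cv_predictions(fit, xs, ys, k=5):
    """Predict each observation from a model trained without its fold."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = model(xs[i])
    return preds


def convex_weight(z_a, z_b, ys):
    """Weight w in [0, 1] on z_a minimizing sum (y - (w*z_a + (1-w)*z_b))^2."""
    num = sum((y - b) * (a - b) for a, b, y in zip(z_a, z_b, ys))
    den = sum((a - b) ** 2 for a, b in zip(z_a, z_b))
    if den == 0:
        return 0.5  # candidates agree everywhere; any weight is equivalent
    return min(1.0, max(0.0, num / den))


def super_learner(xs, ys, k=5):
    """Return a combined predictor and the (mean, line) weights."""
    # Step 1: cross-validated predictions from each candidate.
    z_mean = cv_predictions(fit_mean, xs, ys, k)
    z_line = cv_predictions(fit_line, xs, ys, k)
    # Step 2: choose the convex weights that minimize cross-validated error.
    w = convex_weight(z_mean, z_line, ys)
    # Step 3: refit each candidate on all data and combine with those weights.
    m_mean, m_line = fit_mean(xs, ys), fit_line(xs, ys)
    return (lambda x: w * m_mean(x) + (1 - w) * m_line(x)), (w, 1 - w)
```

Calling `super_learner(xs, ys)` on two equal-length lists of numbers returns a prediction function and the pair of weights. With more than two candidates, the closed-form weight in `convex_weight` would need to be replaced by a constrained optimizer, which this sketch does not attempt.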
“Stacked generalizations, notably Super Learner, are fast becoming an important part of the epidemiologic toolkit,” the authors write. “Overall, Super Learner is an important tool that researchers can use to improve predictive accuracy, avoid overfitting, and minimize parametric assumptions. We have provided a simple explanation of Super Learner to facilitate a more widespread use in epidemiology. More advanced treatments with realistic data examples are available and should be consulted for additional depth.”
The article appears in a recent issue of the European Journal of Epidemiology.