Andrew McCallum

What do health management, energy conservation, and online education have in common? They are all complex issues that benefit from data-informed decision-making, and they are all being brought to the next level by researchers at the University of Massachusetts Center for Data Science.

The Center was established in 2015 by its current director, Distinguished Professor Andrew McCallum, whose pioneering work in machine learning, information extraction, and artificial intelligence has helped steer the center, and the university, to the forefront of data science research and development.

“UMass Amherst has long had tremendous strength in machine learning, artificial intelligence, databases, and other areas related to data science,” says McCallum. “The creation of the Center for Data Science is helping UMass build on that strength, increase our impact, and gain visibility in this increasingly high-demand area.” In real terms, that means producing research that’s among the most cited in the field, creating software that runs in Fortune 500 companies, and vastly increasing the number of graduates exceptionally trained in turning big data into real-world breakthroughs.

“The world’s highest-priority challenges (in food, water, energy, medicine, economics, education, and social balance) are increasingly complex, and will require sophisticated decision-making,” explains McCallum. “The basis for informed decisions is the underlying data. Building tools for wading through the vast quantities of available data in order to extract actionable insight: that’s the focus of data science.”

In other words, from biotech to telecom to transportation and government, virtually every domain now relies on data scientists to find meaning in masses of information, as in current center-affiliated projects to predict dengue fever epidemics, guide infrastructure investment, and develop personalized tutoring systems for disadvantaged students. Demand for trained data scientists, along with a huge upsurge in student interest, inspired McCallum to create the center. “We believed that it would be beneficial to the university to have the infrastructure to support and broadcast our strengths,” he says.

McCallum’s own strengths make him a natural fit to direct the center. Considered a preeminent researcher in natural language processing, McCallum has more than 300 publications to his name and over 60,000 citations from fellow scientists, making him the second-most-cited researcher in the field internationally. His work, and his commitment to creating high-impact opportunities for his students, led to his appointment as Distinguished Professor earlier this year.

Opportunities have been plentiful, thanks to McCallum’s dedication to outreach, paired with what he calls a “tremendous appetite in industry” for research in data science. That’s evidenced by the more than $25 million in support that the center’s projects have received to date from industry, government, and philanthropic and research organizations such as MassMutual, the National Science Foundation, Google, IBM, and the Chan Zuckerberg Initiative (CZI).

That speaks to McCallum’s vision for the center as an active, project-based collaboration among global partners committed to understanding needs and discovering solutions. One of the center’s largest projects is a collaboration with CZI to enable faster medical breakthroughs by creating an AI-driven, navigable map of scientific knowledge based on millions of biomedical research papers. “With 4,000 new papers published every day, it’s difficult for scientists to stay abreast of their own fields, much less other fields,” says McCallum. Because knowledge bases are currently incomplete and disconnected, critical patterns and discoveries may go unnoticed, a major roadblock to progress when, as McCallum says, “cross-field connections are the most fruitful arenas for breakthroughs.”

Supporting scientific advancement is also at the heart of a project McCallum leads in an area he says is “ripe for a revolution”: scientific peer review. “Current peer review practices were designed in the days of on-paper publishing and postal service delivery of reviewing assignments. The scientific community is increasingly questioning the efficacy and fairness of the current traditional workflow, which can be slow, insular, non-interactive, and not necessarily conducive to the promotion of scientific creativity,” says McCallum. To change this, his team at UMass created a platform and website that enables conference and journal publishers to flexibly experiment with judicious widening of “who gets to see what when,” as well as more interactive communication among reviewers, authors, and the larger research community. The result, says McCallum, is a more open and collaborative review process that current users say is already leading to higher-quality reviewing and more collaboration.

McCallum’s commitment to collaboration extends far beyond the data science and broader scientific communities; he’s reaching out to share resources and knowledge with learners in every field. “In an era where computers, data availability, and computer networking affect every corner of our lives, it’s important that every undergraduate emerges with some knowledge of computational thinking,” he says. “We want to create introductory general education courses for students of all majors, including people with no programming experience.” UMass students already benefit from resources such as the largest academic cluster of GPUs (graphics processing units) in the world, which are crucial for work on deep neural networks, the subfield of machine learning that is revolutionizing artificial intelligence.

McCallum credits the uniquely supportive atmosphere at UMass for much of this advancement. “There’s an incredible sense of collaboration, generosity, and I would even say love among our faculty,” he says, recalling a meeting early in his UMass career at which, to his surprise, his fellow faculty advocated for new hires based not on their personal field of interest but on the benefits to the community at large. “I left that meeting walking on a cloud,” he recalls. “Those kinds of open and generous collaborations not only make us better people—they also advance academic and scientific progress.”

Ellen Keelan
