UMass Amherst Hosts Big Data Competition for Five College Students

UMass Amherst participants from the 2016 ASA DataFest competition

March 22, 2017

Thirty teams of students from the Five Colleges are expected to gather on the University of Massachusetts Amherst campus from Friday evening to Sunday afternoon, March 31 to April 2, to compete in analyzing large sets of complex data at the 2017 ASA Five College DataFest competition, sponsored nationally by the American Statistical Association (ASA).

Google is also sponsoring DataFest at the national level for the second year, organizers say, and MassMutual Life Insurance Company is supporting the Five College event.  

UMass Amherst organizer and Assistant Professor of Biostatistics Nicholas Reich says the 48-hour event, one of several across the nation, may attract as many as 150 students. It is not only an exciting competition with prizes for winning teams; it also provides “a terrific opportunity for students to develop their ability to tell stories with big data,” he says. “We’ve heard from students that DataFest provides them with one of the best and most interview-ready learning experiences they have as an undergraduate.”

At a DataFest, undergraduate students do the work, assisted by roving graduate students, faculty, industry professionals and representatives of the organization that poses the challenge and provides the data. This is the fourth annual Five College event. Each year the data and challenge are different, but the common theme is making sense of larger and more complex data sets than undergraduate students usually encounter in a classroom, Reich adds.

The data is real and of current interest to the organization or business that provides it. For example, for the first DataFest at the University of California, Los Angeles in 2011, 30 students gathered for 48 hours to analyze five years of arrest records provided by the Los Angeles Police Department.

DataFest topics are not unveiled until the last minute so students cannot prepare in advance, but other past topics have involved analyzing online sales for TicketMaster, national electricity use patterns and data from an online dating service.

Ben Baumer, local organizer and Assistant Professor of Statistical and Data Sciences at Smith College, says the event will teach participating students data analysis skills beyond those normally encountered in a typical statistical science course by allowing them to work with big data from a real company. “The sponsors are truly interested in what the students come up with. These undergraduates will propose creative, useful and intelligent new approaches to problems that no one else has tried before,” he adds.

On Sunday afternoon, students will present their findings in parallel sessions of five-minute lightning talks, Reich says. A panel of professors and data scientists will award prizes for “Best Overall,” “Best Insight,” “Best Visualization” and “Best Use of External Data.”

Organizers say past participating students report that DataFest taught them not only about manipulating large datasets and creating effective visualizations but also about how open-ended statistics can be. Others say working in a team teaches how to integrate one’s skill and creativity.

Statistician Rob Gould of UCLA, a founding organizer of the ASA DataFest program, says, “While many participants enjoy DataFest as a friendly competitive event, it means much more to students nearing graduation and the company reps in attendance who are seeking to recruit new statistical talent. In the relatively short history of DataFest, numerous students showcased their statistical skill during the event and simultaneously developed contacts with employers that have led to offers of full-time employment. Students who do well at DataFest are students who have proven that they can navigate the ‘data deluge.’ And this is very attractive to potential employers.”

Local organizations and businesses are invited to support, sponsor or participate in DataFest by contacting Baumer at 413/585-3440.