Students from the Five Colleges will compete to see who can best analyze big data and attract the interest of employers at the 2015 ASA DataFest competition—a collaboration between academe, students and industry that will be held March 27-29 in 1634 Lederle Graduate Research Center.
DataFest is an annual competition in which teams of up to five undergraduates work to reveal insights from a large and rich data set. The program takes data-analysis learning beyond the constraints normally encountered in a typical statistics course by enabling the students to work with big data provided by a real client.
The graduate student organization Graduate Researchers interested in Data (GRiD), a group whose focus is on data science, will help to coordinate the contest, and faculty coordinators Nick Reich, Jing Qian, Raji Balasubramanian and Matthias Steinruecken, all from the department of biostatistics and epidemiology, will serve as consultants for the event. UMass Amherst teams scheduled to participate in the event consist of students representing majors in biochemistry, biostatistics, computer science, economics, engineering, mathematics, physics, public health and statistics.
During the 48-hour event, which begins Friday evening and concludes Sunday afternoon, teams from Amherst, Hampshire, Mount Holyoke and Smith colleges and the university will compete head-to-head for prizes in categories including “Best Insight,” “Best Visualization” and “Best Use of External Data.” Each team presents its findings to a panel of judges composed of professors, data scientists and industry representatives. The student competitors will also be trying to catch the attention of industry recruiters, who will attend the event to offer advice to the competitors and to identify the students with the best quantitative and analytical skills for potential job opportunities.
“While many participants enjoy DataFest as a friendly competitive event, it is much more to students nearing graduation and the company reps in attendance who are seeking to recruit new statistical talent,” says Robert Gould, professor of statistics at the University of California, Los Angeles, and national organizer for ASA DataFest. “In the relatively short history of DataFest, numerous students showcased their statistical skill during the event and simultaneously developed contacts with employers that have led to offers of full-time employment. Students who do well at DataFest are students who have proven that they can navigate the ‘data deluge.’ And this is very attractive to potential employers.”
Each year, the data and the challenge are different, but the common theme carries over: making sense of big data that is larger and more complex than the data sets undergraduates usually encounter in a classroom. The data set, which consists of real-world data of current interest to the providing organization or business, is not unveiled until the start of the competition, so participating students cannot prepare in advance. For the first DataFest in 2011, the data set consisted of nearly 10 million arrest records spanning a five-year period, provided by the Los Angeles Police Department. In 2013 the data, which consisted of 1 million user-candidate pairs with more than 200 variables, was provided by the popular online dating service eHarmony.com. The 2014 data set consisted of building energy data from GridPoint, and it challenged DataFest competitors to find patterns that would help a business decide to implement energy-saving steps. This year, organizers have another large data set that will challenge 2015 DataFest competitors.
DataFest was first held by the statistics department at UCLA in 2011 and has since expanded nationally. This is the second year of the ASA Five College DataFest.