March 20, 2015
The UMass Amherst graduate student organization GRiD (“Graduate Researchers interested in Data”), a group whose focus is on data science, will help to coordinate the contest, and UMass Amherst faculty coordinators Nick Reich, Jing Qian, Raji Balasubramanian, and Matthias Steinruecken, all from the Department of Biostatistics and Epidemiology, will serve as consultants for the event. UMass Amherst teams scheduled to participate in the event consist of students representing majors in Biochemistry, Biostatistics, Computer Science, Economics, Engineering, Mathematics, Physics, Public Health, and Statistics.
During the 48-hour event, which begins Friday evening and concludes Sunday afternoon, teams from Amherst, Hampshire, Mount Holyoke, and Smith Colleges and the University of Massachusetts Amherst compete head-to-head for prizes in categories including “Best Insight,” “Best Visualization,” and “Best Use of External Data.” Each team presents its findings to a panel of judges composed of professors, data scientists, and industry representatives. Perhaps just as important, the student competitors will be trying to catch the attention of industry recruiters, who will attend the event to offer advice and to identify students with the best quantitative and analytical skills for potential job opportunities.
“While many participants enjoy DataFest as a friendly competitive event, it is much more to students nearing graduation and the company reps in attendance who are seeking to recruit new statistical talent,” says Robert Gould, Professor of Statistics at UCLA and national organizer for ASA DataFest. “In the relatively short history of DataFest, numerous students showcased their statistical skill during the event and simultaneously developed contacts with employers that have led to offers of full-time employment. Students who do well at DataFest are students who have proven that they can navigate the ‘data deluge.’ And this is very attractive to potential employers.”
Each year, the data and the challenge are different, but the common theme carries over: making sense of big data that is larger and more complex than the data sets undergraduate students usually encounter in a classroom. The data set, which consists of real-world data of current interest to the providing organization or business, is not unveiled until the start of the competition, so participating students cannot prepare in advance. For the first DataFest in 2011, the Los Angeles Police Department provided a data set of nearly 10 million arrest records spanning a five-year period. In 2013, the data, which consisted of 1 million user-candidate pairs with more than 200 variables, was provided by the popular online dating service eHarmony.com. The 2014 data set consisted of building energy data from GridPoint and challenged competitors to find patterns that would help a business decide to implement energy-saving steps. This year, organizers have another large data set that will challenge 2015 DataFest competitors.
DataFest was first held by the Statistics Department at the University of California, Los Angeles (UCLA) in 2011 and has since expanded nationally. This is the second year of the ASA Five College DataFest.
For more information, contact Five College DataFest coordinators Ben Baumer (email@example.com, 413-585-3440) or Andrew Bray (firstname.lastname@example.org, 413-538-2341). More information is available at http://www.science.smith.edu/departments/math/datafest/ and http://magazine.amstat.org/blog/2014/06/01/datafest/.