Data Science Graduate Students Help Solve Problems That Matter

Example of an animal spotted in a photo
A screenshot showing two photos in which a human eye would have a hard time picking out an animal, but the algorithm predicted that one was present. Zooming in, a chipmunk can be seen in both photos, but a person scrolling through hundreds of photos like these would be very unlikely to catch it, the authors point out.

This summer, staff at The Nature Conservancy (TNC) in Massachusetts partnered with data science graduate students at the Center for Data Science in the College of Information and Computer Sciences (CICS) to apply data science to the thousands of motor vehicle accidents caused by collisions with wildlife.

As part of the center’s Data Science for the Common Good (DS4CG) summer program, graduate students Vaishnavi Kommaraju, Wonho Bae and others took on the task of analyzing images captured by motion-triggered cameras placed in forests and along trails. Of the thousands of images captured, only a few hundred contain an animal, and finding those few hundred would take a person many hours of manual review, Kommaraju says.

The students, mentored by CICS professors Dan Sheldon and Subhransu Maji, used machine learning to train computers to recognize images of animals and then developed algorithms that automatically detect whether or not an image contains an animal, Kommaraju explains. This should save time and resources as TNC ecologists work on strategies to reduce animal-vehicle collisions.

Kommaraju says she was drawn to the project by her interest in computer vision and by the opportunity to work on a real-world problem that helps both people and animals. “In addition to the technical skills I developed, one of the things I enjoyed most about being part of Data Science for the Common Good was the opportunity to improve and gain confidence in my public speaking and presentation skills.”

She adds that her team created an algorithm to identify animals in photographs and compared it against an existing algorithm created by Microsoft. They found that their own algorithm performed better on the training data, while Microsoft’s performed better on the TNC data. The team contributed some improvements to the Microsoft algorithm and built an online interface for TNC to use, she reports.

In another DS4CG summer project, computer science master’s student Nicholas Perello worked on a team using data science to help Springfield Public Schools (SPS) analyze student information together with college enrollment data from a national clearinghouse. One goal was to answer questions such as why some students succeed in college while others don’t, and what can be done early to identify and help those at risk of dropping out. The team also hoped to identify factors that contribute to postsecondary success and to build predictive models that could flag at-risk students so that interventions can be applied, he explains.

Tom Walsh of Lowell-based Kronos, a multinational workforce management software and services company, mentored the students on the project together with Andy Reagan, a senior data scientist at MassMutual, and SPS data analysts.

Walsh, the senior director of artificial intelligence and data science at Kronos, who donated his time to the work, says, “I was really impressed with the level of engagement from the partner organization, as well as the commitment of the students. Every week the students were churning out new insights. It was a joy to see the creative things they did with the data.” He adds, “We have a responsibility to use the tools we have and the knowledge we have to make progress in society, especially in areas where these tools may not have been applied previously.”

Among other findings, this team found that the college persistence rate (the percentage of students who earned a college degree or remained actively enrolled toward one) was not lower for low-income students or for students with limited English proficiency, but about the same as for other students. The team also found that Springfield graduates who went on to a four-year college were more likely to graduate or remain actively enrolled toward a degree than those who went on to a two-year college.

The DS4CG program matches computer science graduate students with local nonprofits or government agencies, where they apply their data science skills to a mission-driven project over the course of a summer term. In addition to The Nature Conservancy in Massachusetts and Springfield Public Schools, this year’s students also worked with the Charles River Watershed Association, the Greater Holyoke YMCA, the Metropolitan Area Planning Council and the Massachusetts Department of Public Health.