On Friday, October 20, a crowded house of social scientists, computer scientists, and planners gathered in the new ISSR lab to discuss insights emerging from a National Science Foundation-funded project on the social sciences and big data. Leading the dialogue were the project’s principal investigators, each of whom has been a program director at the National Science Foundation: Susan Sterett, Director of the School of Public Policy and Professor of Political Science at the University of Maryland, Baltimore County, and Kelly Joyce, Director of the Science, Technology & Society Center and Professor of Sociology at Drexel University.
Drawing on the project’s work with researchers, data technicians, and city and federal government officials, the presentation and ensuing exchange grounded the optimism surrounding big data analytics in the hard-won lessons of social science, and underlined the importance of collaboration among social and computer scientists from the earliest stages of big data research design. These lessons highlighted the practical challenges of converting messy social phenomena into clean data, the political dynamics shaping whose data, questions, and analysis drive the big data train, the empirical questions surrounding the unacknowledged labor that big data requires, and the ethical issues triggered by its use for policy analysis and decision-making.
Examples of data analytics projects done well, and gone wrong, brought a dose of realism to the image of abstract and objective mathematics that big data so often conveys, and showed its deep dependence on the real people, places, and power relations that are the purview of social science. Against the idea that “data wants to be free,” the presenters shared numerous examples where data is emphatically not free, or at least not equally so. Participants considered, for example, the struggle to obtain valid information on pupils, workers, and infrastructure in Arlington County, VA, where the high concentration of federal intelligence and defense workers leaves big holes in official data, and human creativity must be recruited to extrapolate counts from distant proxies. City officials labor for up to a year to produce the resulting data, a luxury many small and mid-sized cities can’t afford. Off-the-shelf data sets or commercial services offer time- and resource-strapped users a way into the data revolution, but little ability to explore the assumptions and biases that these might contain. Joyce and Sterett asked, “What does it mean to govern through open data? What are the work practices involved in translating records into data? What gets asked? What gets bracketed?”
More automated data analytics, where subject data is pulled from the cloud as people go about their lives, is no less subject to questions of validity, equity, and ethics. Even anonymized data can be easily re-identified when big datasets are merged, and people lose their “right to be forgotten” when data can be redeployed long after it has ceased to be fit for ethical use. A Microsoft Research analysis of the Cambridge, MA pilot of “Street Bump,” an application where residents reported potholes from their cellphones, found that the effect was like “inequality on steroids.” Because app use was highly correlated with existing class and cultural hierarchies, neighborhoods populated by younger and wealthier residents drew the lion’s share of street repairs, a bias quickly fixed when the social scientists shared their findings with the developers, who installed the app on city buses and garbage trucks. This example raised questions about how developers calibrate new technologies by comparing them with conventional “street-level” sources, and also about how power operates when numbers and algorithms stand between the analyst and the subject. “Power doesn’t go out of the room just because data is in it,” Joyce affirmed.
An important question facing a research university like UMass is how to equip students and researchers with the critical lens that is missing from so many of the “boot camp” programs that promise to position students to reap the jobs of the new big data economy. The conversation pointed to the work that lies ahead at a theoretical level in exploring big data’s ontological and epistemological questions, as well as these practical and ethical concerns. On the most fundamental level of inquiry, the cases explored raise the question of what is being produced in the translational data sciences, and how the translation process mystifies underlying social relations. In line with other work on quantification in society, the presenters asked why we expect data to surprise us in ways that people cannot, and to be more reliable than people’s perceptions and experiences of their realities. What social beliefs and biases underlie the trust we place in numbers and experts instead of the people who live and work in these spaces?
This dialogue was co-sponsored by the ISSR and the Computational Social Sciences Institute, with funding from National Science Foundation award #1623445, and marks the second conversation on this important project at UMass. It builds on a previous workshop, Advancing Ethics for Trustworthy Cyberspace and Data Analytics, which was organized by Joyce and Sterett and held on September 29-30, 2016, in Arlington, VA. Joyce and Sterett are working on a report that summarizes findings from the workshop. The report will be freely available to the public.