Protecting Privacy While Avoiding Unfairness in Data-Based Decisions Such as the U.S. Census

UMass Amherst data scientists and collaborators offer ways to avoid harm to minority communities
Gerome Miklau

AMHERST, Mass. – In a new investigation into how common privacy algorithms can create inequities for minority populations and other groups, a team led by Gerome Miklau at the University of Massachusetts Amherst’s College of Information and Computer Sciences (CICS) explored these potential effects on results derived from data held by the U.S. Census Bureau and other institutions.

In what they believe is the first study of its kind, the authors describe how they measured the impact of privacy algorithms on the fairness of subsequent policy choices concerning such matters as voting rights, funds allocation and political apportionment.

Miklau says, “Census data in particular is used to make hundreds of really important decisions, including things like Title I educational funding, minority language voting rights and political apportionment, all of which rely on counting people as accurately as possible.”

This means systems must effectively balance the need for privacy with the necessity for results to show correct representation of minority groups, or risk creating decision-making failures that cause societal inequities, Miklau explains. “We wanted to know if any groups are disparately affected because of privacy protections,” he adds. “For example, what is the likelihood that a school district may improperly lose funding because of the privacy protection methods?”

The privacy mechanisms they studied use a model known as “differential privacy,” currently planned for use by the U.S. Census Bureau in analyzing the 2020 census. By systematically introducing “noise,” or errors, into any results that are released, differential privacy guarantees that those results reveal almost nothing about whether any particular individual participated in the survey.
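The classic way to add such noise is the Laplace mechanism. The Census Bureau's actual deployment is considerably more elaborate, but the core idea can be sketched as follows; the epsilon value here is an illustrative choice, not the Bureau's setting:

```python
import math
import random

def laplace_count(true_count: int, epsilon: float) -> float:
    """Release a count protected by epsilon-differential privacy.

    A counting query has sensitivity 1 (one person's presence changes
    it by at most 1), so adding Laplace noise with scale 1/epsilon
    satisfies epsilon-differential privacy.
    """
    scale = 1.0 / epsilon
    u = random.random() - 0.5  # uniform on [-0.5, 0.5)
    # Inverse-CDF sample from Laplace(0, scale); max() guards against log(0).
    noise = -scale * math.copysign(1.0, u) * math.log(max(1.0 - 2.0 * abs(u), 1e-300))
    return true_count + noise

# The released value is the true count plus symmetric, zero-mean noise:
# unbiased on average, but any single release can be off by several counts.
print(laplace_count(1000, epsilon=1.0))
```

Smaller epsilon means stronger privacy but larger typical errors, which is exactly the tension the researchers examine: the noise protects individuals while distorting the small counts that many policy rules depend on.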

Miklau and colleagues report finding that the introduction of this noise could lead to downstream decisions that unfairly impact certain groups. In one example, a community that has significantly higher Spanish-language representation than its immediate neighbors could fail to qualify for a multiple-language ballot if its representation counts were partially evened out with those of the communities around it.
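A hypothetical simulation makes the effect concrete. The 5% qualification threshold, the population figures, and the epsilon value below are all illustrative assumptions, not the paper's actual case study; the point is that a community sitting just above a cutoff can fall below it once noise is added:

```python
import math
import random

def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Epsilon-differentially private count via the Laplace mechanism (sensitivity 1)."""
    u = rng.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(max(1.0 - 2.0 * abs(u), 1e-300))
    return true_count + noise

def qualifies(count: float, total: int, threshold: float = 0.05) -> bool:
    """Hypothetical rule: qualify for a multilingual ballot at >= 5% representation."""
    return count / total >= threshold

# A community just above the cutoff: 52 speakers out of 1,000 (5.2%).
rng = random.Random(7)
true_count, total, trials = 52, 1000, 10000
wrong = sum(
    1 for _ in range(trials)
    if not qualifies(noisy_count(true_count, epsilon=1.0, rng=rng), total)
)
# The true data always qualifies, yet a nontrivial fraction of noisy
# releases wrongly push the community below the threshold.
print(f"wrongly disqualified in {wrong / trials:.1%} of trials")
```

Because the noise is symmetric, communities far from any cutoff are essentially unaffected, while those near a cutoff bear the risk of a wrong decision; that asymmetry is the kind of disparate impact the study measures.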

To remedy such effects, the researchers propose new techniques for ensuring fairness in decisions made from these data, including repair mechanisms to fix inequities in differentially private results and new allocation methods for funding and benefits to ensure that noisy results won’t hurt minority communities.

Miklau and his doctoral advisee Ryan McKenna, with researchers Michael Hay at Colgate University, and Ashwin Machanavajjhala and doctoral candidate David Pujol at Duke University, presented their paper, “Fair Decision Making Using Privacy-Protected Data,” at this year’s ACM FAT* 2020 conference in Barcelona, Spain.

“Using these methods, disparate effects on data-driven policy decisions can be effectively mitigated,” Miklau says. “We can have journalists and social scientists looking for the impacts and calling on the census for better methods for remedies. It’s taken years to move from theory to practice, but we’re getting there.”

“Differential privacy is a proven model for protecting the privacy of individuals, which will allow for more transparency in the tabulations released publicly by the U.S. Census,” Miklau says. “But while protecting privacy, we must watch for disparate effects on downstream data-driven policy decisions and develop techniques to mitigate them.”

“Using the right methods, we can simultaneously protect the public from invasions of privacy and distortions that would result in unfair losses of service.”