The Mathematics of Privacy and Synthetic Data

Event Category:
Mathematics of Machine Learning
Thomas Strohmer
University of California, Davis

'Sharing is Caring', we are taught. However, in the Age of Surveillance Capitalism we had better think twice about what we share. As data sharing increasingly locks horns with data-privacy concerns, synthetic data are gaining traction as a potential solution to the aporetic conflict between privacy and utility. The goal of synthetic data is to preserve meaningful statistical information about the dataset without risking the exposure of private information. Synthetic data are expected to have great potential in areas such as health care, where patient data are protected by privacy laws. But can we even construct synthetic data that are simultaneously private and accurate? And what do privacy and accuracy actually mean in this context? Trying to answer these questions leads to deep mathematical challenges, as the road to privacy is paved with NP-hard problems! I will introduce various mathematical concepts of privacy and utility and discuss the associated privacy-utility tradeoffs. I will then present some of our recent breakthroughs on the NP-hard challenge of efficiently creating synthetic data that come with provable privacy and utility guarantees. This is joint work with March Boedihardjo, Roman Vershynin, and Girish Kumar.

Friday, November 3, 2023 - 12:00pm
LGRT 1685 and Zoom: