Dr. Anand Sarwate (Rutgers University, USA) – An information theorist visits differential privacy.

6th June, 2025 | News

On 20 May 2025, Dr Anand Sarwate (Rutgers University, USA) presented a seminar entitled An information theorist visits differential privacy. The presentation covered several connections between differential privacy, one of the gold-standard frameworks for understanding privacy risk in AI systems, and information theory, which studies the fundamental limits of communication and data representation. The talk was introductory in nature and aimed at advanced undergraduate or postgraduate students. Dr Sarwate visited the INFORMED AI Hub at the University of Bristol for a few days to discuss a range of research topics, both related and unrelated to data privacy.

A brief précis of the presentation follows.

Since being proposed in 2006, differential privacy has become a standard method for quantifying certain risks in publishing or sharing analyses of sensitive data. A crucial way in which differential privacy departs from prior privacy frameworks is that privacy risk is a property of the process handling sensitive data and not a property of the data itself. Under differential privacy, data cannot be “anonymized” or “de-identified”: instead, a differentially private algorithm or “mechanism” can create a synthetic data set that tries to mimic the original data. Differentially private methods use randomness to create additional uncertainty for certain types of inferences that an adversary may try to make about the original data.
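As a concrete illustration (not drawn from the talk itself), the Laplace mechanism is the textbook example of such a randomized mechanism: it perturbs a numerical query answer with noise whose scale is calibrated to the query's sensitivity and the privacy parameter ε. The Python sketch below uses illustrative function and variable names.

```python
# Minimal sketch of the Laplace mechanism for epsilon-differential privacy.
# Function and parameter names are illustrative, not taken from the seminar.
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Release a noisy version of true_answer satisfying epsilon-DP.

    sensitivity: largest change in true_answer when one person's record
                 is added or removed (the global sensitivity).
    epsilon:     privacy parameter; smaller epsilon means more noise.
    """
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon  # Laplace noise scale b = sensitivity / epsilon
    return true_answer + rng.laplace(loc=0.0, scale=scale)

# Example: privately release the mean of 1,000 values lying in [0, 1].
data = np.random.default_rng(seed=0).uniform(size=1000)
true_mean = float(data.mean())
private_mean = laplace_mechanism(true_mean, sensitivity=1.0 / len(data), epsilon=0.5)
print(true_mean, private_mean)
```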

The word “uncertainty” naturally raises the question of how ideas from information theory intersect with differential privacy. The uncertainty created by a differentially private algorithm is akin to the uncertainty created in a communication channel by unknown noise or interference. In information theoretic models of communication, a transmitter tries to send a message over a noisy channel. In a differentially private model of communication, a data curator engineers a noisy channel to provide privacy guarantees. A privacy engineer needs to design a channel that balances providing useful information to legitimate parties against making malicious inferences difficult. For example, we might want to train an AI model for disease prediction which is accurate but does not allow re-inference about individuals whose data was used during training. This is the privacy-utility tradeoff.
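To make the tradeoff concrete under the same illustrative Laplace example above: if a query with sensitivity Δ is answered with Laplace noise of scale Δ/ε, the expected absolute error of the released answer is

\[
\mathbb{E}\bigl[\,\lvert M(x) - f(x) \rvert\,\bigr] \;=\; \frac{\Delta}{\varepsilon},
\]

so halving ε (a stronger privacy guarantee) doubles the expected error. Choosing ε is precisely the act of choosing a point on this tradeoff curve.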

The talk discussed a few touchpoints between information theory and differential privacy. The first is the connection to hypothesis testing: the differential privacy guarantee is about making certain decision problems challenging for an adversary. The second is in channel design and optimizing the mechanism/channel under utility constraints. The third touchpoint is through the theory of generalized divergences, and in particular the f-divergence known as the “hockey-stick divergence.” This opens up a rich connection to the data processing inequalities that lie at the heart of privacy guarantees.
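To spell out the third touchpoint, using standard definitions rather than material specific to the talk: for a parameter γ ≥ 1, the hockey-stick divergence between distributions P and Q is

\[
E_{\gamma}(P \,\|\, Q) \;=\; \sup_{S} \bigl( P(S) - \gamma\, Q(S) \bigr),
\]

and a mechanism M is (ε, δ)-differentially private exactly when, for every pair of neighbouring datasets x and x′,

\[
E_{e^{\varepsilon}}\bigl( M(x) \,\|\, M(x') \bigr) \;\le\; \delta .
\]

Because the hockey-stick divergence is an f-divergence, it obeys a data processing inequality, which is one way to see why post-processing a differentially private output cannot weaken its guarantee.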

Overall, there are many interesting problems in privacy, ranging from basic mathematics to computational statistics to applied engineering. All of these can be enriched by considering these fundamental connections between differential privacy and information theory.