16 September 2020

From anonymous to private

Result

Safeguarding sensitive personal information is an important part of regulations on data protection. But how do we combine that with making it possible to take advantage of combined, aggregate data? New research shows that a simple mechanism, a digital version of a ballot box, is a surprisingly powerful tool to meet these conflicting goals.

Most people are reluctant to give information about their personal lives (e.g. location, browsing, or health information) to untrusted, third parties. Some are uneasy about such information being held by government entities. On the other hand, aggragate information about people can be used for things like better traffic routing and planning, medical research, and better policy making. Some types of information are collected by public, statistical agencies and presented in aggregate form. This is expensive and slow, so the scope of such collection is limited. But what if you could share your information without ever giving it away in the first place?

Voting by putting a paper ballot into a box is a process carefully designed with anonymity in mind. To someone who empties a ballot box there is, a priori, no way to connect ballots to individual voters. Cryptographers have designed a digital equivalent of a ballot box, using so-called mixnets. Could we simply use such digital ballot boxes to collect and aggregate anonymous information? This question has been studied intensively by privacy researchers in recent years.

However, simply making your information anonymous may not suffice. A hypothetical example would be that everyone who voted 'yes' reveals their vote, which would allow the choice of all 'no' voters to be singled out. In practice, it has been shown that combining sufficiently many aggregated facts (such as the 2010 US census and other freely available data), can allow personal attributes of a large percentage of individuals to be determined.

The area of differential privacy gives a set of techniques that hide individual contributions to aggregates. In the ballot box example, for a yes/no vote this can be achieved by putting a small number of ballots into the ballot box with random yes/no votes. This is unlikely to change the majority, but masks individual contributions. Recently, a group of researchers from Google and MIT, with BARC core researcher Rasmus Pagh, discovered that powerful new differentially private mechanisms are possible by allowing multiple ballots from each participant in the data collection process.

To give an idea of how this might work, consider writing 'Yes' as 1 and 'No' as 0. Instead of putting a number x on your ballot, you can put several numbers (negative and positive) that sum to x. To find the number of 'Yes' votes, simply sum up the numbers on all ballots. (We are assuming, for simplicity, that everyone is honest and follows the protocol.) It can be shown mathematically that a small number of ballots is sufficient to make the set of ballots look essentially like a collection of random numbers.

Figure 1: Aggregating distributed data via the digital equivalent of a ballot box yields accuracy-privacy trade-offs much better than previous methods.