[News - Privacy] Differential Privacy

One of the main problems with anonymisation processes is that, in some cases, combining the anonymised data with other information still makes it possible to reconstruct the full record. Differential Privacy would address this problem, consequently making the processing of personal data more secure.

A monograph published in the journal Foundations and Trends in Theoretical Computer Science provides an effective, though necessarily simplified, explanation of how this technique works.
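For reference, the formal definition behind that explanation is short enough to state here. A randomized mechanism M is ε-differentially private if, for every pair of databases D and D′ that differ in a single individual's record, and for every set S of possible outputs:

\Pr[M(D) \in S] \le e^{\varepsilon} \cdot \Pr[M(D') \in S]

Intuitively, the output distribution barely changes when any one person's data is added or removed, so the published result reveals almost nothing about any single individual.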

The choice to adopt this approach is highly significant, especially when read in conjunction with the European data protection framework which, as is well known, treats data minimisation as a fundamental and indispensable principle.

In this sense, the intent to anonymise data rather than merely pseudonymise it is commendable: as is well known, pseudonymised data must still be regarded as personal data (see Recital 26 of the GDPR), whereas anonymised data can be regarded as non-personal data, since it cannot be traced back (not even through reverse-engineering operations) to an identified individual.
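The distinction can be made concrete with a short sketch. Hashing an identifier, for example, is only pseudonymisation: the record no longer shows the name, but anyone able to enumerate plausible identifiers can rebuild the mapping. The names and record below are invented purely for illustration.

import hashlib

# Pseudonymisation: replace the identifier with its hash.
def pseudonymise(name: str) -> str:
    return hashlib.sha256(name.encode("utf-8")).hexdigest()

record = {"id": pseudonymise("Mario Rossi"), "diagnosis": "flu"}

# Re-identification: a candidate list of names is enough to invert
# the hashes, which is why pseudonymised data is still personal data.
candidates = ["Mario Rossi", "Anna Bianchi"]
lookup = {pseudonymise(n): n for n in candidates}
print(lookup[record["id"]])  # -> Mario Rossi

True anonymisation, by contrast, must break this link irreversibly, which is the kind of guarantee differential privacy aims to provide in a mathematically rigorous way.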

This is the case of Google, which recently published its library of differential privacy algorithms, making them accessible to everyone through an open-source release. These algorithms were not made available through an app or an easy, intuitive tool (as Google has always accustomed us to). The library is published in raw form and is usable only by expert practitioners.
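To give a sense of what "raw" means in practice, the sketch below shows the kind of primitive such a library exposes: a differentially private count built on the Laplace mechanism. This is a minimal illustration of the underlying technique, not Google's actual API; the function name dp_count and the choice of epsilon = 1.0 are assumptions made for the example.

import numpy as np

def dp_count(records, predicate, epsilon=1.0):
    # A counting query has sensitivity 1 (adding or removing one
    # record changes the count by at most 1), so Laplace noise with
    # scale 1/epsilon yields epsilon-differential privacy.
    true_count = sum(1 for r in records if predicate(r))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical usage: publish how many ages exceed 60 without
# revealing whether any single individual is in the dataset.
ages = [34, 67, 45, 72, 58, 61]
print(dp_count(ages, lambda age: age > 60, epsilon=1.0))

Tuning epsilon is the expert judgment the article alludes to: smaller values add more noise and give stronger privacy, at the cost of accuracy.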

ABSTRACT

“The Algorithmic Foundations of Differential Privacy”, by Cynthia Dwork and Aaron Roth, published in Foundations and Trends in Theoretical Computer Science

The problem of privacy-preserving data analysis has a long history spanning multiple disciplines. As electronic data about individuals becomes increasingly detailed, and as technology enables ever more powerful collection and curation of these data, the need increases for a robust, meaningful, and mathematically rigorous definition of privacy, together with a computationally rich class of algorithms that satisfy this definition. Differential Privacy is such a definition.

After motivating and discussing the meaning of differential privacy, the preponderance of this monograph is devoted to fundamental techniques for achieving differential privacy, and application of these techniques in creative combinations, using the query-release problem as an ongoing example. A key point is that, by rethinking the computational goal, one can often obtain far better results than would be achieved by methodically replacing each step of a non-private computation with a differentially private implementation. Despite some astonishingly powerful computational results, there are still fundamental limitations — not just on what can be achieved with differential privacy but on what can be achieved with any method that protects against a complete breakdown in privacy. Virtually all the algorithms discussed herein maintain differential privacy against adversaries of arbitrary computational power. Certain algorithms are computationally intensive, others are efficient. Computational complexity for the adversary and the algorithm are both discussed.

We then turn from fundamentals to applications other than query-release, discussing differentially private methods for mechanism design and machine learning. The vast majority of the literature on differentially private algorithms considers a single, static, database that is subject to many analyses. Differential privacy in other models, including distributed databases and computations on data streams, is discussed.

Finally, we note that this work is meant as a thorough introduction to the problems and techniques of differential privacy, but is not intended to be an exhaustive survey — there is by now a vast amount of work in differential privacy, and we can cover only a small portion of it.