Private similarity computation in distributed systems: from cryptography to differential privacy

  • Authors:
  • Mohammad Alaggan;Sébastien Gambs;Anne-Marie Kermarrec

  • Affiliations:
  • Université Rennes 1 --- IRISA, Rennes, France;Université de Rennes 1 --- INRIA/IRISA, Rennes, France;INRIA Rennes Bretagne-Atlantique, Rennes, France

  • Venue:
  • OPODIS'11 Proceedings of the 15th international conference on Principles of Distributed Systems
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, we address the problem of computing the similarity between two users (according to their profiles) while preserving their privacy in a fully decentralized system and for the passive adversary model. First, we introduce a two-party protocol for privately computing a threshold version of the similarity and apply it to well-known similarity measures such as the scalar product and the cosine similarity. The output of this protocol is only one bit of information telling whether or not two users are similar beyond a predetermined threshold. Afterwards, we explore the computation of the exact and threshold similarity within the context of differential privacy. Differential privacy is a recent notion developed within the field of private data analysis guaranteeing that an adversary that observes the output of the differentially private mechanism, will only gain a negligible advantage (up to a privacy parameter) from the presence (or absence) of a particular item in the profile of a user. This provides a strong privacy guarantee that holds independently of the auxiliary knowledge that the adversary might have. More specifically, we design several differentially private variants of the exact and threshold protocols that rely on the addition of random noise tailored to the sensitivity of the considered similarity measure. We also analyze their complexity as well as their impact on the utility of the resulting similarity measure. Finally, we provide experimental results validating the effectiveness of the proposed approach on real datasets.