Differentially private top-k query over MapReduce

  • Authors:
  • Xu Han; Miao Wang; Xiaojian Zhang; Xiaofeng Meng

  • Affiliations:
  • Renmin University of China, Beijing, China; Renmin University of China, Beijing, China; Renmin University of China, Beijing, China; Renmin University of China, Beijing, China

  • Venue:
  • Proceedings of the Fourth International Workshop on Cloud Data Management
  • Year:
  • 2012


Abstract

The MapReduce framework is a popular way to process large-scale data, but it carries a significant risk of leaking users' personal information, especially when the data are sensitive, for example, health records or salary information. Differential privacy has recently emerged as a new paradigm for protecting private data; it provides strong theoretical guarantees on both the privacy and the utility of query results. In this paper, we focus on the top-k query, one of the most useful queries over big data sets in the MapReduce framework. To this end, we propose an efficient algorithm, called DiffMR (Differentially private Top-k query over MapReduce), that answers top-k queries while satisfying differential privacy. To avoid leaking private information in the intermediate stages, DiffMR uses the exponential mechanism with a score function to select the top-k records from big data sets. When the data set is too large to obtain a reasonably accurate result, we can reduce the rejection rate and run several additional MapReduce rounds to obtain a more accurate top-k result. After obtaining the final top-k candidates, we add Laplace noise to each record and apply a post-processing technique to improve the accuracy of the query answers. Our experimental study demonstrates that DiffMR can answer top-k queries accurately in the MapReduce framework.
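The abstract names three building blocks (exponential-mechanism selection with a score function, Laplace noise on the selected records, and post-processing) without giving their exact form. The following is a minimal single-machine sketch of how such a pipeline can fit together, under loudly stated assumptions: the score of a record is simply its count with sensitivity 1, the k selections are made by k rounds of the exponential mechanism without replacement (a common "peeling" approach, not necessarily DiffMR's), the privacy budget is split into a hypothetical `eps_select` and `eps_count`, and post-processing is taken to mean clamping negative counts to zero and sorting. The MapReduce partitioning and the multi-round rejection step are not modeled, and all function names are hypothetical.

```python
import math
import random


def exponential_top_k(scores, k, epsilon, sensitivity=1.0, rng=random):
    """Select k distinct keys with k rounds of the exponential mechanism.

    Each round spends epsilon / k, so by sequential composition the whole
    selection is epsilon-differentially private (assumed score = count).
    """
    remaining = dict(scores)
    eps_round = epsilon / k
    chosen = []
    for _ in range(min(k, len(remaining))):
        keys = list(remaining)
        best = max(remaining.values())
        # Shift by the max score so math.exp never overflows.
        weights = [
            math.exp(eps_round * (remaining[key] - best) / (2 * sensitivity))
            for key in keys
        ]
        pick = rng.choices(keys, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # sample without replacement
    return chosen


def laplace(scale, rng=random):
    """Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_top_k(scores, k, eps_select, eps_count, sensitivity=1.0, rng=random):
    """Exponential-mechanism selection followed by Laplace-noised counts.

    Total privacy cost is eps_select + eps_count (sequential composition).
    """
    chosen = exponential_top_k(scores, k, eps_select, sensitivity, rng)
    # Each released count gets eps_count / k, i.e. scale k * sensitivity / eps_count.
    scale = len(chosen) * sensitivity / eps_count
    noisy = {key: scores[key] + laplace(scale, rng) for key in chosen}
    # Post-processing (no extra privacy cost): clamp negatives, sort descending.
    return sorted(
        ((key, max(0.0, value)) for key, value in noisy.items()),
        key=lambda kv: kv[1],
        reverse=True,
    )


if __name__ == "__main__":
    counts = {"alice": 120, "bob": 95, "carol": 90, "dave": 12, "erin": 7}
    print(noisy_top_k(counts, k=3, eps_select=1.0, eps_count=1.0, rng=random.Random(7)))
```

Splitting the budget keeps selection and counting independent: the exponential mechanism never reveals raw scores, and the Laplace step only touches the k records already chosen. Because clamping and sorting operate on already-released noisy values, they consume no additional privacy budget.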