Anonymity meets game theory: secure data integration with malicious participants

  • Authors:
  • Noman Mohammed; Benjamin C. Fung; Mourad Debbabi

  • Affiliations:
  • Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Canada H3G 1M8; Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Canada H3G 1M8; Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Canada H3G 1M8

  • Venue:
  • The VLDB Journal — The International Journal on Very Large Data Bases
  • Year:
  • 2011

Abstract

Data integration methods enable different data providers to flexibly integrate their expertise and deliver highly customizable services to their customers. Nonetheless, combining data from different sources could potentially reveal person-specific sensitive information. Jiang and Clifton (Very Large Data Bases J (VLDBJ) 15(4):316–333, 2006) proposed a secure Distributed k-Anonymity (DkA) framework for integrating two private data tables into a k-anonymous table, in which each private table is a vertical partition on the same set of records. Their DkA framework does not scale to large data sets. Moreover, DkA is limited to a two-party scenario, and the parties are assumed to be semi-honest. In this paper, we propose two algorithms to securely integrate private data from multiple parties (data providers). Our first algorithm achieves the k-anonymity privacy model in a semi-honest adversary model. Our second algorithm employs a game-theoretic approach to thwart malicious participants and to ensure fair and honest participation of multiple data providers in the data integration process. Moreover, we study and resolve a real-life privacy problem in data sharing for the financial industry in Sweden. Experiments on the real-life data demonstrate that our proposed algorithms can effectively retain the essential information in anonymous data for data analysis and are scalable for anonymizing large data sets.
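
To make the privacy model concrete, the following is a minimal Python sketch of a k-anonymity check on a table assembled from two vertical partitions. It is not the authors' algorithm and involves no secure computation: the join is done in the clear purely for illustration. The function names, attribute names, and toy records are all hypothetical. It shows how two partitions that are each 2-anonymous on their own can stop being 2-anonymous once joined, and how generalizing one attribute restores the property.

```python
from collections import Counter


def join_vertical_partitions(part_a, part_b):
    """Join two vertical partitions keyed by the same record IDs (plain, insecure join)."""
    shared_ids = sorted(part_a.keys() & part_b.keys())
    return [{**part_a[rid], **part_b[rid]} for rid in shared_ids]


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(rec[q] for q in quasi_identifiers) for rec in records)
    return all(count >= k for count in groups.values())


# Hypothetical toy data: each party holds different attributes of the same four records.
party_a = {1: {"job": "Professional"}, 2: {"job": "Professional"},
           3: {"job": "Artist"}, 4: {"job": "Artist"}}
party_b = {1: {"age": "30-39"}, 2: {"age": "40-49"},
           3: {"age": "30-39"}, 4: {"age": "40-49"}}

integrated = join_vertical_partitions(party_a, party_b)

# Each partition alone is 2-anonymous, but the joined table is not.
a_alone = is_k_anonymous(list(party_a.values()), ["job"], 2)
b_alone = is_k_anonymous(list(party_b.values()), ["age"], 2)
joined_ok = is_k_anonymous(integrated, ["job", "age"], 2)

# Generalizing the age attribute to a single coarse value restores 2-anonymity.
generalized = [{**rec, "age": "30-49"} for rec in integrated]
generalized_ok = is_k_anonymous(generalized, ["job", "age"], 2)

print(a_alone, b_alone, joined_ok, generalized_ok)
```

In a real multi-party setting the parties would not exchange raw partitions as this sketch does; the check only illustrates the property that the integrated table is required to satisfy.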