Efficient privacy preserving k-means clustering

  • Authors:
  • Maneesh Upmanyu;Anoop M. Namboodiri;Kannan Srinathan;C. V. Jawahar

  • Affiliations:
  • International Institute of Information Technology, Hyderabad;International Institute of Information Technology, Hyderabad;International Institute of Information Technology, Hyderabad;International Institute of Information Technology, Hyderabad

  • Venue:
  • PAISI'10 Proceedings of the 2010 Pacific Asia conference on Intelligence and Security Informatics
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper introduces an efficient privacy-preserving protocol for distributed K-means clustering over an arbitrary partitioned data, shared among N parties. Clustering is one of the fundamental algorithms used in the field of data mining. Advances in data acquisition methodologies have resulted in collection and storage of vast quantities of user’s personal data. For mutual benefit, organizations tend to share their data for analytical purposes, thus raising privacy concerns for the users. Over the years, numerous attempts have been made to introduce privacy and security at the expense of massive additional communication costs. The approaches suggested in the literature make use of the cryptographic protocols such as Secure Multiparty Computation (SMC) and/or homomorphic encryption schemes like Paillier’s encryption. Methods using such schemes have proven communication overheads. And in practice are found to be slower by a factor of more than 106. In light of the practical limitations posed by privacy using the traditional approaches, we explore a paradigm shift to side-step the expensive protocols of SMC. In this work, we use the paradigm of secret sharing, which allows the data to be divided into multiple shares and processed separately at different servers. Using the paradigm of secret sharing, allows us to design a provably-secure, cloud computing based solution which has negligible communication overhead compared to SMC and is hence over a million times faster than similar SMC based protocols.