Communication-Efficient Privacy-Preserving Clustering

Authors:
Geetha Jagannathan;Krishnan Pillaipakkamnatt;Rebecca N. Wright;Daryl Umano
Affiliations:
Department of Computer Science, Rutgers University, New Brunswick, NJ, USA. e-mail: geetha@cs.rutgers.edu;Department of Computer Science, Hofstra University, Hempstead, NY, USA. e-mail: csckzp@hofstra.edu;Department of Computer Science, Rutgers University, New Brunswick, NJ, USA. e-mail: Rebecca.Wright@rutgers.edu;Department of Computer Science, Hofstra University, Hempstead, NY, USA. e-mail: dumano33@hotmail.com
Venue:
Transactions on Data Privacy
Year:
2010

Abstract

The ability to store vast quantities of data and the emergence of high-speed networking have led to intense interest in distributed data mining. However, privacy concerns, as well as regulations, often prevent the sharing of data between multiple parties. Privacy-preserving distributed data mining allows the cooperative computation of data mining algorithms without requiring the participating organizations to reveal their individual data items to each other. This paper makes several contributions. First, we present a simple, deterministic, I/O-efficient k-clustering algorithm that was designed with the goal of enabling an efficient privacy-preserving version of the algorithm. Our algorithm examines each item in the database only once and uses only sequential access to the data. Our experiments show that this algorithm produces cluster centers that are, on average, more accurate than the ones produced by the well-known iterative k-means algorithm, and that it compares well against BIRCH. Second, we present a distributed privacy-preserving protocol for k-clustering based on our new clustering algorithm. The protocol applies to databases that are horizontally partitioned between two parties. On completion of the protocol, the participants learn only the final cluster centers. Unlike most of the earlier results in privacy-preserving clustering, our protocol does not reveal intermediate candidate cluster centers. The protocol is also efficient in terms of communication: its communication cost does not depend on the size of the database. Although there have been other clustering algorithms that improve on the k-means algorithm, ours is the first for which a communication-efficient cryptographic privacy-preserving protocol has been demonstrated.
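
To make the "examines each item only once, sequential access only" property concrete, here is a minimal Python sketch of a generic deterministic single-pass k-clustering scheme. It is not the algorithm from the paper, whose details the abstract does not give. The function name `one_pass_k_clustering`, the `buffer_factor` parameter, and the merge rule (merge the pair of weighted centers whose merge adds the least squared error) are all assumptions chosen for illustration.

```python
from typing import Iterable, List, Sequence, Tuple

# A weighted center: (coordinates, number of points it summarizes).
Center = Tuple[Tuple[float, ...], int]


def _merge_cost(c1: Sequence[float], w1: int, c2: Sequence[float], w2: int) -> float:
    """Increase in sum of squared error caused by merging two weighted centers."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(c1, c2))
    return (w1 * w2) / (w1 + w2) * sq_dist


def _merge_cheapest_pair(centers: List[Center]) -> None:
    """Replace the cheapest-to-merge pair of centers with their weighted mean."""
    best = None  # (cost, i, j) with i < j
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            (ci, wi), (cj, wj) = centers[i], centers[j]
            cost = _merge_cost(ci, wi, cj, wj)
            if best is None or cost < best[0]:
                best = (cost, i, j)
    _, i, j = best
    (ci, wi), (cj, wj) = centers[i], centers[j]
    w = wi + wj
    merged = tuple((wi * x + wj * y) / w for x, y in zip(ci, cj))
    del centers[j]  # j > i, so index i is still valid afterwards
    centers[i] = (merged, w)


def one_pass_k_clustering(
    stream: Iterable[Sequence[float]], k: int, buffer_factor: int = 2
) -> List[Center]:
    """Hypothetical single-pass clustering: each item is read exactly once."""
    if k < 1 or buffer_factor < 1:
        raise ValueError("k and buffer_factor must be positive")
    centers: List[Center] = []
    for point in stream:  # sequential access only, no second pass
        centers.append((tuple(float(x) for x in point), 1))
        if len(centers) > buffer_factor * k:
            _merge_cheapest_pair(centers)
    while len(centers) > k:  # final reduction to k centers
        _merge_cheapest_pair(centers)
    return centers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11), (0, 0.5), (10, 10.5)]
    for coords, weight in one_pass_k_clustering(data, k=2):
        print(coords, weight)
```

Memory use in this sketch is bounded by `buffer_factor * k + 1` summaries regardless of how many items are streamed, which is the kind of property that keeps a protocol's cost independent of database size. How the paper's protocol achieves this under cryptographic privacy guarantees is not covered here.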