PinKDD'07 Proceedings of the 1st ACM SIGKDD international conference on Privacy, security, and trust in KDD
Although data mining algorithms are usually designed to operate on centralized data, in practice data is often acquired and stored in a distributed manner. Centralizing such data before analysis may be undesirable, and is often impossible, because of real-life constraints such as security, privacy, and communication costs. This paper presents a general framework for distributed clustering that takes privacy requirements into account. Each local site builds a probabilistic model of its own data and transmits only the model parameters to a central location. We show mathematically that the best representative of all the local models is a certain "mean" model, and show empirically that this model can be approximated quite well by generating artificial samples from the local models using sampling techniques and then fitting a global model of a chosen parametric form to those samples. We also propose a new measure that quantifies privacy in information-theoretic terms, and show that privacy and model quality trade off against each other: decreasing privacy improves the quality of the global model, and increasing privacy degrades it. Empirical results on different kinds of data highlight the generality of the framework and show that high-quality global clusters can be achieved with little loss of privacy.
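The pipeline described in the abstract (fit a local model at each site, transmit only its parameters, sample artificial points centrally, fit a global model) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: each local model is a single Gaussian, the global model is plain k-means (Lloyd's algorithm) standing in for "a chosen parametric form", and the function names, site data, and sample sizes are hypothetical. The paper's privacy measure is not modeled.

```python
import numpy as np

def fit_local_model(X):
    """Fit one Gaussian to a site's data (assumed local model form).
    Only these parameters, not the raw points, leave the site."""
    return {"n": len(X), "mean": X.mean(axis=0), "cov": np.cov(X, rowvar=False)}

def sample_from_local_models(models, n_total, rng):
    """Central site: draw artificial points from each local model,
    in proportion to the number of points the site reported."""
    total = sum(m["n"] for m in models)
    parts = []
    for m in models:
        k = max(1, round(n_total * m["n"] / total))
        parts.append(rng.multivariate_normal(m["mean"], m["cov"], size=k))
    return np.vstack(parts)

def fit_global_kmeans(X, k, rng, n_iter=50):
    """Fit a global model to the artificial samples.
    Plain k-means is used here as a stand-in global model (an assumption)."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

# Hypothetical example: two sites, each holding one well-separated cluster.
rng = np.random.default_rng(0)
site_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(500, 2))
site_b = rng.normal(loc=[10.0, 10.0], scale=1.0, size=(500, 2))

local_models = [fit_local_model(site_a), fit_local_model(site_b)]
artificial = sample_from_local_models(local_models, n_total=1000, rng=rng)
centers = fit_global_kmeans(artificial, k=2, rng=rng)
print(centers)
```

In this sketch the central site never sees `site_a` or `site_b`, only the fitted means, covariances, and counts. A coarser local model would reveal less about each site at the cost of a less accurate global fit, which is the kind of trade-off the abstract describes.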