Information Sciences: an International Journal
Discovering global knowledge from distributed data sources is challenging for two main reasons: data volume keeps growing at highly distributed sources, and there is a general concern about data privacy. Abstracting the distributed data into a compact representation that retains sufficient local detail for global knowledge discovery can, in principle, address both the scalability and the privacy challenges. This calls for formal methodologies that support knowledge discovery on abstracted data. In this paper, we propose to abstract distributed data as Gaussian mixture models and to learn a family of generative models from the abstracted data using a modified EM algorithm. To demonstrate the effectiveness of the proposed approach, we applied it to learn (a) data cluster models and (b) data manifold models, and evaluated their performance on both synthetic and benchmark data sets, with promising results in terms of both effectiveness and scalability. We also demonstrate that the proposed approach is robust against heterogeneous data distributions across the distributed sources.
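The idea of learning from abstractions rather than raw data can be illustrated with a deliberately simplified sketch. It is not the modified EM algorithm of the paper. It assumes the simplest possible abstraction, one Gaussian per source over one-dimensional data, and shows that global statistics can then be recovered from the per-source summaries alone. The function names `abstract_source` and `pool_abstractions` and the synthetic sources are hypothetical, chosen only for illustration.

```python
import random
import statistics


def abstract_source(values):
    """Summarize one source's raw values as (count, mean, variance).

    Only this triple leaves the source; the raw values stay local.
    """
    return len(values), statistics.fmean(values), statistics.pvariance(values)


def pool_abstractions(summaries):
    """Combine per-source summaries into a global mean and variance.

    Uses the law of total variance: the global variance is the weighted
    within-source variance plus the weighted spread of the source means.
    """
    total = sum(n for n, _, _ in summaries)
    mean = sum(n * m for n, m, _ in summaries) / total
    var = sum(n * (v + (m - mean) ** 2) for n, m, v in summaries) / total
    return mean, var


# Three synthetic sources with deliberately different distributions
# (assumed data, standing in for heterogeneous distributed sources).
rng = random.Random(0)
sources = [
    [rng.gauss(0.0, 1.0) for _ in range(500)],
    [rng.gauss(5.0, 2.0) for _ in range(300)],
    [rng.gauss(-3.0, 0.5) for _ in range(200)],
]

summaries = [abstract_source(s) for s in sources]
global_mean, global_var = pool_abstractions(summaries)
print(global_mean, global_var)
```

In this special case the pooled mean and variance match what would be computed from the concatenated raw data, even though the sources differ in size and distribution. With several mixture components per source and a richer global model, no such closed form is available, which is where an EM-style procedure over the abstracted components would come in.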