Learning latent variable models from distributed and abstracted data

Authors:
Xiaofeng Zhang;William K. Cheung;C. H. Li
Affiliations:
Harbin Institute of Technology, School of Computer Science and Technology, Shenzhen Graduate School, Kowloon Tong, Hong Kong;Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong;Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
Venue:
Information Sciences: an International Journal
Year:
2011

Abstract

Discovering global knowledge from distributed data sources is challenging: the data volume at highly distributed sources keeps growing, and data privacy is a general concern. Abstracting the distributed data into a compact representation that retains sufficient local detail for global knowledge discovery can, in principle, address both the scalability and the privacy challenges. This calls for formal methodologies that support knowledge discovery on abstracted data. In this paper, we propose to abstract distributed data as Gaussian mixture models and to learn a family of generative models from the abstracted data using a modified EM algorithm. To demonstrate the effectiveness of the proposed approach, we applied it to learn (a) data cluster models and (b) data manifold models, and evaluated their performance on both synthetic and benchmark data sets, with promising results in terms of both effectiveness and scalability. We also demonstrate that the proposed approach is robust to heterogeneous data distributions across the distributed sources.
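
The idea of learning a global model from Gaussian abstractions alone can be illustrated with a minimal sketch. It is not the paper's modified EM algorithm, whose details the abstract does not give. It assumes one-dimensional data, assumes each source shares only (count, mean, variance) triples for its local Gaussian components, and uses a hierarchical-EM-style update in which raw points are replaced by the expected log-likelihood of each local component. The names `expected_loglik`, `farthest_point_init`, and `fit_global_gmm` are hypothetical.

```python
import math


def expected_loglik(mu_l, var_l, mu_k, var_k):
    """Closed form of E_{x ~ N(mu_l, var_l)}[log N(x; mu_k, var_k)]."""
    return -0.5 * (math.log(2.0 * math.pi * var_k)
                   + ((mu_l - mu_k) ** 2 + var_l) / var_k)


def farthest_point_init(local, n_global):
    """Deterministic seeding: heaviest local component, then farthest means."""
    chosen = [max(range(len(local)), key=lambda i: local[i][0])]
    while len(chosen) < n_global:
        def gap(i):
            return min(abs(local[i][1] - local[j][1]) for j in chosen)
        chosen.append(max(range(len(local)), key=gap))
    return chosen


def fit_global_gmm(local, n_global, n_iter=50):
    """Fit a global 1-D GMM from local abstractions only.

    local: list of (count, mean, variance) triples, one per local component.
    Returns (weights, means, variances) of the global mixture.
    """
    seeds = farthest_point_init(local, n_global)
    means = [local[i][1] for i in seeds]
    variances = [local[i][2] for i in seeds]
    weights = [1.0 / n_global] * n_global

    for _ in range(n_iter):
        # E-step: responsibility of global component k for local component l,
        # with the local component standing in for `count` virtual samples.
        resp = []
        for count, mu_l, var_l in local:
            logs = [math.log(max(weights[k], 1e-12))
                    + count * expected_loglik(mu_l, var_l, means[k], variances[k])
                    for k in range(n_global)]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])

        # M-step: count-weighted moment matching over local components.
        mass = [max(sum(resp[l][k] * local[l][0] for l in range(len(local))), 1e-12)
                for k in range(n_global)]
        weights = [m / sum(mass) for m in mass]
        for k in range(n_global):
            means[k] = sum(resp[l][k] * local[l][0] * local[l][1]
                           for l in range(len(local))) / mass[k]
            # Small floor (1e-6) keeps the variance strictly positive.
            variances[k] = sum(
                resp[l][k] * local[l][0]
                * (local[l][2] + (local[l][1] - means[k]) ** 2)
                for l in range(len(local))) / mass[k] + 1e-6
    return weights, means, variances


# Illustrative abstractions from two hypothetical sources (made-up numbers).
site_a = [(100, 0.0, 1.0), (80, 10.0, 1.0)]
site_b = [(120, 0.5, 1.0), (100, 10.5, 1.0)]
weights, means, variances = fit_global_gmm(site_a + site_b, n_global=2)
```

In this sketch only the component parameters leave each source, which is the sense in which abstraction could help with both data volume and privacy. How many local components to use, and how much detail they reveal, are design choices the sketch does not address.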