Distributed data collections are increasingly common, owing to the emergence of cloud computing, to spatially decentralized businesses, or to the availability of various data-sharing web services. Obtaining knowledge from such collections calls for new data mining methods that can operate in a decentralized architecture. In this paper, we explore the machine learning side of this line of work. We propose a novel technique for decentralized estimation of probabilistic mixture models, which are among the most versatile generative models for understanding data sets. More precisely, we demonstrate how to estimate a global mixture model from a set of local models. Our approach accommodates dynamic topology and data sources and is statistically robust, i.e., resilient to the presence of unreliable local models. Such outlier models may arise from local data that are outliers with respect to the global trend, or from poor mixture estimation. We report experiments on synthetic data and real geo-location data from Flickr.