This paper proposes DS-means, a novel algorithm for clustering distributed data streams. Given a network of computing nodes, each receiving its share of a distributed data stream, our goal is to obtain a common clustering under two restrictions: (i) the number of clusters is not known in advance, and (ii) nodes may not share individual points of their datasets, only aggregate information. A motivating example for DS-means is the decentralized detection of botnets, where a collection of independent ISPs may want to detect common threats but are unwilling to share their valuable user data. In DS-means, nodes execute a distributed version of K-means on each chunk of data they receive, producing a compact representation of the data of the entire network. X-means is then executed on this representation to estimate the number of clusters. Experiments on both synthetic and real-life datasets show that our algorithm is precise, efficient and robust.
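As a rough illustration of the "aggregate information only" restriction, the following minimal Python sketch runs K-means rounds in which each node reports only per-cluster coordinate sums and counts for its chunk. It is not the authors' implementation. The function names (`nearest`, `local_aggregates`, `distributed_kmeans`) are hypothetical, the plain summation over nodes is an assumption standing in for whatever decentralized aggregation the network actually uses, and the later X-means step is omitted.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def local_aggregates(points, centroids):
    """Per-cluster coordinate sums and counts for one node's chunk.

    Only these aggregates leave the node; the raw points never do.
    """
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        j = nearest(p, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def distributed_kmeans(node_chunks, centroids, rounds=10):
    """Run K-means rounds using only per-node aggregates.

    Assumption: summing the aggregates in one place stands in for a
    decentralized aggregation protocol among the nodes.
    """
    for _ in range(rounds):
        aggregates = [local_aggregates(chunk, centroids) for chunk in node_chunks]
        updated = []
        for j, old in enumerate(centroids):
            total = sum(counts[j] for _, counts in aggregates)
            if total == 0:
                # No node assigned a point to this cluster: keep the centroid.
                updated.append(list(old))
                continue
            updated.append(
                [sum(sums[j][d] for sums, _ in aggregates) / total
                 for d in range(len(old))]
            )
        centroids = updated
    return centroids


# Two hypothetical nodes, each holding a private chunk of 2-D points.
node_chunks = [
    [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0)],
    [(2.0, 0.0), (2.0, 2.0), (10.0, 12.0), (12.0, 10.0), (12.0, 12.0)],
]
common_centroids = distributed_kmeans(node_chunks, [[0.0, 0.0], [10.0, 10.0]])
```

Because the global centroid of a cluster is the total coordinate sum divided by the total count, the nodes arrive at the same centroids a centralized K-means round would compute on the pooled data, without any node revealing an individual point.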