Identifying clusters is an important aspect of analyzing large datasets. Classical clustering algorithms require access to the complete dataset. However, as huge amounts of data increasingly originate from multiple, dispersed sources in distributed systems, alternative solutions are required. Furthermore, data and network dynamics in a distributed setting demand adaptable clustering solutions that produce accurate clustering models at a reasonable pace. In this paper, we propose GoSCAN, a fully decentralized density-based clustering algorithm capable of clustering dynamic and distributed datasets without requiring central control or message flooding. We identify two major tasks, finding the core data points and forming the actual clusters, which we execute in parallel using gossip-based communication. This approach is efficient because it gives each peer enough autonomy to discover the clusters it is interested in. Our algorithm imposes no extra burden of overlay formation on the network, while providing high levels of scalability. We also offer several optimizations to the basic clustering algorithm that reduce communication overhead and processing costs. Coping with dynamic data is made possible by introducing an age factor, which gradually detects dataset changes and enables clustering updates. Our experimental evaluation shows that GoSCAN discovers the clusters efficiently with scalable transmission cost.
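To make the first of the two tasks concrete, the following is a minimal sketch of gossip-based core-point detection. It is not the GoSCAN protocol itself. It assumes the usual density-based definition (a point is a core point if at least `min_pts` points, itself included, lie within distance `eps`), and it lets peers exchange their entire views, which a communication-efficient algorithm would avoid. The names `Peer`, `gossip_round`, `eps`, and `min_pts` are hypothetical and chosen for illustration only; cluster formation and the age factor are left out.

```python
import math
import random


def neighbours_within(point, points, eps):
    """Count the points (including `point` itself) within distance eps."""
    return sum(1 for q in points if math.dist(point, q) <= eps)


class Peer:
    """Hypothetical peer: owns some points and learns others via gossip."""

    def __init__(self, local_points):
        self.local = set(local_points)  # points this peer owns
        self.known = set(local_points)  # points seen so far (own + gossiped)

    def exchange(self, other):
        """Push-pull exchange: both peers end up with the union of their views.

        Assumption for brevity: whole views are shipped, with no filtering.
        """
        merged = self.known | other.known
        self.known = set(merged)
        other.known = set(merged)

    def core_points(self, eps, min_pts):
        """Local points that qualify as core points under the current view."""
        return {p for p in self.local
                if neighbours_within(p, self.known, eps) >= min_pts}


def gossip_round(peers, rng):
    """Each peer contacts one other peer chosen uniformly at random."""
    for peer in peers:
        partner = rng.choice([p for p in peers if p is not peer])
        peer.exchange(partner)


def centralized_core_points(points, eps, min_pts):
    """Reference answer computed with access to the complete dataset."""
    return {p for p in points if neighbours_within(p, points, eps) >= min_pts}


def demo(rounds=50, eps=0.2, min_pts=3, seed=0):
    """Run the sketch on a tiny made-up dataset spread over four peers."""
    rng = random.Random(seed)
    peers = [
        Peer([(0.0, 0.0), (0.1, 0.0)]),
        Peer([(0.0, 0.1), (5.0, 5.0)]),
        Peer([(0.1, 0.1), (5.1, 5.0)]),
        Peer([(9.0, 9.0)]),
    ]
    all_points = set().union(*(p.local for p in peers))
    for _ in range(rounds):
        gossip_round(peers, rng)
    found = set().union(*(p.core_points(eps, min_pts) for p in peers))
    return found, centralized_core_points(all_points, eps, min_pts)
```

Calling `demo()` returns the core points found by the peers alongside the centralized reference. Because a partial view can only undercount neighbours, each peer's answer in this sketch is always a subset of the true core points and grows toward it as gossip spreads the data.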