PENS: an algorithm for density-based clustering in peer-to-peer systems

Authors:
Mei Li; Guanling Lee; Wang-Chien Lee; Anand Sivasubramaniam
Affiliations:
Pennsylvania State University, University Park, Pennsylvania; National Dong Hwa University, Hualien, Taiwan, R.O.C.; Pennsylvania State University, University Park, Pennsylvania; Pennsylvania State University, University Park, Pennsylvania
Venue:
InfoScale '06: Proceedings of the 1st International Conference on Scalable Information Systems
Year:
2006

Abstract

Huge amounts of data are available in large-scale networks of autonomous data sources dispersed over a wide area. Data mining is an essential technology for extracting hidden and valuable knowledge from these networked data sources. In this paper, we investigate clustering, one of the most important data mining tasks, in one such networked computing environment: peer-to-peer (P2P) systems. The lack of central control and the sheer size of P2P systems make existing clustering techniques inapplicable here. We propose a fully distributed clustering algorithm, called Peer dENsity-based cluStering (PENS), which overcomes the challenge that clustering raises in P2P environments, namely cluster assembly. The main idea of PENS is hierarchical cluster assembly, which enables peers to collaborate in forming a global clustering model without central control or message flooding. A complexity analysis of the algorithm demonstrates that PENS can discover clusters and noise efficiently in P2P systems.
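The abstract does not spell out how PENS performs cluster assembly, so what follows is only a hypothetical sketch of the general idea it names. Each peer first finds density-based clusters in its own data. Peers then combine their results pairwise, level by level, rather than shipping everything to a central site or flooding the network. The names `local_dbscan`, `merge`, and `hierarchical_assembly`, the DBSCAN-style parameters `eps` and `min_pts`, the binary-tree merge order, and the rule that two clusters join when any of their core points lie within `eps` are all assumptions made for illustration. None of them is taken from the paper.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cluster = List[Point]  # assumption: a cluster is summarised by its core points only


def local_dbscan(points: List[Point], eps: float, min_pts: int) -> List[Cluster]:
    """DBSCAN-style clustering of one peer's local data (hypothetical helper).

    Returns the core points of each local cluster; border and noise points
    are dropped to keep the sketch short.
    """
    def neighbours(i: int) -> List[int]:
        # Indices of points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    core = {i for i in range(len(points)) if len(neighbours(i)) >= min_pts}
    seen = set()
    clusters: List[Cluster] = []
    for start in sorted(core):
        if start in seen:
            continue
        # Grow one cluster by following eps-neighbourhoods between core points.
        seen.add(start)
        stack, members = [start], []
        while stack:
            k = stack.pop()
            members.append(points[k])
            for j in neighbours(k):
                if j in core and j not in seen:
                    seen.add(j)
                    stack.append(j)
        clusters.append(members)
    return clusters


def merge(a: List[Cluster], b: List[Cluster], eps: float) -> List[Cluster]:
    """Assemble two cluster sets: join clusters whose core points come within eps."""
    clusters = a + b
    parent = list(range(len(clusters)))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if any(math.dist(p, q) <= eps for p in clusters[i] for q in clusters[j]):
                parent[find(i)] = find(j)

    groups: Dict[int, Cluster] = {}
    for i, c in enumerate(clusters):
        groups.setdefault(find(i), []).extend(c)
    return list(groups.values())


def hierarchical_assembly(peer_clusters: List[List[Cluster]], eps: float) -> List[Cluster]:
    """Merge per-peer results pairwise, level by level, like a binary tree."""
    level = peer_clusters
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(merge(level[i], level[i + 1], eps))
            else:
                nxt.append(level[i])  # an unpaired peer moves up unchanged
        level = nxt
    return level[0] if level else []


# Toy data: cluster A is split across peers 0 and 1, cluster B sits on peer 2,
# and peer 3 holds a single isolated point that ends up as noise.
peers = [
    [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)],
    [(1.5, 0.0), (2.0, 0.0), (2.5, 0.0)],
    [(10.0, 10.0), (10.5, 10.0), (11.0, 10.0)],
    [(50.0, 50.0)],
]
eps, min_pts = 0.6, 2
local = [local_dbscan(p, eps, min_pts) for p in peers]
global_clusters = hierarchical_assembly(local, eps)
print(sorted(len(c) for c in global_clusters))  # → [3, 6]
```

In this sketch a peer counts neighbours only within its own data. A point that would become a core point only once several peers' data are combined is therefore missed, and a real distributed scheme would have to handle such boundary cases. The pairwise comparison in `merge` is also quadratic in the number of core points, which is acceptable for illustration but not for large inputs.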