Scalable density-based distributed clustering

Authors:
Eshref Januzaj;Hans-Peter Kriegel;Martin Pfeifle
Affiliations:
Braunschweig University of Technology, Software Systems Engineering;University of Munich, Institute for Computer Science;University of Munich, Institute for Computer Science
Venue:
PKDD '04 Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases
Year:
2004



Abstract

Clustering has become an increasingly important task in analysing huge amounts of data. Traditional applications require that all data be located at the site where it is scrutinized. Nowadays, large amounts of heterogeneous, complex data reside on different, independently working computers which are connected to each other via local or wide area networks. In this paper, we propose a scalable density-based distributed clustering algorithm which allows a user-defined trade-off between clustering quality and the number of objects transmitted from the different local sites to a global server site. Our approach consists of the following steps: First, we order all objects located at a local site according to a quality criterion reflecting their suitability to serve as local representatives. Then we send the best of these representatives to a server site, where they are clustered with a slightly enhanced density-based clustering algorithm. This approach is very efficient, because the suitable representatives can be determined quickly and independently at each local site. Furthermore, based on the scalable number of the most suitable local representatives, the global clustering can be done very effectively and efficiently. In our experimental evaluation, we show that our new scalable density-based distributed clustering approach results in high-quality clusterings with scalable transmission cost.
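The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the quality criterion or the enhancements to the server-side clustering, so the sketch assumes a hypothetical criterion (the number of neighbours within a radius eps) and uses plain DBSCAN on the server. The function names, the parameter k (representatives sent per site), and the sample data are all made up for illustration.

```python
from math import dist

def local_representatives(points, eps, k):
    """Rank one site's points by an ASSUMED quality criterion
    (neighbour count within eps) and return the k best."""
    def quality(p):
        return sum(1 for q in points if dist(p, q) <= eps)
    return sorted(points, key=quality, reverse=True)[:k]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN (stand-in for the paper's enhanced variant).
    Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1              # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # noise becomes a border point
            if labels[j] is not None:
                continue                # already assigned
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:      # core point: keep expanding
                queue.extend(nb)
    return labels

# Two hypothetical local sites, each holding one dense group and one outlier.
site_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
site_b = [(10.0, 10.0), (10.1, 10.0), (10.0, 10.1), (0.05, 0.05)]

k = 3  # user-defined: larger k means more transmission and more detail
reps = local_representatives(site_a, 0.5, k) + local_representatives(site_b, 0.5, k)
labels = dbscan(reps, eps=0.5, min_pts=2)
print(labels)
```

In this sketch k plays the role of the user-defined trade-off: each site transmits only k objects, so raising k increases transmission cost while giving the server a more detailed picture of the local data.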