A Fast Parallel Clustering Algorithm for Large Spatial Databases

Authors:
Xiaowei Xu; Jochen Jäger; Hans-Peter Kriegel
Affiliations:
Corporate Technology, Siemens AG, Otto-Hahn-Ring 6, D-81730 München, Germany. Xiaowei.Xu@mchp.siemens.de
Institute for Computer Science, University of Munich, Oettingenstr. 67, D-80538 München, Germany. jaeger@informatik.uni-muenchen.de
Institute for Computer Science, University of Munich, Oettingenstr. 67, D-80538 München, Germany. kriegel@informatik.uni-muenchen.de
Venue:
Data Mining and Knowledge Discovery
Year:
1999

Abstract

The clustering algorithm DBSCAN relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape as well as to distinguish noise. In this paper, we present PDBSCAN, a parallel version of this algorithm. We use the ‘shared-nothing’ architecture with multiple computers interconnected through a network. A fundamental component of a shared-nothing system is its distributed data structure. We introduce the dR*-tree, a distributed spatial index structure in which the data is spread among multiple computers and the indexes of the data are replicated on every computer. We implemented our method using a number of workstations connected via Ethernet (10 Mbit/s). A performance evaluation shows that PDBSCAN offers nearly linear speedup and has excellent scaleup and sizeup behavior.
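The density-based notion of clusters that PDBSCAN parallelizes can be illustrated with a minimal sequential DBSCAN sketch. This is not PDBSCAN. It does not model the shared-nothing distribution or the dR*-tree. Region queries are a brute-force linear scan standing in for a spatial-index range query, and points are assumed to be 2-D tuples compared by Euclidean distance. The names `dbscan`, `region_query`, `eps`, `min_pts` and the label `-1` for noise are illustrative choices and are not taken from the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1  # hypothetical label for noise points


def region_query(points: Sequence[Point], idx: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[idx], including idx itself.

    Brute-force linear scan: an assumption standing in for a spatial-index range query.
    """
    px, py = points[idx]
    return [j for j, (qx, qy) in enumerate(points) if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Return a cluster id (0, 1, ...) or NOISE for every point."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # i is a core point: start a new cluster and expand it from the seed list.
        labels[i] = cluster_id
        seeds = [j for j in neighbors if j != i]
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins the cluster, is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return [lab if lab is not None else NOISE for lab in labels]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

A point with at least `min_pts` neighbours within `eps` is a core point and grows a cluster. Non-core points within reach of a core point join as border points, and everything else stays noise, so the isolated point `(50, 50)` is labeled `-1`. Because each linear-scan region query costs O(n), this sketch runs in O(n²) time. Answering region queries with a spatial index instead is the usual way to speed this up, and a distributed index of this kind, the dR*-tree, is what the abstract describes for PDBSCAN.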