Parallel density-based clustering of complex objects

Authors:
Stefan Brecheisen;Hans-Peter Kriegel;Martin Pfeifle
Affiliations:
Institute for Informatics, University of Munich;Institute for Informatics, University of Munich;Institute for Informatics, University of Munich
Venue:
PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Year:
2006

Abstract

In many scientific, engineering, or multimedia applications, complex distance functions are used to measure similarity accurately. Often there also exist simpler lower-bounding distance functions that can be computed much more efficiently. In this paper, we show how these simple distance functions can be used to parallelize the density-based clustering algorithm DBSCAN. First, the data is partitioned based on an enumeration calculated by the hierarchical clustering algorithm OPTICS, so that similar objects have adjacent enumeration values. We exploit the fact that clustering based on lower-bounding distance values conservatively approximates the exact clustering. Integrating the multi-step query processing paradigm directly into the clustering algorithms allows the clustering on the slaves to be carried out very efficiently. Finally, we show that the result sets computed by the individual slaves can be merged effectively and efficiently into a global result by means of cluster connectivity graphs. An experimental evaluation on real-world test data sets demonstrates the benefits of our approach.
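
The filter-and-refine idea behind multi-step query processing can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`exact_dist`, `lower_bound_dist`, `range_query`), the use of Euclidean distance as the "expensive" function, and the single-coordinate lower bound are assumptions chosen only for illustration. The sketch shows one epsilon-range query, the basic operation of DBSCAN, in which the cheap lower bound discards objects before the exact distance is computed.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Distance = Callable[[Point, Point], float]


def exact_dist(a: Point, b: Point) -> float:
    """Hypothetical stand-in for an expensive exact distance (here: Euclidean)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def lower_bound_dist(a: Point, b: Point) -> float:
    """Cheap lower bound: the difference in the first coordinate only.

    |a[0] - b[0]| never exceeds the Euclidean distance, so it is a valid
    lower bound for exact_dist.
    """
    return abs(a[0] - b[0])


def range_query(
    query: Point,
    data: Sequence[Point],
    eps: float,
    lb: Distance = lower_bound_dist,
    exact: Distance = exact_dist,
) -> Tuple[List[int], int]:
    """Return the indices of objects within eps of query and the number of
    exact distance computations that were needed."""
    hits: List[int] = []
    exact_calls = 0
    for i, obj in enumerate(data):
        # Filter step: if the lower bound already exceeds eps, the exact
        # distance must exceed eps as well, so the object is skipped.
        if lb(query, obj) > eps:
            continue
        # Refinement step: only surviving candidates pay for the exact distance.
        exact_calls += 1
        if exact(query, obj) <= eps:
            hits.append(i)
    return hits, exact_calls
```

Because the filter only discards objects whose lower bound exceeds eps, the result is identical to a scan with the exact distance alone; the saving is in the number of exact computations, which `range_query` reports as its second return value.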