Multi-step density-based clustering

Authors:
Stefan Brecheisen;Hans-Peter Kriegel;Martin Pfeifle
Affiliations:
Institute for Informatics, University of Munich, Munich, Germany;Institute for Informatics, University of Munich, Munich, Germany;Institute for Informatics, University of Munich, Munich, Germany
Venue:
Knowledge and Information Systems
Year:
2006

Abstract

Data mining in large databases of complex objects from scientific, engineering or multimedia applications is becoming increasingly important. In many areas, complex distance measures are the first choice, but simpler distance functions that can be computed much more efficiently are often available as well. In this paper, we demonstrate how the paradigm of multi-step query processing, which relies on exact as well as lower-bounding approximated distance functions, can be integrated into the two density-based clustering algorithms DBSCAN and OPTICS, resulting in a considerable efficiency boost. Our approach confines itself, as far as possible, to ε-range queries on the simple distance functions and carries out complex distance computations only at those stages of the clustering algorithm where they are required to compute the correct clustering result. Furthermore, we show how our approach can be used for approximated clustering, allowing the user to find an individual trade-off between quality and efficiency. To assess the quality of the resulting clusterings, we introduce suitable quality measures that can be used generally for evaluating approximated partitioning and hierarchical clusterings. In a broad experimental evaluation based on real-world test data sets, we demonstrate that our approach accelerates the generation of exact density-based clusterings by more than one order of magnitude. Furthermore, we show that our approximated clustering approach yields high-quality clusterings, where the desired quality is scalable with respect to the overall number of exact distance computations.
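
The filter-refinement idea behind multi-step query processing can be illustrated with a minimal Python sketch of an ε-range query. This is not the algorithm from the paper, which defers exact computations further inside DBSCAN and OPTICS; it only shows the basic principle that a lower-bounding filter distance can safely discard candidates without any false dismissals. The function names, the toy data, and the choice of Euclidean distance as the "expensive" exact measure are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def exact_distance(a: Point, b: Point) -> float:
    """Stand-in for an expensive exact distance (assumed: Euclidean)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def filter_distance(a: Point, b: Point) -> float:
    """Cheap lower bound (assumed): the difference in the first coordinate
    can never exceed the Euclidean distance."""
    return abs(a[0] - b[0])


def multi_step_range_query(
    db: Sequence[Point],
    query: Point,
    eps: float,
    lower_bound: Callable[[Point, Point], float],
    exact: Callable[[Point, Point], float],
) -> Tuple[List[int], int]:
    """Return indices of objects within eps of query, plus the number of
    exact distance computations that were actually needed."""
    hits: List[int] = []
    exact_calls = 0
    for i, obj in enumerate(db):
        # Filter step: if even the lower bound exceeds eps, the exact
        # distance must exceed eps too, so the object is pruned safely.
        if lower_bound(query, obj) > eps:
            continue
        # Refinement step: only surviving candidates pay for the exact distance.
        exact_calls += 1
        if exact(query, obj) <= eps:
            hits.append(i)
    return hits, exact_calls


# Hypothetical toy database of 2-D points.
db = [(0.0, 0.0), (0.5, 0.5), (0.9, 3.0), (4.0, 0.0), (5.0, 5.0)]
hits, exact_calls = multi_step_range_query(
    db, (0.0, 0.0), 1.0, filter_distance, exact_distance
)
print(hits, exact_calls)
```

In this toy run, two of the five objects are pruned by the filter alone, so only three exact distances are computed while the result set is identical to that of a full scan. The tighter the lower bound, the fewer refinements are needed.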