Data mining in large databases of complex objects from scientific, engineering, or multimedia applications is becoming increasingly important. In many areas, complex distance measures are the first choice, but simpler distance functions that can be computed much more efficiently are often available as well. In this paper, we demonstrate how the paradigm of multi-step query processing, which relies on exact as well as on lower-bounding approximated distance functions, can be integrated into the two density-based clustering algorithms DBSCAN and OPTICS, resulting in a considerable efficiency boost. Our approach confines itself as far as possible to ε-range queries on the simple distance functions and carries out complex distance computations only at the stage of the clustering algorithm where they are compulsory for computing the correct clustering result. Furthermore, we show how our approach can be used for approximated clustering, allowing the user to choose an individual trade-off between quality and efficiency. To assess the quality of the resulting clusterings, we introduce suitable quality measures that can be used generally for evaluating approximated partitioning and hierarchical clusterings. In a broad experimental evaluation based on real-world test data sets, we demonstrate that our approach accelerates the generation of exact density-based clusterings by more than one order of magnitude. Furthermore, we show that our approximated clustering approach yields high-quality clusterings whose quality can be scaled with respect to (w.r.t.) the overall number of exact distance computations.
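To make the filter/refine idea concrete, here is a minimal sketch of multi-step ε-range queries plugged into a plain DBSCAN. It is not the authors' algorithm: the paper's integration (including OPTICS and the approximated mode) is more elaborate. All names (`exact_dist`, `filter_dist`, `MultiStepDBSCAN`) are hypothetical, Euclidean distance stands in for the expensive exact measure, and the assumed cheap filter is the difference of vector norms, which lower-bounds Euclidean distance by the reverse triangle inequality. Any filter with `filt(a, b) <= exact(a, b)` would work the same way.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]

NOISE = -1  # label for points that belong to no cluster


def exact_dist(a: Vector, b: Vector) -> float:
    """Stand-in for an expensive exact distance (assumption: plain Euclidean)."""
    return math.dist(a, b)


def filter_dist(a: Vector, b: Vector) -> float:
    """Assumed cheap lower bound on exact_dist: | ||a|| - ||b|| | <= ||a - b||."""
    return abs(math.hypot(*a) - math.hypot(*b))


class MultiStepDBSCAN:
    """Hypothetical DBSCAN whose eps-range queries use a filter and a refinement step."""

    def __init__(self, points: Sequence[Vector], eps: float, min_pts: int,
                 exact: Distance = exact_dist, filt: Distance = filter_dist):
        self.points = list(points)
        self.eps = eps
        self.min_pts = min_pts
        self.exact = exact
        self.filt = filt
        self.exact_calls = 0  # number of expensive distance computations

    def _candidates(self, i: int) -> List[int]:
        # Filter step: eps-range query on the cheap distance. Because
        # filt <= exact, no true eps-neighbor is dismissed here.
        p = self.points[i]
        return [j for j, q in enumerate(self.points) if self.filt(p, q) <= self.eps]

    def _refine(self, i: int, candidates: List[int]) -> List[int]:
        # Refinement step: exact distances only for the surviving candidates.
        p = self.points[i]
        neighbors = []
        for j in candidates:
            self.exact_calls += 1
            if self.exact(p, self.points[j]) <= self.eps:
                neighbors.append(j)
        return neighbors

    def _core_neighborhood(self, i: int) -> Optional[List[int]]:
        # Return the exact eps-neighborhood of i if i is a core point, else None.
        candidates = self._candidates(i)
        if len(candidates) < self.min_pts:
            # The candidate set is a superset of the true neighborhood, so i
            # cannot be a core point: no exact distance is needed at all.
            return None
        neighbors = self._refine(i, candidates)
        return neighbors if len(neighbors) >= self.min_pts else None

    def fit(self) -> List[int]:
        labels: List[Optional[int]] = [None] * len(self.points)  # None = unvisited
        cluster_id = 0
        for i in range(len(self.points)):
            if labels[i] is not None:
                continue
            neighbors = self._core_neighborhood(i)
            if neighbors is None:
                labels[i] = NOISE  # may still become a border point later
                continue
            labels[i] = cluster_id
            seeds = list(neighbors)
            while seeds:
                j = seeds.pop()
                if labels[j] == NOISE:
                    labels[j] = cluster_id  # border point: already known not to be core
                if labels[j] is not None:
                    continue
                labels[j] = cluster_id
                j_neighbors = self._core_neighborhood(j)
                if j_neighbors is not None:
                    seeds.extend(j_neighbors)  # j is core: keep expanding
            cluster_id += 1
        # Every point has been labeled at this point.
        return [int(label) for label in labels]
```

Because the filter never dismisses a true neighbor and core points are always confirmed with exact distances, this sketch produces the same labels as a plain DBSCAN with the same processing order. Exact distances are spent only on filter candidates, and an isolated point whose candidate set is smaller than `min_pts` costs no exact computation at all. How much is saved depends entirely on how tight the chosen lower bound is.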