Clustering partitions a collection of objects into groups called clusters, such that similar objects fall into the same group. Similarity between objects is defined by a distance function satisfying the triangle inequality; this distance function along with the collection of objects describes a distance space. In a distance space, the only operation possible on data objects is the computation of distance between them. All scalable algorithms in the literature assume a special type of distance space, namely a k-dimensional vector space, which allows vector operations on objects. We present two scalable algorithms designed for clustering very large datasets in distance spaces. Our first algorithm BUBBLE is, to our knowledge, the first scalable clustering algorithm for data in a distance space. Our second algorithm BUBBLE-FM improves upon BUBBLE by reducing the number of calls to the distance function, which may be computationally very expensive. Both algorithms make only a single scan over the database while producing high clustering quality. In a detailed experimental evaluation, we study both algorithms in terms of scalability and quality of clustering. We also show results of applying the algorithms to a real-life dataset.
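To make the notion of a distance space concrete, the following is a minimal sketch in Python. It is not BUBBLE or BUBBLE-FM, whose details the abstract does not give; it is a simple single-scan "leader" clustering, swapped in only to illustrate two points from the text: the objects (here, strings) support no vector operations, so the code touches them only through a distance function, and the number of distance calls is the cost worth counting. The names `edit_distance`, `CountingDistance`, `single_scan_clusters`, and the `threshold` parameter are assumptions introduced for this illustration.

```python
from typing import Callable, Generic, Iterable, List, TypeVar

T = TypeVar("T")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: a metric on strings (it satisfies the triangle inequality)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


class CountingDistance(Generic[T]):
    """Wraps a distance function and counts how often it is called."""

    def __init__(self, fn: Callable[[T, T], float]) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, a: T, b: T) -> float:
        self.calls += 1
        return self.fn(a, b)


def single_scan_clusters(objects: Iterable[T],
                         dist: Callable[[T, T], float],
                         threshold: float) -> List[List[T]]:
    """Hypothetical leader clustering: one pass over the data, distance calls only.

    Each cluster is a list whose first element acts as its representative.
    An object joins the nearest representative if it lies within `threshold`;
    otherwise it starts a new cluster.
    """
    clusters: List[List[T]] = []
    for obj in objects:
        best, best_d = None, None
        for cluster in clusters:
            d = dist(obj, cluster[0])
            if best_d is None or d < best_d:
                best, best_d = cluster, d
        if best is not None and best_d <= threshold:
            best.append(obj)
        else:
            clusters.append([obj])
    return clusters


if __name__ == "__main__":
    words = ["kitten", "sitten", "mitten", "database", "databases", "datbase"]
    dist = CountingDistance(edit_distance)
    for cluster in single_scan_clusters(words, dist, threshold=2):
        print(cluster)
    print("distance calls:", dist.calls)
```

In this sketch every object is compared against every existing representative, so the number of distance calls grows with the number of clusters. With an expensive distance function that comparison step dominates the running time, which is the kind of cost the abstract says BUBBLE-FM is designed to reduce.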