Data clustering has attracted considerable research attention in computational statistics and data mining. In most related studies, the dissimilarity between two clusters is defined as the distance between their centroids or as the distance between their two closest (or farthest) data points. However, all of these measures are vulnerable to outliers, and removing outliers precisely is itself a difficult task. In view of this, we propose a new similarity measure, referred to as cohesion, to measure intercluster distances. Using this measure, we have designed a two-phase clustering algorithm, called cohesion-based self-merging (CSM), which runs in time linear in the size of the input data set. Combining the features of partitional and hierarchical clustering methods, algorithm CSM partitions the input data set into several small subclusters in the first phase and then, in the second phase, repeatedly merges the subclusters hierarchically based on cohesion. The time and space complexities of algorithm CSM are analyzed. As shown by our performance studies, cohesion-based clustering is very robust and tolerates outliers well across various workloads. More importantly, algorithm CSM is shown to cluster data sets of arbitrary shapes very efficiently and to provide better clustering results than prior methods.
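The two-phase structure described above (partition into small subclusters, then merge them hierarchically) can be illustrated with a minimal Python sketch. This is not the CSM algorithm itself: the abstract does not define cohesion, so `stand_in_cohesion` below is a hypothetical placeholder (the mean of 1/(1+d) over all cross-cluster point pairs), chosen only because it depends on every point rather than on a single extreme pair. Phase 1 is assumed here to be plain k-means, and all function names and the parameters `m` and `k` are illustrative. The naive merge loop is also far from linear time.

```python
import math
import random


def kmeans(points, m, iters=20, seed=0):
    """Phase 1 (assumed): plain k-means that yields up to m small subclusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, m)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(m)]
        for p in points:
            nearest = min(range(m), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return [g for g in groups if g]


def stand_in_cohesion(a, b):
    """Hypothetical placeholder, NOT the paper's cohesion measure.

    Averages 1 / (1 + distance) over all cross pairs, so one stray point
    shifts the score only slightly instead of determining it outright.
    """
    total = sum(1.0 / (1.0 + math.dist(p, q)) for p in a for q in b)
    return total / (len(a) * len(b))


def merge_by_cohesion(subclusters, k):
    """Phase 2: repeatedly merge the most cohesive pair until k clusters remain."""
    clusters = [list(c) for c in subclusters]
    while len(clusters) > k:
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = max(
            pairs,
            key=lambda ij: stand_in_cohesion(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


def two_phase_cluster(points, m, k, seed=0):
    """Partition into m subclusters, then merge them down to k clusters."""
    return merge_by_cohesion(kmeans(points, m, seed=seed), k)
```

Calling `two_phase_cluster(points, m=6, k=2)` on a list of coordinate tuples returns a list of clusters, each a list of points. Choosing `m` well above `k` is what lets the merge phase follow non-spherical shapes, since each final cluster is assembled from several small pieces.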