Clustering has been widely used in many fields of science, technology, and the social sciences. Clusters in a dataset naturally have arbitrary (non-convex) shapes. Distance-based methods form one important class of clustering methods; however, they usually find only convex-shaped clusters. The classical single-link method is a distance-based clustering method that can find arbitrary-shaped clusters. It scans the dataset multiple times and requires O(n^2) time, where n is the size of the dataset. This is potentially a severe problem for a large dataset. In this paper, we propose a distance-based clustering method, l-SL, to find arbitrary-shaped clusters in a large dataset. In this method, the leaders clustering method is first applied to the dataset to derive a set of leaders; the single-link method (with a distance stopping criterion) is then applied to the set of leaders to obtain the final clustering. The l-SL method produces a flat clustering. It is considerably faster than the single-link method applied directly to the dataset. The clustering result of l-SL may deviate nominally from the final clustering produced by the single-link method (with the distance stopping criterion) applied directly to the dataset. To compensate for this deviation, an improvement method is also proposed. Experiments are conducted with standard real-world and synthetic datasets. The experimental results show the effectiveness of the proposed clustering methods for large datasets.
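The two-stage l-SL pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes Euclidean distance, a leader threshold named `tau`, and a single-link cutoff named `h` (both hypothetical names). It also assumes the stopping criterion "keep merging while the closest pair of clusters is within `h`". It omits the proposed improvement method. The speed-up comes from running the quadratic single-link step on the m leaders rather than on all n points.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def leaders(data: Sequence[Point], tau: float) -> Tuple[List[Point], List[int]]:
    """Single-scan leaders clustering (assumed variant): each point joins the
    first existing leader within distance tau, otherwise it becomes a new leader."""
    leader_pts: List[Point] = []
    assign: List[int] = []  # index of each point's leader
    for x in data:
        for i, lead in enumerate(leader_pts):
            if math.dist(x, lead) <= tau:
                assign.append(i)
                break
        else:  # no leader close enough: x becomes a new leader
            leader_pts.append(x)
            assign.append(len(leader_pts) - 1)
    return leader_pts, assign


def single_link(points: Sequence[Point], h: float) -> List[int]:
    """Flat single-link clustering with a distance cutoff h. Merging clusters
    while their closest pair is within h yields the connected components of the
    graph linking every pair at distance <= h; union-find computes them."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= h:
                parent[find(i)] = find(j)
    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    labels: Dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


def l_sl(data: Sequence[Point], tau: float, h: float) -> List[int]:
    """Hypothetical l-SL sketch: leaders first, then single-link on the leaders;
    every point inherits the cluster label of its leader."""
    leader_pts, assign = leaders(data, tau)
    leader_labels = single_link(leader_pts, h)
    return [leader_labels[a] for a in assign]
```

For example, `l_sl([(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (10.0, 10.0), (10.5, 10.0)], tau=0.6, h=1.5)` derives three leaders and then links the first two of them, giving the labels `[0, 0, 0, 1, 1]`. The leaders step costs O(nm) distance computations and the single-link step O(m^2), compared with O(n^2) for single-link on the full dataset.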