A distance based clustering method for arbitrary shaped clusters in large datasets

Authors:
Bidyut Kr. Patra; Sukumar Nandi; P. Viswanath
Affiliations:
Department of Computer Science and Engineering, Indian Institute of Technology - Guwahati, Guwahati 781039, India; Department of Computer Science and Engineering, Indian Institute of Technology - Guwahati, Guwahati 781039, India; Department of Computer Science and Engineering, Rajeev Gandhi Memorial College of Engineering & Technology, Nandyal 518501, A.P., India
Venue:
Pattern Recognition
Year:
2011

Abstract

Clustering is widely used across science, technology, and the social sciences. Clusters in a dataset naturally take arbitrary (non-convex) shapes. Distance based methods form one important class of clustering methods, but they usually find only clusters of convex shape. The classical single-link method is a distance based clustering method that can find arbitrary shaped clusters. However, it scans the dataset multiple times and has a time requirement of O(n^2), where n is the size of the dataset, which is potentially a severe problem for a large dataset. In this paper, we propose a distance based clustering method, l-SL, to find arbitrary shaped clusters in a large dataset. The method first applies the leaders clustering method to the dataset to derive a set of leaders; it then applies the single-link method (with a distance stopping criterion) to the set of leaders to obtain the final clustering. The l-SL method produces a flat clustering and is considerably faster than the single-link method applied directly to the dataset. The clustering produced by l-SL may deviate slightly from the final clustering of the single-link method (with the distance stopping criterion) applied directly to the dataset. To compensate for this deviation, an improvement method is also proposed. Experiments are conducted with standard real-world and synthetic datasets. The experimental results show the effectiveness of the proposed clustering methods for large datasets.
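
The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the textbook one-pass leaders method (a point joins the first leader within a threshold, otherwise it becomes a new leader) and treats single-link with a distance stopping criterion as the connected components of the graph that links points no farther apart than the cut distance. The names `leaders`, `single_link_cut`, `l_sl`, `tau`, and `h` are hypothetical, and applying `h` unchanged to the leaders is also an assumption, since the abstract does not say how the two thresholds are related. The improvement method mentioned in the abstract is not sketched.

```python
import math


def leaders(points, tau):
    """One-pass leaders clustering (textbook variant, assumed here).

    Each point joins the first existing leader within distance tau;
    otherwise it becomes a new leader.
    Returns (indices of leader points, leader slot assigned to each point).
    """
    leader_idx = []
    assigned = []
    for i, p in enumerate(points):
        for slot, j in enumerate(leader_idx):
            if math.dist(p, points[j]) <= tau:
                assigned.append(slot)
                break
        else:
            # No leader is close enough, so this point starts a new group.
            leader_idx.append(i)
            assigned.append(len(leader_idx) - 1)
    return leader_idx, assigned


def single_link_cut(points, h):
    """Single-link stopped at distance h, via union-find.

    Equivalent to the connected components of the graph that joins
    every pair of points at distance <= h. Quadratic in len(points).
    """
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a in range(len(points)):
        for b in range(a + 1, len(points)):
            if math.dist(points[a], points[b]) <= h:
                parent[find(a)] = find(b)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(a), len(roots)) for a in range(len(points))]


def l_sl(points, tau, h):
    """Hypothetical l-SL sketch: cluster the leaders, then propagate labels."""
    leader_idx, assigned = leaders(points, tau)
    leader_labels = single_link_cut([points[j] for j in leader_idx], h)
    return [leader_labels[slot] for slot in assigned]


points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
labels = l_sl(points, tau=0.5, h=2.0)
print(labels)  # [0, 0, 0, 0, 1, 1]
```

In this sketch the quadratic pairwise step runs only over the leaders (three of the six points here), which is the source of the speed-up the abstract claims; every other point simply inherits the label of its leader.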