The leading partitional clustering technique, k-means, is one of the most computationally efficient clustering methods. However, it converges to a local optimum that depends strongly on its initial seeds. Bad initial seeds can also cause natural clusters to be split or merged, even when the clusters are well separated. In this paper, we propose ROBIN, a novel method for initial seed selection in k-means-type algorithms. It imposes constraints on the chosen seeds that lead to better clustering when k-means converges. These constraints make the seed selection insensitive to outliers in the data and help it handle clusters of variable density or multiple scales. Furthermore, they make the method deterministic, so a single run suffices to obtain good initial seeds, in contrast to traditional random seeding approaches, which need many runs to find seeds that lead to satisfactory clustering. We performed a comprehensive evaluation of ROBIN against state-of-the-art seeding methods on a wide range of synthetic and real datasets, and show that ROBIN consistently outperforms existing approaches in clustering quality.
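To make the idea concrete, here is a minimal sketch of outlier- and density-aware deterministic seeding in the spirit described above. It is not the paper's exact algorithm: where ROBIN uses a proper local-outlier criterion, this sketch swaps in a much cruder inverse mean k-nearest-neighbor distance as a density score, filters out low-density points, and then picks seeds by a deterministic farthest-point traversal. All function names and parameters (`robin_like_seeds`, `k`, the median cutoff) are illustrative assumptions, not the published method.

```python
import math

def knn_mean_dist(points, i, k):
    """Mean distance from points[i] to its k nearest neighbors."""
    dists = sorted(math.dist(points[i], p)
                   for j, p in enumerate(points) if j != i)
    return sum(dists[:k]) / k

def density(points, i, k):
    # Inverse mean k-NN distance: a crude stand-in for an
    # LOF-style local-density score (illustrative only).
    return 1.0 / (knn_mean_dist(points, i, k) + 1e-12)

def robin_like_seeds(points, n_seeds, k=3):
    """Deterministic, outlier-insensitive seed selection (sketch)."""
    n = len(points)
    dens = [density(points, i, k) for i in range(n)]
    # Keep only points at or above the median density; this is the
    # outlier filter (the cutoff choice is an assumption of this sketch).
    cutoff = sorted(dens)[n // 2]
    candidates = [i for i in range(n) if dens[i] >= cutoff]
    # First seed: the densest candidate -- fully deterministic.
    seeds = [max(candidates, key=lambda i: dens[i])]
    while len(seeds) < n_seeds:
        rest = [i for i in candidates if i not in seeds]
        # Next seed: the candidate farthest from all chosen seeds,
        # which spreads seeds across well-separated clusters.
        nxt = max(rest, key=lambda i: min(math.dist(points[i], points[s])
                                          for s in seeds))
        seeds.append(nxt)
    return [points[i] for i in seeds]
```

Because the density filter removes isolated points before the farthest-point step, a distant outlier cannot hijack a seed, and because every step is a deterministic argmax, repeated runs return identical seeds.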