Fast and robust general purpose clustering algorithms

Authors:
Vladimir Estivill-Castro; Jianhua Yang
Affiliations:
Department of Computer Science & Software Engineering, The University of Newcastle, Callaghan, NSW, Australia (both authors)
Venue:
PRICAI'00: Proceedings of the 6th Pacific Rim International Conference on Artificial Intelligence
Year:
2000

Abstract

General-purpose, widely applicable clustering methods are required for knowledge discovery. K-MEANS has been adopted as the prototype of iterative model-based clustering because of its speed, its simplicity, and its ability to work on very large databases. However, K-MEANS has several disadvantages that stem from its statistical simplicity. We propose algorithms that remain very efficient, generally applicable, and multidimensional, but that are more robust to noise and outliers. We achieve this by using medians rather than means as estimators of cluster centers. Comparisons with K-MEANS, EM, and Gibbs sampling demonstrate the advantages of our algorithms.
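The effect of replacing means with medians can be illustrated with a minimal sketch. The abstract does not say which median-based estimators the authors use, so the code below assumes a coordinate-wise median with Manhattan (L1) distance, a common k-medians variant, and should not be read as the paper's algorithm. The names `center`, `assign`, and `k_medians` are hypothetical helpers introduced only for this illustration.

```python
from statistics import mean, median


def center(points, estimator):
    """Apply a 1-D estimator to each coordinate of a list of equal-length tuples."""
    return tuple(estimator(coord) for coord in zip(*points))


def assign(points, centers):
    """Assign each point to its nearest center under L1 (Manhattan) distance."""
    clusters = [[] for _ in centers]
    for p in points:
        dists = [sum(abs(a - b) for a, b in zip(p, c)) for c in centers]
        clusters[dists.index(min(dists))].append(p)
    return clusters


def k_medians(points, centers, iterations=10):
    """Alternate assignment and coordinate-wise median updates.

    Assumption: a fixed iteration count stands in for a convergence test.
    """
    for _ in range(iterations):
        clusters = assign(points, centers)
        # Keep the previous center if a cluster ends up empty.
        centers = [
            center(c, median) if c else old
            for c, old in zip(clusters, centers)
        ]
    return centers


# One tight cluster plus a single far-away outlier.
cluster = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (100.0, 100.0)]
mean_center = center(cluster, mean)      # dragged toward the outlier
median_center = center(cluster, median)  # stays inside the tight cluster
```

In this toy data set the mean-based center lands far from the four clustered points, while the median-based center remains among them. The coordinate-wise median is used here because it minimizes the sum of L1 distances to the points in a cluster, which is why it is paired with Manhattan distance in the assignment step.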