Fast and Robust General Purpose Clustering Algorithms

Authors:
V. Estivill-Castro; J. Yang
Affiliations:
School of Computing and Information Technology, Griffith University, Nathan, QLD 4111, Australia (v.estivill-castro@griffith.edu.au); School of Computing and Information Technology, University of Western Sydney, Campbelltown, NSW 2560, Australia
Venue:
Data Mining and Knowledge Discovery
Year:
2004

Abstract

General-purpose, widely applicable clustering methods are usually required during the early stages of knowledge discovery exercises. k-MEANS has been adopted as the prototype of iterative model-based clustering because of its speed, its simplicity, and its capability to work within the format of very large databases. However, k-MEANS has several disadvantages derived from its statistical simplicity. We propose an algorithm that remains very efficient, generally applicable, and multidimensional, but is more robust to noise and outliers. We achieve this by using medians rather than means as estimators for the centers of clusters. Comparison with k-MEANS, EXPECTATION MAXIMIZATION, and GIBBS sampling demonstrates the advantages of our algorithm.
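
The robustness argument in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not say which notion of median is used, so the code below assumes a coordinate-wise median and an L1 (Manhattan) distance purely for illustration. The helper names `center`, `nearest`, and `iterate` are hypothetical.

```python
from statistics import mean, median


def center(points, estimator):
    """Apply a 1-D estimator (mean or median) to each coordinate."""
    return tuple(estimator(axis) for axis in zip(*points))


def nearest(point, centers):
    """Index of the closest center under L1 distance (an assumption here)."""
    return min(
        range(len(centers)),
        key=lambda i: sum(abs(a - b) for a, b in zip(point, centers[i])),
    )


def iterate(points, centers, estimator, rounds=10):
    """k-MEANS-style loop: assign points, then re-estimate each center.

    Passing `mean` gives the usual k-MEANS update; passing `median`
    gives a coordinate-wise median update (illustrative assumption).
    """
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a group ends up empty.
        centers = [
            center(g, estimator) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


# One tight cluster plus a single distant outlier.
cluster = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (100.0, 100.0)]
mean_center = center(cluster, mean)      # dragged toward the outlier
median_center = center(cluster, median)  # stays inside the tight cluster
print("mean:", mean_center, "median:", median_center)
```

In this toy data the single outlier pulls the mean well away from the four nearby points, while the coordinate-wise median stays among them, which is the kind of behaviour the abstract attributes to median-based center estimation.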