General-purpose, widely applicable clustering methods are required for knowledge discovery. K-MEANS has been adopted as the prototype of iterative, model-based clustering because of its speed, simplicity, and capability to work within the format of very large databases. However, K-MEANS has several disadvantages that derive from its statistical simplicity. We propose algorithms that remain efficient, generally applicable, and multidimensional, but are more robust to noise and outliers. We achieve this by using medians rather than means as estimators of the centers of clusters. Comparison with K-MEANS, EM, and Gibbs sampling demonstrates the advantages of our algorithms.