Y-Means: an autonomous clustering algorithm

Authors: Ali A. Ghorbani; Iosif-Viorel Onut
Affiliation (both authors): Faculty of Computer Science, University of New Brunswick, Fredericton, Canada
Venue: HAIS'10: Proceedings of the 5th International Conference on Hybrid Artificial Intelligence Systems, Volume Part I
Year: 2010

Abstract

This paper proposes Y-means, an unsupervised clustering technique for data classification based on the K-means algorithm. K-means is well known for its simplicity and low time complexity. However, it has three main drawbacks: dependency on the initial centroids, dependency on the number of clusters, and degeneracy. Our solution addresses all three by automatically detecting a semi-optimal number of clusters according to the statistical nature of the data. As a side effect, the choice of initial centroid seeds is no longer critical to the clustering results. The experimental results show the robustness of the Y-means algorithm, as well as its good performance against a set of other well-known unsupervised clustering techniques. Furthermore, we study the performance of the proposed solution with different distance and outlier-detection functions and recommend the best combinations.
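
The abstract does not spell out the procedure, so the following is only a loose sketch of the general idea, not the authors' Y-means algorithm. It assumes that "degeneracy" refers to clusters that end up empty, that the distance function is Euclidean, and that an outlier is any point whose distance to its centroid exceeds the mean distance plus `n_sigma` standard deviations. The function names (`kmeans`, `adaptive_kmeans`) and the parameters `n_sigma` and `max_rounds` are hypothetical choices made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
            for p in points]


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iterations that drop empty (degenerate) clusters."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster is simply not carried forward
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


def adaptive_kmeans(points, centroids, n_sigma=2.0, max_rounds=10):
    """Grow the cluster count by seeding a new centroid at the farthest outlier.

    Assumed rule (not taken from the paper): a point is an outlier when its
    distance to its centroid exceeds mean + n_sigma * std of all such distances.
    """
    centroids, labels = kmeans(points, centroids)
    for _ in range(max_rounds):
        dists = [euclidean(p, centroids[lab]) for p, lab in zip(points, labels)]
        mean = sum(dists) / len(dists)
        std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
        far = max(range(len(points)), key=lambda i: dists[i])
        if dists[far] <= mean + n_sigma * std:
            break  # no remaining outlier, keep the current number of clusters
        centroids, labels = kmeans(points, centroids + [tuple(points[far])])
    return centroids, labels


if __name__ == "__main__":
    # A 3x3 grid of nearby points plus one distant point, starting from one seed.
    grid = [(x, y) for x in range(3) for y in range(3)]
    data = grid + [(30, 30)]
    found, labels = adaptive_kmeans(data, [data[0]])
    print(len(found), labels)
```

In this sketch the caller supplies only a starting seed rather than a target number of clusters; the count grows while outliers remain, and clusters that lose all their points are discarded during the K-means iterations. A different distance or outlier rule could be substituted in `euclidean` and in the threshold test, which is the kind of combination the abstract says the paper evaluates.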