K-means is a widely used partitional clustering method. Although considerable research has sought to characterize the key features of K-means clustering, further investigation is needed to reveal whether and how data distributions affect its performance. In this paper, we revisit the K-means clustering problem by answering three questions. First, how do the "true" cluster sizes affect the performance of K-means clustering? Second, is entropy an algorithm-independent validation measure for K-means clustering? Finally, what is the distribution of the clustering results produced by K-means? To that end, we first illustrate that K-means tends to generate clusters with relatively uniform sizes. In addition, we show that the entropy measure, an external clustering validation measure, favors clustering algorithms that tend to reduce high variation in cluster sizes. Finally, our experimental results indicate that K-means tends to produce clusters in which the variation of the cluster sizes, as measured by the coefficient of variation (CV), falls in a specific range, approximately from 0.3 to 1.0.
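The two quantities named in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code. It assumes that CV is the sample standard deviation of the cluster sizes divided by their mean, and that entropy is the usual size-weighted average of the per-cluster class-label entropy (base-2 logarithm). The function names and the toy labels are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, stdev


def coefficient_of_variation(sizes):
    """CV of a list of cluster sizes: sample std. deviation / mean (assumed definition)."""
    return stdev(sizes) / mean(sizes)


def clustering_entropy(cluster_labels, class_labels):
    """Size-weighted average of the class-label entropy inside each cluster.

    Assumed definition of the external entropy measure; lower is better,
    and 0.0 means every cluster contains a single class.
    """
    n = len(cluster_labels)
    members_by_cluster = {}
    for cluster, cls in zip(cluster_labels, class_labels):
        members_by_cluster.setdefault(cluster, []).append(cls)

    total = 0.0
    for members in members_by_cluster.values():
        size = len(members)
        counts = Counter(members)
        # Shannon entropy of the class distribution within this cluster.
        h = -sum((c / size) * math.log2(c / size) for c in counts.values())
        total += (size / n) * h
    return total


if __name__ == "__main__":
    # Hypothetical toy data: skewed "true" class sizes versus a more
    # uniform set of cluster sizes, the situation the abstract describes.
    true_sizes = [90, 5, 5]
    cluster_sizes = [34, 33, 33]
    print("CV of true class sizes:", coefficient_of_variation(true_sizes))
    print("CV of cluster sizes:   ", coefficient_of_variation(cluster_sizes))

    clusters = [0, 0, 0, 1, 1, 1]
    classes = ["a", "a", "b", "b", "b", "b"]
    print("Entropy:", clustering_entropy(clusters, classes))
```

Comparing the CV of the true class sizes with the CV of the resulting cluster sizes is one simple way to observe the tendency toward uniform cluster sizes on a given data set.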