Estimating the predominant number of clusters in a dataset
Intelligent Data Analysis
An important problem in clustering is deciding which set of clusters is best for a given data set, in terms of both the number of clusters and the membership of those clusters. In this paper we develop four criteria for measuring the quality of different sets of clusters. These criteria are designed so that different criteria prefer cluster sets that generalise at different levels of granularity. We evaluate the suitability of these criteria for non-hierarchical clustering of the results returned by a search engine. We also compare the number of clusters chosen by these criteria with the number of clusters chosen by a group of human subjects. Our results show that our criteria match the variability exhibited by human subjects, indicating that there is no single perfect criterion. Instead, it is necessary to select the criterion that matches a human subject's generalisation needs.
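To make the idea of a cluster-quality criterion concrete, the sketch below scores candidate partitions of the same points by the ratio of mean within-cluster distance to mean between-centroid distance, preferring tight, well-separated clusters. This is a generic illustrative criterion only, not one of the four criteria developed in the paper, and the data and function names are hypothetical.

```python
# Hypothetical illustration of a cluster-validity criterion: NOT the
# paper's actual criteria. Lower score = tighter, better-separated clusters.
from itertools import combinations
from math import dist

def centroid(points):
    # Coordinate-wise mean of a list of equal-length tuples.
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def validity(clusters):
    # Mean point-to-centroid distance divided by mean centroid-to-centroid
    # distance across all pairs of clusters.
    within = [dist(p, centroid(c)) for c in clusters for p in c]
    cents = [centroid(c) for c in clusters]
    between = [dist(a, b) for a, b in combinations(cents, 2)]
    return (sum(within) / len(within)) / (sum(between) / len(between))

# Two candidate partitions of the same six 2-D points at different
# granularities; a criterion of this kind picks between them.
coarse = [[(0, 0), (1, 0), (0, 1)], [(9, 9), (10, 9), (9, 10)]]
fine = [[(0, 0), (1, 0)], [(0, 1)], [(9, 9), (10, 9), (9, 10)]]

print(validity(coarse) < validity(fine))
```

A criterion like this can be biased toward coarser or finer partitions (for example by weighting the within-cluster term), which mirrors the abstract's point that different criteria suit different granularity needs.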