Estimating the predominant number of clusters in a dataset
Intelligent Data Analysis
An important problem in clustering is deciding which set of clusters is best for a given data set, in terms of both the number of clusters and the membership of those clusters. In this paper we develop four criteria for measuring the quality of different sets of clusters. These criteria are designed so that different criteria prefer cluster sets that generalise at different levels of granularity. We evaluate the suitability of these criteria for non-hierarchical clustering of the results returned by a search engine. We also compare the number of clusters chosen by these criteria with the number of clusters chosen by a group of human subjects. Our results show that our criteria match the variability exhibited by human subjects, indicating that there is no single perfect criterion. Instead, it is necessary to select the criterion that matches a human subject's generalisation needs.
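To make the idea of a cluster-quality criterion concrete, the sketch below scores candidate partitions of the same points by the ratio of mean within-cluster distance to mean between-centroid distance, preferring tight, well-separated clusters. This is a generic illustrative criterion only, not one of the four criteria developed in the paper, and the data and function names are hypothetical.

```python
# Hypothetical illustration of a cluster-validity criterion: NOT the
# paper's actual criteria. Lower score = tighter, better-separated clusters.
from itertools import combinations
from math import dist

def centroid(points):
    # Coordinate-wise mean of a list of equal-length tuples.
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def validity(clusters):
    # Mean point-to-centroid distance divided by mean centroid-to-centroid
    # distance across all pairs of clusters.
    within = [dist(p, centroid(c)) for c in clusters for p in c]
    cents = [centroid(c) for c in clusters]
    between = [dist(a, b) for a, b in combinations(cents, 2)]
    return (sum(within) / len(within)) / (sum(between) / len(between))

# Two candidate partitions of the same six 2-D points at different
# granularities; a criterion of this kind picks between them.
coarse = [[(0, 0), (1, 0), (0, 1)], [(9, 9), (10, 9), (9, 10)]]
fine = [[(0, 0), (1, 0)], [(0, 1)], [(9, 9), (10, 9), (9, 10)]]

print(validity(coarse) < validity(fine))
```

A criterion like this can be biased toward coarser or finer partitions (for example by weighting the within-cluster term), which mirrors the abstract's point that different criteria suit different granularity needs.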