Despite the wide applicability of clustering methods, their evaluation remains a problem. In this paper, we present a metric for the evaluation of clustering methods. The data set to be clustered is viewed as a sample from a larger population, and clustering quality is measured in terms of our predicted ability to discriminate between members of this population. We measure this property by training a classifier to recognize each cluster and measuring the accuracy of this classifier, normalized by a notion of expected accuracy. To demonstrate the applicability of this metric, we apply it to Web queries. We investigate a commercially oriented data set of 1700 queries and a general data set of 4000 queries. Both sets are taken from the logs of a commercial Web search engine. Clustering is based on the contents of the search engine result pages generated by executing the queries on the search engine from which they were taken. Multiple clustering algorithms are crossed with various weighting schemes to produce multiple clusterings of each query set. Our metric is used to evaluate these clusterings. The results on the commercially oriented data set are compared to two pre-existing manual labelings, and are also used in an ad clickthrough experiment.
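The general shape of such a metric can be illustrated with a minimal Python sketch. It is not the paper's implementation, and every specific choice in it is an assumption made for illustration: the classifier is a simple nearest-centroid rule, accuracy is measured on a single held-out split, "expected accuracy" is taken to be the accuracy of always guessing the largest cluster, and the normalization is the kappa-style ratio (accuracy - expected) / (1 - expected). The function names `fit_centroids`, `predict`, and `clustering_score` are hypothetical.

```python
import random
from collections import Counter


def fit_centroids(points, labels):
    """Compute one centroid per cluster label (assumed classifier: nearest centroid)."""
    counts = Counter(labels)
    sums = {}
    for point, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def predict(centroids, point):
    """Return the label of the centroid closest to the point (squared Euclidean distance)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sqdist(centroids[label], point))


def clustering_score(points, labels, test_fraction=0.3, seed=0):
    """Held-out classifier accuracy, normalized by an assumed baseline accuracy."""
    rng = random.Random(seed)
    indices = list(range(len(points)))
    rng.shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test, train = indices[:n_test], indices[n_test:]

    # Train a classifier to recognize each cluster from the training portion.
    centroids = fit_centroids([points[i] for i in train], [labels[i] for i in train])

    # Accuracy on items the classifier has not seen.
    correct = sum(predict(centroids, points[i]) == labels[i] for i in test)
    accuracy = correct / len(test)

    # Assumed notion of expected accuracy: always guess the largest training cluster.
    expected = Counter(labels[i] for i in train).most_common(1)[0][1] / len(train)
    if expected >= 1.0:
        return 0.0  # a single cluster carries no discriminative information
    return (accuracy - expected) / (1.0 - expected)
```

Under these assumptions, a clustering whose clusters a classifier can reliably recognize on unseen items scores close to 1, while a clustering that is no easier to predict than the majority-cluster guess scores close to 0 or below.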