Workload sampling for enterprise search evaluation

Authors:
Tom Rowlands;David Hawking;Ramesh Sankaranarayana
Affiliations:
Australian National University and CSIRO Australia, Canberra, Australia;CSIRO, Canberra, Australia;Australian National University, Canberra, Australia
Venue:
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2007

Abstract

In real-world use of test collection methods, it is essential that the query test set be representative of the workload expected in the actual application. Using a random sample of queries from a media company's query log as a 'gold standard' test set, we demonstrate that biases in sitemap-derived and top-n query sets can lead to significant perturbations in engine rankings and large differences in estimated performance levels.
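
The difference between a random-sample test set and a top-n test set can be illustrated with a minimal sketch. It is not the authors' procedure: the function names (`random_sample`, `top_n`, `head_share`) and the toy log are hypothetical, and the log is assumed to be a plain list with one entry per submitted query.

```python
import random
from collections import Counter


def random_sample(log, k, seed=0):
    """Draw k query instances uniformly from the log, without replacement.

    Sampling instances (not distinct strings) means each query appears
    roughly in proportion to how often it was actually submitted.
    """
    rng = random.Random(seed)
    return rng.sample(log, k)


def top_n(log, n):
    """Return the n most frequent distinct queries in the log."""
    return [query for query, _ in Counter(log).most_common(n)]


def head_share(log, n):
    """Fraction of total log volume accounted for by the top-n queries."""
    counts = Counter(log)
    return sum(count for _, count in counts.most_common(n)) / len(log)


# Toy log with invented data: a few popular queries and a long tail.
log = (
    ["news"] * 50
    + ["weather"] * 30
    + ["tv guide"] * 10
    + [f"rare query {i}" for i in range(10)]
)

print(top_n(log, 2))          # ['news', 'weather']
print(head_share(log, 2))     # 0.8
print(random_sample(log, 5))  # mix of head and tail queries, seed-dependent
```

In this toy log the top-2 set contains no tail queries at all, although tail queries make up 10% of the volume; a random sample of instances would include them at roughly that rate. This is one way a top-n set can misrepresent the workload.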