Existing methods for measuring the quality of search algorithms use a static collection of documents. A set of queries, together with a mapping from each query to its relevant documents, lets the experimenter measure how well different search engines or engine configurations retrieve the correct answers. This methodology assumes that the document set, and thus the set of relevant documents, is unchanging. In this paper, we abandon the static-collection requirement. We begin with a recent TREC collection created from a web crawl and analyze how its documents have changed over time. We determine how decay of the document collection affects TREC systems, and we present the results of an experiment that uses the decayed collection to measure a live web search system. We employ novel measures of search effectiveness that remain robust despite incomplete relevance information. Finally, we propose a "collection maintenance" methodology that supports measuring search performance both for a single system and across systems run at different points in time.
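To make the evaluation setup concrete, the sketch below shows one well-known measure designed for incomplete judgments, bpref (Buckley and Voorhees, SIGIR 2004): unjudged documents are simply skipped rather than assumed nonrelevant, and each retrieved relevant document is penalized by the fraction of judged-nonrelevant documents ranked above it. This is a minimal illustration of the general idea, not the specific measures of this paper; the function and variable names are illustrative.

```python
# Minimal sketch of bpref, a measure robust to incomplete relevance
# judgments. Unjudged documents contribute nothing; only documents with
# explicit relevant/nonrelevant judgments affect the score.

def bpref(ranking, relevant, nonrelevant):
    """ranking: doc ids in rank order; relevant/nonrelevant: judged doc id sets."""
    R, N = len(relevant), len(nonrelevant)
    if R == 0:
        return 0.0
    denom = min(R, N)
    score, nonrel_above = 0.0, 0
    for doc in ranking:
        if doc in relevant:
            # Penalize by the (capped) fraction of judged-nonrelevant
            # documents seen at higher ranks.
            penalty = min(nonrel_above, denom) / denom if denom > 0 else 0.0
            score += 1.0 - penalty
        elif doc in nonrelevant:
            nonrel_above += 1
        # Unjudged documents are skipped entirely.
    return score / R

# Hypothetical example: three judged-relevant docs, one judged-nonrelevant
# doc ranked between the two relevant docs that were retrieved.
print(bpref(["d2", "dX", "d3"], {"d2", "d3", "d9"}, {"dX", "dY"}))  # 0.5
```

Because unjudged documents are ignored, such a measure can be recomputed as a collection decays (pages disappearing or changing after the original judgments were made) without having to treat every missing or unjudged page as nonrelevant.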