Forming test collections with no system pooling

Authors:
Mark Sanderson; Hideo Joho
Affiliations:
University of Sheffield, Sheffield, UK; University of Sheffield, Sheffield, UK
Venue:
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2004

Citing 15
Cited 31

An evaluation of phrasal and clustered representations on a text categorization task

SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Panel: building and using test collections

SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Cross-language speech retrieval: establishing a baseline performance

Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Image retrieval by hypertext links

Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Efficient construction of large test collections

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
How reliable are the results of large-scale information retrieval experiments?

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Variations in relevance judgments and the measurement of retrieval effectiveness

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Extended Boolean information retrieval

Communications of the ACM
Ranking retrieval systems without relevance judgments

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Evaluation by highly relevant documents

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Modeling score distributions for combining the outputs of search engines

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Pooling for a Large-Scale Test Collection: An Analysis of the Search Results from the First NTCIR Workshop

Information Retrieval
Corpora for topic detection and tracking

Topic detection and tracking
Building a filtering test collection for TREC 2002

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Retrieval evaluation with incomplete information

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval

Incremental test collections

Proceedings of the 14th ACM international conference on Information and knowledge management
Minimal test collections for retrieval evaluation

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Statistical precision of information retrieval evaluation

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Using the structure of overlap between search results to rank retrieval systems without relevance judgments

Information Processing and Management: an International Journal
On rank correlation in information retrieval evaluation

ACM SIGIR Forum
Robust test collections for retrieval evaluation

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
A new approach for evaluating query expansion: query-document term mismatch

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Problems with Kendall's tau

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Repeatable evaluation of search services in dynamic environments

ACM Transactions on Information Systems (TOIS)
Evaluating epistemic uncertainty under incomplete assessments

Information Processing and Management: an International Journal
Effect of OCR error correction on Arabic retrieval

Information Retrieval
Enabling the creation of domain-specific reference collections to support text-based information retrieval experiments in the architecture, engineering and construction industries

Advanced Engineering Informatics
A new rank correlation coefficient for information retrieval

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
A Comparison of Interactive and Ad-Hoc Relevance Assessments

Focused Access to XML Documents
Comparing metrics across TREC and NTCIR: the robustness to system bias

Proceedings of the 17th ACM conference on Information and knowledge management
Using Multiple Query Aspects to Build Test Collections without Human Relevance Judgments

ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval
Towards methods for the collective gathering and quality control of relevance assessments

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Including summaries in system evaluation

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Weighted Rank Correlation in Information Retrieval Evaluation

AIRS '09 Proceedings of the 5th Asia Information Retrieval Symposium on Information Retrieval Technology
A retrieval evaluation methodology for incomplete relevance assessments

ECIR'07 Proceedings of the 29th European conference on IR research
Annotations and digital libraries: designing adequate test-beds

ICADL'07 Proceedings of the 10th international conference on Asian digital libraries: looking back 10 years and forging new frontiers
CLEF 2009 ad hoc track overview: TEL and Persian tasks

CLEF'09 Proceedings of the 10th cross-language evaluation forum conference on Multilingual information access evaluation: text retrieval experiments
Research methodology in studies of assessor effort for information retrieval evaluation

Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
Boiling down information retrieval test collections

RIAO '10 Adaptivity, Personalization and Fusion of Heterogeneous Information
Diagnostic Evaluation of Information Retrieval Models

ACM Transactions on Information Systems (TOIS)
Evaluation of information retrieval for E-discovery

Artificial Intelligence and Law
A social approach to context-aware retrieval

World Wide Web
GeoCLEF: the CLEF 2005 cross-language geographic information retrieval track overview

CLEF'05 Proceedings of the 6th international conference on Cross-Language Evaluation Forum: accessing Multilingual Information Repositories
The future of large-scale evaluation campaigns for information retrieval in Europe

ECDL'07 Proceedings of the 11th European conference on Research and Advanced Technology for Digital Libraries
Click model-based information retrieval metrics

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
A new statistical strategy for pooling: ELI

Information Processing Letters

Abstract

Forming test collection relevance judgments from the pooled output of multiple retrieval systems has become the standard process for creating resources such as the TREC, CLEF, and NTCIR test collections. This paper presents a series of experiments examining three different ways of building test collections where no system pooling is used. First, a collection formation technique combining manual feedback and multiple systems is adapted to work with a single retrieval system. Second, an existing method based on pooling the output of multiple manual searches is re-examined: testing a wider range of searchers and retrieval systems than has been examined before. Third, a new approach is explored where the ranked output of a single automatic search on a single retrieval system is assessed for relevance: no pooling whatsoever. Using established techniques for evaluating the quality of relevance judgments, in all three cases, test collections are formed that are as good as TREC.