Constructing test collections by inferring document relevance via extracted relevant information

Authors:
Shahzad Rajput;Matthew Ekstrand-Abueg;Virgil Pavlu;Javed A. Aslam
Affiliations:
Northeastern University, Boston, MA, USA;Northeastern University, Boston, MA, USA;Northeastern University, Boston, MA, USA;Northeastern University, Boston, MA, USA
Venue:
Proceedings of the 21st ACM international conference on Information and knowledge management
Year:
2012

Citing 27
Cited 3

Query by committee

COLT '92 Proceedings of the fifth annual workshop on Computational learning theory
Term relevance feedback and query expansion: relation to design

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Syntactic clustering of the Web

Selected papers from the sixth international conference on World Wide Web
Efficient construction of large test collections

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Ranking retrieval systems without relevance judgments

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Question answering in TREC

Proceedings of the tenth international conference on Information and knowledge management
Introduction to Reinforcement Learning

Introduction to Reinforcement Learning
Cumulated gain-based evaluation of IR techniques

ACM Transactions on Information Systems (TOIS)
A decision-theoretic generalization of on-line learning and an application to boosting

EuroCOLT '95 Proceedings of the Second European Conference on Computational Learning Theory
Identifying and Filtering Near-Duplicate Documents

COM '00 Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching
Improving Algorithms for Boosting

COLT '00 Proceedings of the Thirteenth Annual Conference on Computational Learning Theory
A unified model for metasearch and the efficient evaluation of retrieval systems via the hedge algorithm

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Online Choice of Active Learning Algorithms

The Journal of Machine Learning Research
Scaling IR-system evaluation using term relevance sets

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Minimal test collections for retrieval evaluation

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Training linear SVMs in linear time

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Pattern Recognition and Machine Learning (Information Science and Statistics)

Pattern Recognition and Machine Learning (Information Science and Statistics)
Automatically evaluating answers to definition questions

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Will pyramids built of nuggets topple over?

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Novelty and diversity in information retrieval evaluation

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
How does clickthrough data reflect retrieval quality?

Proceedings of the 17th ACM conference on Information and knowledge management
The effect of assessor error on IR system evaluation

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Quality management on Amazon Mechanical Turk

Proceedings of the ACM SIGKDD Workshop on Human Computation
Evaluating diversified search results using per-intent graded relevance

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Click the search button and be happy: evaluating direct and immediate information access

Proceedings of the 20th ACM international conference on Information and knowledge management
IR system evaluation using nugget-based test collections

Proceedings of the fifth ACM international conference on Web search and data mining
On aggregating labels from multiple crowd workers to infer relevance of documents

ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval

Summaries, ranked retrieval and sessions: a unified framework for information access evaluation

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Exploring semi-automatic nugget extraction for Japanese one click access evaluation

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Live nuggets extractor: a semi-automated system for text extraction and test collection creation

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Quantified Score

Hi-index	0.00

Visualization

Abstract

The goal of a typical information retrieval system is to satisfy a user's information need---e.g., by providing an answer or information "nugget"---while the actual search space of a typical information retrieval system consists of documents---i.e., collections of nuggets. In this paper, we characterize this relationship between nuggets and documents and discuss applications to system evaluation. In particular, for the problem of test collection construction for IR system evaluation, we demonstrate a highly efficient algorithm for simultaneously obtaining both relevant documents and relevant information. Our technique exploits the mutually reinforcing relationship between relevant documents and relevant information, yielding document-based test collections whose efficiency and efficacy exceed those of typical Cranfield-style test collections, while also generating sets of highly relevant information.