Generating pseudo test collections for learning to rank scientific articles

Authors:
Richard Berendsen;Manos Tsagkias;Maarten de Rijke;Edgar Meij
Affiliations:
ISLA, University of Amsterdam, Amsterdam, XH, The Netherlands;ISLA, University of Amsterdam, Amsterdam, XH, The Netherlands;ISLA, University of Amsterdam, Amsterdam, XH, The Netherlands;ISLA, University of Amsterdam, Amsterdam, XH, The Netherlands
Venue:
CLEF'12 Proceedings of the Third international conference on Information Access Evaluation: multilinguality, multimodality, and visual analytics
Year:
2012

Abstract

Pseudo test collections are generated automatically to provide training material for learning-to-rank methods. We propose a method for generating pseudo test collections in the domain of digital libraries, where data is relatively sparse but comes with rich annotations. Our intuition is that documents are annotated to make them easier to find for certain information needs. We therefore use these annotations and their associated documents as a source of pairs of queries and relevant documents. We investigate how learning-to-rank performance varies with the method used to sample annotations, and we show how our pseudo test collection ranks systems compared with editorial topics with editorial judgments. Our results demonstrate that it is possible to train a learning-to-rank algorithm on generated pseudo judgments. In some cases, performance is on par with learning on manually obtained ground truth.
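
The core idea in the abstract, turning annotations into queries and the documents that carry them into relevant documents, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_pseudo_collection`, the `min_docs` threshold, the toy documents, and the uniform random sampling of annotations are all assumptions made for illustration, since the abstract does not specify the sampling methods that were compared.

```python
import random
from collections import defaultdict


def build_pseudo_collection(docs, num_queries, min_docs=2, seed=0):
    """Build pseudo (query, relevant documents) pairs from annotations.

    docs: mapping of document id -> iterable of annotation strings.
    Returns a mapping of pseudo query -> sorted list of pseudo-relevant ids.

    Hypothetical helper: uniform sampling and the min_docs filter are
    illustrative assumptions, not the sampling methods from the paper.
    """
    # Invert the annotations: every annotation points to its documents.
    by_annotation = defaultdict(set)
    for doc_id, annotations in docs.items():
        for annotation in annotations:
            by_annotation[annotation].add(doc_id)

    # Keep only annotations shared by enough documents to be useful queries.
    candidates = sorted(
        a for a, ids in by_annotation.items() if len(ids) >= min_docs
    )

    # Sample annotations uniformly at random (assumed sampling strategy).
    rng = random.Random(seed)
    sampled = rng.sample(candidates, min(num_queries, len(candidates)))

    # Each sampled annotation becomes a query; its documents are "relevant".
    return {a: sorted(by_annotation[a]) for a in sampled}


# Toy, made-up documents with made-up annotations.
docs = {
    "d1": ["information retrieval", "evaluation"],
    "d2": ["information retrieval", "machine learning"],
    "d3": ["machine learning"],
    "d4": ["evaluation", "information retrieval"],
}

pseudo = build_pseudo_collection(docs, num_queries=2)
for query, relevant in pseudo.items():
    print(query, relevant)
```

The resulting pairs could then serve as pseudo judgments for training a learning-to-rank model. Swapping the uniform sampling step for a different strategy is the kind of variation the abstract says was investigated.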