Document selection methodologies for efficient and effective learning-to-rank

Authors:
Javed A. Aslam;Evangelos Kanoulas;Virgil Pavlu;Stefan Savev;Emine Yilmaz
Affiliations:
Northeastern University, Boston, MA, USA;Northeastern University, Boston, MA, USA;Northeastern University, Boston, MA, USA;Northeastern University, Boston, MA, USA;Microsoft Research, Cambridge, United Kingdom
Venue:
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Year:
2009

Abstract

Learning-to-rank has attracted great attention in the IR community. Much thought and research have gone into query-document feature extraction and the development of sophisticated learning-to-rank algorithms. However, relatively little research has been conducted on selecting documents for learning-to-rank data sets, or on the effect of these choices on the efficiency and effectiveness of learning-to-rank algorithms. In this paper, we employ a number of document selection methodologies widely used in the context of evaluation: depth-k pooling, sampling (infAP, statAP), active learning (MTC), and on-line heuristics (hedge). Certain methodologies, e.g. sampling and active learning, have been shown to lead to efficient and effective evaluation. We investigate whether they can also enable efficient and effective learning-to-rank, and we compare them with the document selection methodology used to create the LETOR datasets. Further, the methodologies differ in nature, and thus they construct training data sets with different properties, such as the proportion of relevant documents in the data or the similarity among them. We study how such properties affect the efficiency, effectiveness, and robustness of learning-to-rank collections.
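As an illustration of the simplest of the selection methodologies named above, depth-k pooling can be sketched as taking, for one query, the union of the top k documents from each contributing system's ranked list. The sketch below is a minimal example and not the authors' implementation; the function names, the run and judgment dictionaries, and the toy document identifiers are all hypothetical. It also computes the proportion of relevant documents in the selected set, one of the training-set properties the abstract mentions.

```python
from typing import Dict, List, Set


def depth_k_pool(runs: Dict[str, List[str]], k: int) -> Set[str]:
    """Return the union of the top-k documents of every ranked list (one query)."""
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool


def relevant_fraction(pool: Set[str], judgments: Dict[str, int]) -> float:
    """Fraction of pooled documents judged relevant (grade > 0).

    Documents missing from `judgments` are treated as non-relevant,
    which is an assumption made only for this toy example.
    """
    if not pool:
        return 0.0
    relevant = sum(1 for doc in pool if judgments.get(doc, 0) > 0)
    return relevant / len(pool)


# Hypothetical ranked lists from three retrieval systems for a single query.
runs = {
    "sysA": ["d1", "d2", "d3", "d4"],
    "sysB": ["d2", "d5", "d1", "d6"],
    "sysC": ["d7", "d1", "d2", "d8"],
}
# Hypothetical relevance judgments (1 = relevant, 0 = non-relevant).
judgments = {"d1": 1, "d2": 0, "d5": 1, "d7": 0}

pool = depth_k_pool(runs, k=2)
print(sorted(pool))
print(relevant_fraction(pool, judgments))
```

The pooled documents, paired with their judgments and query-document features, would then form the training set for that query; a larger k yields more training documents, typically at a lower proportion of relevant ones.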