Rule-based active sampling for learning to rank

  • Authors:
  • Rodrigo Silva; Marcos A. Gonçalves; Adriano Veloso

  • Affiliations:
  • Department of Computer Science, Federal University of Minas Gerais (all authors)

  • Venue:
  • ECML PKDD '11: Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases, Volume Part III
  • Year:
  • 2011

Abstract

Learning to rank (L2R) algorithms rely on a labeled training set to generate a ranking model that can later be used to rank new query results. Producing these labeled training sets is usually very costly, as it requires human annotators to assess the relevance of the elements in the training set or to order them. Recently, active learning alternatives have been proposed to reduce the labeling effort by selectively sampling an unlabeled set. In this paper we propose a novel rule-based active sampling method for Learning to Rank. Our method actively samples an unlabeled set, selecting new documents to be labeled based on how many relevance inference rules they generate given the previously selected and labeled examples. The fewer rules a document generates, the more dissimilar it is from the current labeled set and thus the more "informative" it is. Unlike previous solutions, our algorithm does not rely on an initial training seed and can be applied directly to an unlabeled dataset. Also in contrast to previous work, our method has a clear stopping criterion and does not require empirically discovering the best configuration through repeated runs on validation or test sets. These characteristics make our algorithm highly practical. We demonstrate the effectiveness of our active sampling method on several benchmarking datasets, showing that a significant reduction in training set size is possible. Our method selects as little as 1.1% and at most 2.2% of the original training sets, while providing competitive results compared to state-of-the-art supervised L2R algorithms that use the complete training sets.
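
To make the selection criterion concrete, below is a minimal Python sketch of the sampling loop described above. The abstract does not specify how relevance inference rules are generated, so the count_rules function here is a stand-in that treats a rule as a discretized (feature, value) pair shared with an already-labeled document; likewise, the paper's stopping criterion is not detailed in the abstract, so a fixed labeling budget takes its place. All identifiers are illustrative, not the authors' implementation.

    # Minimal sketch of the rule-based selection criterion (illustrative only).
    # Assumption: a "rule" is approximated as a discretized (feature, value)
    # pair shared with an already-labeled document; the paper's actual
    # rule-generation procedure is not described in the abstract.
    from typing import Callable

    Doc = tuple[int, ...]  # a document as a vector of discretized feature values

    def count_rules(doc: Doc, labeled_docs: list[Doc]) -> int:
        """Count distinct (feature index, value) pairs the candidate shares
        with any labeled document (proxy for generated inference rules)."""
        shared: set[tuple[int, int]] = set()
        for ex in labeled_docs:
            for i, (a, b) in enumerate(zip(doc, ex)):
                if a == b:
                    shared.add((i, a))
        return len(shared)

    def active_sample(unlabeled: list[Doc],
                      label_fn: Callable[[Doc], int],
                      budget: int) -> list[tuple[Doc, int]]:
        """Greedily label the document generating the fewest rules given the
        current labeled set. No initial seed is needed: the first pick
        trivially generates zero rules. A fixed budget stands in for the
        paper's stopping criterion, which the abstract does not detail."""
        pool = list(unlabeled)
        labeled: list[tuple[Doc, int]] = []
        while pool and len(labeled) < budget:
            docs_only = [d for d, _ in labeled]
            # Fewest generated rules == most dissimilar, most "informative".
            pick = min(pool, key=lambda d: count_rules(d, docs_only))
            pool.remove(pick)
            labeled.append((pick, label_fn(pick)))  # human annotation step
        return labeled

For instance, calling active_sample(pool, annotate, budget=20) on a pool of discretized feature vectors would return twenty labeled documents chosen greedily to be maximally dissimilar from the growing labeled set under this proxy rule count.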