Learning to rank has recently become a popular approach to building ranking models for Web search. We observe that the composition of the training set greatly influences the performance of the learned ranking model. At the same time, the number of Web search queries is practically unbounded and human labeling is expensive, so a subset of queries must be carefully selected for training. In this paper, we develop a greedy algorithm that samples queries by jointly considering query density, difficulty, and diversity. Experimental results on a collected Web search dataset of 2,024 queries show that the proposed method can produce a more informative training set for building an effective ranking model.
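The abstract does not say how the three criteria are measured or combined, so the following Python sketch is illustrative only and is not the authors' method. It assumes each query is represented by a feature vector, that a per-query difficulty score is already available (for example, an existing model's estimated loss on that query), that density is a query's mean cosine similarity to all other queries, that diversity is one minus its highest similarity to any already-selected query, and that the three terms are combined as a weighted sum. The names `greedy_select` and `cosine` and the `w_*` weights are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two query feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def greedy_select(features: List[List[float]], difficulty: List[float], k: int,
                  w_density: float = 1.0, w_difficulty: float = 1.0,
                  w_diversity: float = 1.0) -> List[int]:
    """Greedily pick up to k query indices.

    Hypothetical scoring rule: a weighted sum of density, difficulty and diversity.
    """
    n = len(features)
    sim = [[cosine(features[i], features[j]) for j in range(n)] for i in range(n)]
    # Density (assumed): mean similarity to every other query, i.e. representativeness.
    density = [
        sum(sim[i][j] for j in range(n) if j != i) / max(n - 1, 1) for i in range(n)
    ]
    selected: List[int] = []
    remaining = set(range(n))

    def score(i: int) -> float:
        # Diversity (assumed): 1 minus the highest similarity to any selected query.
        redundancy = max((sim[i][s] for s in selected), default=0.0)
        return (w_density * density[i] + w_difficulty * difficulty[i]
                + w_diversity * (1.0 - redundancy))

    while remaining and len(selected) < k:
        # Iterate in sorted order so ties break toward the lowest index.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

On three toy queries where the first two are identical, `greedy_select([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [0.0, 0.0, 0.0], k=2)` returns `[0, 2]`: once query 0 is chosen, the diversity term penalizes its duplicate. A weighted sum is only one plausible combination; multiplying the terms, or normalizing each to [0, 1] before weighting, would be equally reasonable choices.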