Test collections are the primary drivers of progress in information retrieval. They provide yardsticks for assessing the effectiveness of ranking functions in an automatic, rapid, and repeatable fashion, and they serve as training data for learning-to-rank models. However, manual construction of test collections tends to be slow, labor-intensive, and expensive. This paper examines the feasibility of constructing web search test collections in a completely unsupervised manner, given only a large web corpus as input. Within our proposed framework, anchor text extracted from the web graph is treated as a pseudo query log from which pseudo queries are sampled. For each pseudo query, sets of relevant and non-relevant documents are selected using a variety of web-specific features, including spam and aggregated anchor text weights. The automatically mined queries and judgments form a pseudo test collection that can be used for training ranking functions. Experiments carried out on TREC web track data show that learning-to-rank models trained using pseudo test collections outperform an unsupervised ranking function and are statistically indistinguishable from a model trained using manual judgments, demonstrating the usefulness of our approach in extracting reasonable-quality training data "for free".
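The pipeline described above (anchor text as a pseudo query log, sampled pseudo queries, automatically selected judgments) can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function name `build_pseudo_collection`, its parameters, the frequency-weighted sampling, the `min_share` and `spam_cutoff` thresholds, and the rule of drawing non-relevant documents from pages the anchor text never points to are all assumptions made for illustration; the abstract states only that spam and aggregated anchor text weights are among the features used.

```python
import random
from collections import Counter, defaultdict


def build_pseudo_collection(anchors, all_docs, spam_score, num_queries,
                            min_share=0.5, spam_cutoff=0.5,
                            num_negatives=2, seed=0):
    """Build {pseudo_query: {"relevant": [...], "nonrelevant": [...]}}.

    anchors:    iterable of (anchor_text, target_doc_id) pairs from the web graph
    all_docs:   iterable of every doc id in the corpus
    spam_score: dict doc_id -> spam probability in [0, 1] (hypothetical feature)
    """
    rng = random.Random(seed)

    # Aggregate anchor text: how often each normalized text points at each document.
    targets = defaultdict(Counter)
    for text, doc_id in anchors:
        targets[text.strip().lower()][doc_id] += 1
    if not targets:
        return {}

    # Treat anchor texts as a pseudo query log and sample from it in
    # proportion to frequency (assumption); duplicates collapse into a set.
    texts = sorted(targets)
    freqs = [sum(targets[t].values()) for t in texts]
    sampled = set(rng.choices(texts, weights=freqs, k=num_queries))

    collection = {}
    for query in sorted(sampled):
        counts = targets[query]
        total = sum(counts.values())
        # Pseudo-relevant: documents that receive a large share of this anchor
        # text and do not look like spam (both thresholds are assumptions).
        relevant = sorted(
            d for d, c in counts.items()
            if c / total >= min_share and spam_score.get(d, 0.0) < spam_cutoff
        )
        if not relevant:
            continue  # no trustworthy positive, so skip this pseudo query
        # Pseudo-non-relevant: random documents this anchor text never points to.
        candidates = sorted(d for d in all_docs if d not in counts)
        negatives = rng.sample(candidates, min(num_negatives, len(candidates)))
        collection[query] = {"relevant": relevant,
                             "nonrelevant": sorted(negatives)}
    return collection
```

The resulting query and judgment pairs could then be fed to any learning-to-rank trainer in place of manual judgments. Drawing negatives at random is the simplest choice here; a more careful selection of non-relevant documents would presumably yield more informative training examples.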