Transferring knowledge with source selection to learn IR functions on unlabeled collections

Authors:
Parantapa Goswami;Massih R. Amini;Eric Gaussier
Affiliations:
Université Joseph Fourier, Grenoble, France;Université Joseph Fourier, Grenoble, France;Université Joseph Fourier, Grenoble, France
Venue:
Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
Year:
2013

Abstract

We investigate the problem of learning an IR function on a collection without relevance judgments (the target collection) by transferring knowledge from a selected source collection that has relevance judgments. To do so, we first construct, for each query in the target collection, relative relevance judgment pairs using information from the source collection closest to the query (selection and transfer steps), and then learn an IR function from the resulting pairs in the target collection (self-learning step). For the transfer step, the relevance information in the source collection is summarized as a grid that provides, for each pair of term frequency and document frequency values of a word in a document, an empirical estimate of the relevance of the document. The self-learning step iteratively assigns pairwise preferences to documents in the target collection using the scores of the previously learned function. We show the effectiveness of our approach through extensive experiments on CLEF and several TREC collections, used either as target or source datasets. Our experiments show the importance of selecting the source collection before transferring information to the target collection, and demonstrate that the proposed approach yields results consistently and significantly above those of state-of-the-art IR functions.
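
The transfer step described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not specify how grid cells are aggregated into a document score or how pairs are filtered, so the summation over query terms, the `margin` threshold, and all function and variable names below are assumptions made for illustration. The sketch builds a (term frequency, document frequency) grid of empirical relevance rates from a labeled source collection, scores unlabeled target documents with it, and derives pairwise preferences. The iterative self-learning loop, which would re-derive the pairs from each newly learned function, is omitted.

```python
from collections import defaultdict
from itertools import combinations


def build_relevance_grid(observations):
    """Estimate relevance per (tf, df) cell from a labeled source collection.

    observations: iterable of (tf, df, is_relevant) triples, one per
    occurrence of a query word in a judged source document (assumed format).
    Returns a dict mapping (tf, df) to the fraction of relevant observations.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [relevant, total]
    for tf, df, is_relevant in observations:
        cell = counts[(tf, df)]
        cell[0] += int(is_relevant)
        cell[1] += 1
    return {key: rel / total for key, (rel, total) in counts.items()}


def score_document(grid, term_stats, default=0.0):
    """Sum grid estimates over the (tf, df) values of the query words.

    Summation is an assumed aggregation; unseen cells fall back to `default`.
    """
    return sum(grid.get(cell, default) for cell in term_stats)


def preference_pairs(doc_scores, margin=0.0):
    """Return (preferred, other) pairs whose score gap exceeds `margin`."""
    pairs = []
    for a, b in combinations(sorted(doc_scores), 2):
        diff = doc_scores[a] - doc_scores[b]
        if diff > margin:
            pairs.append((a, b))
        elif -diff > margin:
            pairs.append((b, a))
    return pairs


# Toy source observations: (tf, df, is_relevant).
observations = [
    (1, 10, False), (1, 10, False),
    (3, 10, True), (3, 10, False),
    (5, 2, True),
]
grid = build_relevance_grid(observations)

# Toy target documents for one query: (tf, df) of each query word present.
target_docs = {
    "d1": [(3, 10), (5, 2)],
    "d2": [(1, 10)],
    "d3": [(3, 10)],
}
scores = {doc: score_document(grid, stats) for doc, stats in target_docs.items()}
pairs = preference_pairs(scores)
print(pairs)
```

The resulting pairs could then be passed to any pairwise learning-to-rank method. Which learner to use, and how the pairs are refreshed across self-learning iterations, is left open here because the abstract does not state it.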