Supervised learning algorithms usually require a large volume of high-quality labeled training data, and obtaining such labeled examples in every application domain is often expensive. Domain adaptation aims to help in such cases by utilizing data available in related domains. However, transferring knowledge from one domain to another is often nontrivial because the domains have different data distributions, and these distribution differences are usually hard to measure and formulate. We therefore introduce the new concept of a label-relation function, which transfers knowledge among domains without explicitly formulating their distribution differences. Based on this concept, we propose a novel learning framework, Domain Transfer Risk Minimization (DTRM), which simultaneously minimizes the empirical risk for the target domain and the regularized empirical risk for the source domain. Under this framework, we further derive a generic algorithm, Domain Adaptation by Label Relation (DALR), that is applicable to a variety of applications in both classification and regression settings. DALR iteratively updates the target hypothesis function and the outputs for the source domain until it converges. We provide an in-depth theoretical analysis of DTRM and establish fundamental error bounds. We also evaluate DALR experimentally on the task of ranking search results using real-world data. Our results show that the proposed algorithm utilizes source-domain data effectively and robustly under various conditions: different source-domain data sizes, different source-domain noise levels, and different target-domain difficulty levels.
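The abstract describes an alternating scheme: fit a hypothesis that trades off target empirical risk against regularized source risk, then revise the source-domain outputs, and repeat until convergence. The following is only a minimal illustrative sketch of that general idea, not the paper's actual DALR algorithm: it assumes a linear hypothesis with ridge regularization, and it uses a simple convex blend of the original source labels and the current predictions as a hypothetical stand-in for the label-relation function.

```python
import numpy as np

def dalr_sketch(X_t, y_t, X_s, y_s, lam=1.0, gamma=0.5, n_iter=20):
    """Hypothetical alternating-minimization sketch (not the paper's exact DALR).

    X_t, y_t : target-domain features and labels
    X_s, y_s : source-domain features and labels
    lam      : ridge regularization strength
    gamma    : weight on the original source labels when revising source outputs
    """
    z_s = y_s.astype(float).copy()  # current source-domain outputs
    w = np.zeros(X_t.shape[1])
    for _ in range(n_iter):
        # Step 1: fit the hypothesis on pooled target data and (revised) source data
        X = np.vstack([X_t, X_s])
        y = np.concatenate([y_t, z_s])
        w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
        # Step 2: revise source outputs, pulling them toward the hypothesis's
        # predictions (a crude placeholder for the label-relation idea)
        z_s = gamma * y_s + (1.0 - gamma) * (X_s @ w)
    return w, z_s
```

The sketch works for regression; a classification variant would swap the squared-loss fit in Step 1 for, e.g., regularized logistic regression, as suggested by the framework's generality.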