Cross validation framework to choose amongst models and datasets for transfer learning

Authors:
Erheng Zhong; Wei Fan; Qiang Yang; Olivier Verscheure; Jiangtao Ren
Affiliations:
Sun Yat-Sen University, Guangzhou, China; IBM T.J. Watson Research; Department of Computer Science, Hong Kong University of Science and Technology; IBM T.J. Watson Research; Sun Yat-Sen University, Guangzhou, China
Venue:
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part III
Year:
2010

Abstract

One solution to the lack-of-labels problem is transfer learning, in which knowledge acquired from source domains is used to improve learning performance in the target domain. The main challenge is that the source and target domains may have different distributions. An open problem is how to select, through statistically reliable methods, among the available models (including algorithms and parameters) and, importantly, among the abundant source-domain data, thus making transfer learning practical and easy to use in real-world applications. To address this challenge, one needs to take into account the differences in both the marginal and the conditional distributions at the same time, not just one of them. In this paper, we formulate a new criterion to overcome this "double" distribution shift and present a practical approach, "Transfer Cross Validation" (TrCV), to select both models and data in a cross-validation framework optimized for transfer learning. The idea is to use density-ratio weighting to overcome the difference in marginal distributions and to propose a "reverse validation" procedure that quantifies how well a model approximates the true conditional distribution of the target domain. The usefulness of TrCV is demonstrated on different cross-domain tasks, including wine quality evaluation, web-user ranking, and text categorization. The experimental results show that the proposed method outperforms both traditional cross-validation and one state-of-the-art method that considers only marginal distribution shift. The software and datasets are available from the authors.
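
The two ingredients named in the abstract, density-ratio weighting and reverse validation, can be illustrated with a small sketch. This is only one possible reading of the abstract, not the authors' software: it assumes one-dimensional features, estimates the density ratio by fitting a single Gaussian to each domain (a stand-in, since the abstract does not say which estimator is used), and uses toy classifiers. All function names (`density_ratio`, `reverse_validation_error`, `transfer_cv_error`, and so on) are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def density_ratio(xs_source, xs_target):
    """Return w(x) ~ p_target(x) / p_source(x), using one Gaussian fit per domain.

    Assumption: the Gaussian fit is a placeholder for a real ratio estimator.
    """
    mu_s, sd_s = mean(xs_source), pstdev(xs_source)
    mu_t, sd_t = mean(xs_target), pstdev(xs_target)
    return lambda x: gaussian_pdf(x, mu_t, sd_t) / gaussian_pdf(x, mu_s, sd_s)


def fit_threshold(xs, ys):
    """Toy candidate model: cut at the midpoint between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    if not pos or not neg:  # degenerate data: predict the only class seen
        only = 1 if pos else 0
        return lambda x: only
    cut = (mean(pos) + mean(neg)) / 2.0
    sign = 1.0 if mean(pos) >= mean(neg) else -1.0
    return lambda x: 1 if sign * (x - cut) > 0 else 0


def fit_constant(xs, ys):
    """Toy candidate model that ignores its input and always predicts class 0."""
    return lambda x: 0


def reverse_validation_error(fit, train, valid, xs_target, weight):
    xs, ys = zip(*train)
    forward = fit(xs, ys)                      # 1. learn on labeled source data
    pseudo = [forward(x) for x in xs_target]   # 2. pseudo-label the target data
    reverse = fit(xs_target, pseudo)           # 3. learn on pseudo-labeled target
    # 4. density-ratio-weighted error of the reverse model on held-out source data
    total = sum(weight(x) for x, _ in valid)
    wrong = sum(weight(x) for x, y in valid if reverse(x) != y)
    return wrong / total


def transfer_cv_error(fit, source, xs_target, k=5):
    """Average the weighted reverse-validation error over k source folds."""
    weight = density_ratio([x for x, _ in source], xs_target)
    folds = [source[i::k] for i in range(k)]
    errors = []
    for i, valid in enumerate(folds):
        train = [pair for j, fold in enumerate(folds) if j != i for pair in fold]
        errors.append(reverse_validation_error(fit, train, valid, xs_target, weight))
    return mean(errors)


def make_toy_data(seed=0, n=200):
    """Labeled source data around 0, unlabeled target data shifted to around 1."""
    rng = random.Random(seed)
    source = [(x, int(x > 0.5)) for x in (rng.gauss(0.0, 1.0) for _ in range(n))]
    xs_target = [rng.gauss(1.0, 1.0) for _ in range(n)]
    return source, xs_target


source, xs_target = make_toy_data()
for name, fit in [("threshold", fit_threshold), ("constant", fit_constant)]:
    print(name, round(transfer_cv_error(fit, source, xs_target), 3))
```

Under these assumptions, a lower score suggests the candidate model is a better match for the target domain; the same score could be compared across candidate source datasets as well as across models.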