Learning to rank only using training data from related domain
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Open-domain semantic role labeling by modeling word spans
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
We're not in Kansas anymore: detecting domain changes in streams
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Domain adaptation by constraining inter-domain variability of latent feature representation
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Query weighting for ranking model adaptation
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Piggyback: using search engines for robust cross-domain named entity recognition
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Language models as representations for weakly-supervised NLP tasks
CoNLL '11 Proceedings of the Fifteenth Conference on Computational Natural Language Learning
Information, Divergence and Risk for Binary Experiments
The Journal of Machine Learning Research
Extracting explicit and implicit causal relations from sparse, domain-specific texts
NLDB'11 Proceedings of the 16th international conference on Natural language processing and information systems
On the usefulness of similarity based projection spaces for transfer learning
SIMBAD'11 Proceedings of the First international conference on Similarity-based pattern recognition
Virtual worlds and active learning for human detection
ICMI '11 Proceedings of the 13th international conference on multimodal interfaces
Privileged information for data clustering
Information Sciences: an International Journal
Domain adaptation with ensemble of feature groups
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Two
Batch mode active sampling based on marginal probability distribution matching
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Multisource domain adaptation and its application to early detection of fatigue
ACM Transactions on Knowledge Discovery from Data (TKDD) - Special Issue on the Best of SIGKDD 2011
Linking named entities to any database
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Multi-domain learning: when do domains matter?
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Biased representation learning for domain adaptation
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Domain adaptive dictionary learning
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part IV
On the hardness of domain adaptation and the utility of unlabeled target samples
ALT'12 Proceedings of the 23rd international conference on Algorithmic Learning Theory
Discriminative learning methods for classification perform well when training and test data are drawn from the same distribution. Often, however, we have plentiful labeled training data from a source domain but wish to learn a classifier which performs well on a target domain with a different distribution and little or no labeled training data. In this work we investigate two questions. First, under what conditions can a classifier trained from source data be expected to perform well on target data? Second, given a small amount of labeled target data, how should we combine it during training with the large amount of labeled source data to achieve the lowest target error at test time?

We address the first question by bounding a classifier's target error in terms of its source error and the divergence between the two domains. We give a classifier-induced divergence measure that can be estimated from finite, unlabeled samples from the domains. Under the assumption that there exists some hypothesis that performs well in both domains, we show that this quantity together with the empirical source error characterize the target error of a source-trained classifier.

We answer the second question by bounding the target error of a model which minimizes a convex combination of the empirical source and target errors. Previous theoretical work has considered minimizing just the source error, just the target error, or weighting instances from the two domains equally. We show how to choose the optimal combination of source and target error as a function of the divergence, the sample sizes of both domains, and the complexity of the hypothesis class. The resulting bound generalizes the previously studied cases and is always at least as tight as a bound which considers minimizing only the target error or an equal weighting of source and target errors.
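The two ideas in the abstract can be illustrated concretely. A classifier-induced divergence is commonly approximated by a "proxy A-distance": train a hypothesis from the same class to distinguish unlabeled source samples from unlabeled target samples, and convert its best achievable error into a divergence score. The sketch below (not the paper's exact algorithm; all data, function names, and the 1-D threshold hypothesis class are illustrative assumptions) shows this for one-dimensional samples, plus the convex combination of empirical errors from the second bound:

```python
# Sketch, assuming a 1-D threshold hypothesis class and synthetic samples.
# A proxy for the classifier-induced divergence: 2 * (1 - 2 * err), where
# err is the lowest error any threshold achieves at telling the two
# unlabeled samples apart (0 => domains indistinguishable, 2 => separable).

def domain_classifier_error(source, target):
    """Lowest error of any threshold at separating the two samples."""
    n_total = len(source) + len(target)
    best = 0.5  # a trivial constant classifier errs on half the points
    for cut in sorted(set(source) | set(target)):
        # hypothesis: "x <= cut -> source, x > cut -> target" (and its flip)
        err = (sum(x > cut for x in source)
               + sum(x <= cut for x in target)) / n_total
        best = min(best, err, 1 - err)
    return best

def proxy_a_distance(source, target):
    """Divergence estimated from finite, unlabeled samples only."""
    return 2.0 * (1.0 - 2.0 * domain_classifier_error(source, target))

def combined_empirical_error(err_target, err_source, alpha):
    """Convex combination alpha*err_T + (1-alpha)*err_S; the abstract's
    second bound picks alpha from the divergence, sample sizes, and
    hypothesis-class complexity (formula omitted here)."""
    return alpha * err_target + (1 - alpha) * err_source

# Well-separated domains score high; overlapping domains score low.
far = proxy_a_distance([0.1, 0.2, 0.3], [0.8, 0.9, 1.0])
near = proxy_a_distance([0.1, 0.5, 0.9], [0.2, 0.6, 1.0])
```

In this toy run `far` comes out larger than `near`, matching the intuition that a source-trained classifier transfers better when no hypothesis in the class can tell the domains apart.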