Sample Selection Bias Correction Theory

Authors:
Corinna Cortes; Mehryar Mohri; Michael Riley; Afshin Rostamizadeh
Affiliations:
Google Research, New York, NY 10011; Google Research, New York, NY 10011 and Courant Institute of Mathematical Sciences, New York, NY 10012; Google Research, New York, NY 10011; Courant Institute of Mathematical Sciences, New York, NY 10012
Venue:
ALT '08 Proceedings of the 19th international conference on Algorithmic Learning Theory
Year:
2008

Abstract

This paper presents a theoretical analysis of sample selection bias correction. The sample bias correction technique commonly used in machine learning consists of reweighting the cost of an error on each training point of a biased sample to more closely reflect the unbiased distribution. This relies on weights derived by various estimation techniques based on finite samples. We analyze the effect of an error in that estimation on the accuracy of the hypothesis returned by the learning algorithm for two estimation techniques: a cluster-based estimation technique and kernel mean matching. We also report the results of sample bias correction experiments with several data sets using these techniques. Our analysis is based on the novel concept of distributional stability, which generalizes the existing concept of point-based stability. Many of our results and proof techniques can be used to analyze other importance weighting techniques and their effect on accuracy when using a distributionally stable algorithm.
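The reweighting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cluster_weights` and `weighted_error`, the toy data, and the normalization by the sum of weights are all assumptions made for illustration. The sketch assumes a cluster-based estimate in which each cluster's selection frequency is the ratio of its count in the biased sample to its count in a larger unbiased sample, and each training point is weighted by the inverse of that frequency.

```python
from collections import Counter


def cluster_weights(unbiased_clusters, biased_clusters):
    """Estimate one importance weight per cluster (hypothetical helper).

    unbiased_clusters: cluster id of each point in an unbiased sample.
    biased_clusters: cluster id of each point in the biased training
        sample, assumed to be a subset of the unbiased sample.
    """
    n = Counter(unbiased_clusters)  # counts in the unbiased sample
    m = Counter(biased_clusters)    # counts in the biased sample
    weights = {}
    for c, m_c in m.items():
        if n[c] < m_c:
            raise ValueError(f"cluster {c!r}: biased count exceeds unbiased count")
        # Estimated selection frequency is m_c / n_c; the weight is its inverse.
        weights[c] = n[c] / m_c
    return weights


def weighted_error(losses, clusters, weights):
    """Importance-weighted average loss over the biased sample."""
    total = sum(weights[c] for c in clusters)
    return sum(weights[c] * loss for loss, c in zip(losses, clusters)) / total


# Toy data: cluster "b" is under-represented in the biased sample.
unbiased = ["a"] * 4 + ["b"] * 4
biased = ["a"] * 4 + ["b"] * 1
losses = [0.0, 0.0, 0.0, 0.0, 1.0]  # per-point losses on the biased sample

w = cluster_weights(unbiased, biased)
plain = sum(losses) / len(losses)
corrected = weighted_error(losses, biased, w)
print(w, plain, corrected)
```

In this toy case the unweighted error is 0.2, while the reweighted error is 0.5, which is what one would expect on the unbiased sample if every point in cluster "b" incurred the same loss. With finite samples the estimated frequencies are themselves noisy, which is the estimation error whose effect the paper analyzes.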