Learning and evaluating classifiers under sample selection bias

Authors:
Bianca Zadrozny
Affiliations:
IBM T. J. Watson Research Center, Yorktown Heights, NY
Venue:
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Year:
2004

Abstract

Classifier learning methods commonly assume that the training data consist of randomly drawn examples from the same distribution as the test examples about which the learned model is expected to make predictions. In many practical situations, however, this assumption is violated, in a problem known in econometrics as sample selection bias. In this paper, we formalize the sample selection bias problem in machine learning terms and study analytically and experimentally how a number of well-known classifier learning methods are affected by it. We also present a bias correction method that is particularly useful for classifier evaluation under sample selection bias.
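
To illustrate the kind of evaluation correction the abstract refers to, the following is a minimal sketch that uses inverse selection-probability weighting, a standard importance-weighting technique. It is not taken from the paper and may differ from the paper's method in detail. The function names (`weighted_error`, `plain_error`), the toy data, and the assumption that the selection probability P(s=1 | x) is known for every sampled example are all hypothetical choices made for this example.

```python
def plain_error(y_true, y_pred):
    """Unweighted error rate on the (possibly biased) sample."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_error(y_true, y_pred, select_prob):
    """Importance-weighted error rate over a biased sample.

    select_prob[i] is the probability that example i was selected into
    the sample, P(s=1 | x_i). It is ASSUMED to be known and positive;
    in practice it would usually have to be estimated.
    """
    if not (len(y_true) == len(y_pred) == len(select_prob)):
        raise ValueError("inputs must have equal length")
    # Examples that were unlikely to be selected stand in for many
    # unselected ones, so they receive a proportionally larger weight.
    weights = [1.0 / p for p in select_prob]
    mistakes = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    # Normalize by the total weight so the result is a rate in [0, 1].
    return mistakes / sum(weights)


# Hypothetical toy sample: the first four examples come from a region
# that is over-represented (selected with probability 0.8), the last
# two from a region that is under-represented (probability 0.2).
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0]  # the only mistake is in the rare region
select_prob = [0.8, 0.8, 0.8, 0.8, 0.2, 0.2]

biased_estimate = plain_error(y_true, y_pred)
corrected_estimate = weighted_error(y_true, y_pred, select_prob)
```

In this toy case the unweighted estimate is 1/6, while the weighted estimate is 1/3, because the single mistake falls in the region that the biased sample under-represents. The correction is only as good as the selection probabilities supplied to it.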