An Improved Categorization of Classifier's Sensitivity on Sample Selection Bias

Authors:
Wei Fan;Ian Davidson;Bianca Zadrozny;Philip S. Yu
Affiliations:
IBM T. J. Watson Research;State University of New York at Albany;IBM T. J. Watson Research;IBM T. J. Watson Research
Venue:
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Year:
2005

Citing (1):

Learning and evaluating classifiers under sample selection bias
ICML '04 Proceedings of the twenty-first international conference on Machine learning

Cited by (10):

Reverse testing: an efficient framework to select amongst classifiers under sample selection bias
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining

Covariate Shift Adaptation by Importance Weighted Cross Validation
The Journal of Machine Learning Research

Context-sensitive queries for image retrieval in digital libraries
Journal of Intelligent Information Systems

Sample Selection Bias Correction Theory
ALT '08 Proceedings of the 19th international conference on Algorithmic Learning Theory

Pedestrian flow prediction in extensive road networks using biased observational data
Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems

Graph-based transfer learning
Proceedings of the 18th ACM conference on Information and knowledge management

When efficient model averaging out-performs boosting and bagging
PKDD'06 Proceedings of the 10th European conference on Principle and Practice of Knowledge Discovery in Databases

Predicting concept changes using a committee of experts
ICONIP'11 Proceedings of the 18th international conference on Neural Information Processing - Volume Part I

Transfer defect learning
Proceedings of the 2013 International Conference on Software Engineering

Inferring the demographics of search users: social data meets search queries
Proceedings of the 22nd international conference on World Wide Web

Abstract

A recent paper categorizes classifier learning algorithms according to their sensitivity to a common type of sample selection bias, in which the chance of an example being selected into the training sample depends on its feature vector x but not (directly) on its class label y. A classifier learner is categorized as "local" if it is insensitive to this type of sample selection bias; otherwise, it is considered "global". That paper does not clearly distinguish the true model from the model that the algorithm outputs. In its discussion of Bayesian classifiers, logistic regression and hard-margin SVMs, the true model (the model that generates the true class label for every example) is implicitly assumed to be contained in the model space of the learner, and the model-estimated class probabilities are assumed to converge asymptotically to the true class probabilities as the training set size increases. In the discussion of naive Bayes, decision trees and soft-margin SVMs, however, the model space is assumed not to contain the true model, and these three algorithms are consequently argued to be "global" learners. We argue that most classifier learners may or may not be affected by sample selection bias; whether they are depends on the dataset, on the heuristics or inductive bias implied by the learning algorithm, and on how appropriate those are to the particular dataset.
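
The role of the model space can be illustrated with a small simulation. The sketch below is not taken from the paper: the population, the selection rule P(s=1 | x, y) = P(s=1 | x), and the two toy "learners" (a per-bin frequency estimate and a constant class-prior estimate) are all assumptions chosen for illustration. It suggests that an estimator flexible enough to approximate the true P(y | x) changes little under feature-dependent selection, while one whose model space excludes the true model can shift noticeably.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: one feature x, with P(y=1 | x) rising linearly in x.
n = 400_000
x = rng.uniform(0.0, 1.0, size=n)
p_y = 0.1 + 0.8 * x                  # assumed true P(y=1 | x)
y = rng.random(n) < p_y

# Feature-dependent selection: P(s=1 | x, y) = P(s=1 | x); y plays no direct role.
p_s = 0.05 + 0.9 * x                 # assumed selection probability
s = rng.random(n) < p_s

# Ten equal-width bins on x, indexed 0..9.
bins = np.linspace(0.0, 1.0, 11)
idx = np.clip(np.digitize(x, bins) - 1, 0, 9)

def binned_rate(labels, bin_idx):
    """Estimate P(y=1 | bin) as the fraction of positives inside each bin."""
    return np.array([labels[bin_idx == b].mean() for b in range(10)])

# Flexible estimator: its model space can closely approximate the true P(y | x).
cond_full = binned_rate(y, idx)
cond_biased = binned_rate(y[s], idx[s])

# Restricted estimator: a single constant (the class prior), which ignores x.
prior_full = y.mean()
prior_biased = y[s].mean()

max_cond_gap = np.abs(cond_full - cond_biased).max()
prior_gap = abs(prior_full - prior_biased)
print(f"max per-bin gap: {max_cond_gap:.3f}, prior gap: {prior_gap:.3f}")
```

In this toy setting the per-bin estimates from the full and the biased samples stay close, while the constant estimate moves substantially, because selection changes the distribution of x but not P(y | x). The outcome is specific to the assumed distributions; with other data or other learners the picture may differ, which is in line with the abstract's closing argument.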