Learning and evaluating classifiers under sample selection bias
ICML '04 Proceedings of the twenty-first international conference on Machine learning
A recent paper categorizes classifier learning algorithms according to their sensitivity to a common type of sample selection bias, in which the chance of an example being selected into the training sample depends on its feature vector x but not (directly) on its class label y. A classifier learner is categorized as "local" if it is insensitive to this type of sample selection bias; otherwise, it is considered "global". In that paper, the true model is not clearly distinguished from the model that the algorithm outputs. In its discussion of Bayesian classifiers, logistic regression, and hard-margin SVMs, the true model (i.e., the model that generates the true class label for every example) is implicitly assumed to be contained in the model space of the learner, and the model's estimated class probabilities are assumed to converge to the true class probabilities as the training set grows. In the discussion of naive Bayes, decision trees, and soft-margin SVMs, however, the model space is assumed not to contain the true model, and on that basis these three algorithms are argued to be "global" learners. We argue that most classifier learners may or may not be affected by sample selection bias: whether a given learner is affected depends on the dataset as well as on the heuristics or inductive bias of the learning algorithm and their appropriateness to that particular dataset.
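The type of bias discussed above, where selection depends on x but not directly on y, has a simple empirical signature that can be checked with a small sketch. The setup below (a single binary feature, with hypothetical probabilities chosen for illustration) shows that the conditional P(y=1 | x) is preserved in the biased sample, while the class marginal P(y=1) shifts, which is why learners that lean on class priors can still be affected:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# One binary feature x (hypothetical synthetic population).
x = rng.integers(0, 2, size=n)

# True conditional: P(y=1 | x=0) = 0.2, P(y=1 | x=1) = 0.8.
p_y1 = np.where(x == 1, 0.8, 0.2)
y = rng.random(n) < p_y1

# Selection into the training sample depends on x only:
# P(s=1 | x=0) = 0.9, P(s=1 | x=1) = 0.1.
s = rng.random(n) < np.where(x == 1, 0.1, 0.9)

# Because s is independent of y given x, P(y=1 | x, s=1) = P(y=1 | x):
# the conditional class probabilities survive the biased selection.
for v in (0, 1):
    full = y[x == v].mean()
    sel = y[(x == v) & s].mean()
    print(f"x={v}: P(y=1|x) full={full:.3f} selected={sel:.3f}")

# ...but the class marginal P(y=1) shifts under the biased sample,
# so algorithms that estimate class priors from the sample can be hurt.
print(f"P(y=1) full={y.mean():.3f} selected={y[s].mean():.3f}")
```

The printed conditionals agree up to sampling noise, while the marginals differ substantially; whether a particular learner is harmed then depends on how much its inductive bias relies on quantities other than P(y | x).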