Finding robust models using a stratified design

Authors:
Rohan A. Baxter
Affiliations:
Analytics Project, Office of the Chief Knowledge Officer, Australian Taxation Office, ACT
Venue:
AI'06 Proceedings of the 19th Australian joint conference on Artificial Intelligence: advances in Artificial Intelligence
Year:
2006

Abstract

Predictive performance in model selection is often estimated using out-of-sample validation and test datasets. This approach assumes that the validation and test datasets are drawn from the same population as the training dataset. That assumption may not hold in the common setting where the model is used to score future data. This paper proposes a sample design that can lead to better model performance and robust estimates of model generalization error. The design is demonstrated on a collection scoring application.
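
The abstract does not spell out the sample design, so the following is only an illustrative sketch of the underlying concern, not the paper's method. It assumes that each record carries a time period, treats each period as a stratum, holds back a fraction of every earlier period as an "in-time" test set, and withholds the latest period entirely as an "out-of-time" test set that stands in for future data. The function name `stratified_time_split`, the `period` field, and the toy records are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_time_split(records, period_key, holdout_periods,
                          test_fraction=0.25, seed=0):
    """Split records into train, in-time test, and out-of-time test sets.

    Each distinct value of `period_key` is treated as a stratum. Strata
    listed in `holdout_periods` are withheld completely (out-of-time test).
    From every other stratum, `test_fraction` of the rows is sampled at
    random for the in-time test set and the rest goes to training.
    """
    rng = random.Random(seed)

    # Group records by period so that sampling happens within each stratum.
    strata = defaultdict(list)
    for rec in records:
        strata[rec[period_key]].append(rec)

    train, in_time_test, out_of_time_test = [], [], []
    for period in sorted(strata):
        rows = list(strata[period])
        if period in holdout_periods:
            out_of_time_test.extend(rows)
            continue
        rng.shuffle(rows)
        n_test = round(len(rows) * test_fraction)
        in_time_test.extend(rows[:n_test])
        train.extend(rows[n_test:])
    return train, in_time_test, out_of_time_test


# Hypothetical toy data: three periods with eight records each.
records = [{"id": i, "period": 2003 + i // 8} for i in range(24)]
train, in_time, out_of_time = stratified_time_split(
    records, "period", holdout_periods={2005}
)
```

Comparing a model's error on `in_time` with its error on `out_of_time` gives a rough indication of how far the same-population assumption can be trusted when the model is later used to score future data.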