Diagnosing extrapolation: tree-based density estimation

Authors:
Giles Hooker
Affiliations:
Stanford University, Stanford, CA
Venue:
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2004

Abstract

There has historically been very little concern with extrapolation in Machine Learning, yet extrapolation can be critical to diagnose. Predictor functions are almost always learned on a set of highly correlated data comprising a very small segment of predictor space. Moreover, flexible predictors, by their very nature, are not controlled at points of extrapolation. This becomes a problem for diagnostic tools that require evaluation on a product distribution. It is also an issue when we are trying to optimize a response over some variable in the input space. Finally, it can be a problem in non-static systems in which the underlying predictor distribution gradually drifts with time or when typographical errors misrecord the values of some predictors.

We present a diagnosis for extrapolation as a statistical test for a point originating from the data distribution as opposed to a null hypothesis uniform distribution. This allows us to employ general classification methods for estimating such a test statistic. Further, we observe that CART can be modified to accept an exact distribution as an argument, providing a better classification tool which becomes our extrapolation-detection procedure. We explore some of the advantages of this approach and present examples of its practical application.
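The idea of comparing the data against an exact uniform reference can be illustrated with a small sketch. It is not the paper's modified CART. It assumes a much simpler tree that halves each cell at its midpoint, cycling through the coordinates, in place of CART's impurity-driven split search. What it keeps is the exact treatment of the uniform distribution: the uniform mass of a cell is its share of the bounding-box volume, so no uniform sample is drawn. The function names, the depth limit, and the flagging threshold of 1.0 (the point where the uniform hypothesis becomes more likely than the data under equal priors) are all assumptions made for illustration.

```python
import random

def box_volume(lo, hi):
    """Volume of the axis-aligned box with corners lo and hi."""
    vol = 1.0
    for a, b in zip(lo, hi):
        vol *= b - a
    return vol

def build_tree(points, lo, hi, n_total, root_volume, depth=0, max_depth=6):
    """Recursively halve the box, cycling through the coordinates.

    Each node stores the ratio of empirical data mass to uniform mass.
    The uniform mass is exact: it is the cell's share of the root volume.
    Splitting stops at max_depth or when a cell holds no data points.
    """
    p_data = len(points) / n_total
    p_unif = box_volume(lo, hi) / root_volume
    node = {"lo": lo, "hi": hi, "ratio": p_data / p_unif}
    if depth >= max_depth or not points:
        return node
    dim = depth % len(lo)
    cut = 0.5 * (lo[dim] + hi[dim])  # midpoint split (assumption, not CART)
    left = [p for p in points if p[dim] <= cut]
    right = [p for p in points if p[dim] > cut]
    left_hi = hi[:dim] + (cut,) + hi[dim + 1:]
    right_lo = lo[:dim] + (cut,) + lo[dim + 1:]
    node["dim"], node["cut"] = dim, cut
    node["left"] = build_tree(left, lo, left_hi, n_total, root_volume,
                              depth + 1, max_depth)
    node["right"] = build_tree(right, right_lo, hi, n_total, root_volume,
                               depth + 1, max_depth)
    return node

def density_ratio(tree, x):
    """Data-to-uniform mass ratio of the leaf cell containing x."""
    node = tree
    while "cut" in node:
        node = node["left"] if x[node["dim"]] <= node["cut"] else node["right"]
    return node["ratio"]

def is_extrapolation(tree, x, threshold=1.0):
    """Flag x if it lies outside the root box or in a cell where the
    uniform reference outweighs the data (ratio below threshold)."""
    inside = all(a <= v <= b for v, a, b in zip(x, tree["lo"], tree["hi"]))
    return (not inside) or density_ratio(tree, x) < threshold

# Synthetic, highly correlated data: x2 is x1 plus small noise, so the
# points occupy a thin diagonal band of the unit square.
random.seed(0)
data = []
for _ in range(400):
    x1 = random.random()
    x2 = min(1.0, max(0.0, x1 + random.gauss(0.0, 0.03)))
    data.append((x1, x2))

lo, hi = (0.0, 0.0), (1.0, 1.0)
tree = build_tree(data, lo, hi, len(data), box_volume(lo, hi))

print(is_extrapolation(tree, (0.55, 0.55)))  # point on the diagonal band
print(is_extrapolation(tree, (0.10, 0.90)))  # each coordinate typical, jointly far from the data
```

The second query point is the interesting case. Both of its coordinates fall within the observed marginal ranges, so a per-variable range check would accept it, yet it sits in a cell with no data and a positive uniform mass. An impurity-based split search, as in CART, would adapt the cell boundaries to the data instead of using fixed midpoints.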