Data partition methodology for validation of predictive models

  • Authors:
  • Rebecca E. Morrison, Corey M. Bryant, Gabriel Terejanu, Serge Prudhomme, Kenji Miki

  • Venue:
  • Computers & Mathematics with Applications
  • Year:
  • 2013

Abstract

In many cases, model validation requires that legacy data be partitioned into calibration and validation sets, but how to do so is a nontrivial and open area of research. We present a systematic procedure, adapted from cross-validation, to partition the data in the context of predictive modeling. After considering all possible partitions, we apply post-processing steps to find the optimal partition of the data subject to given constraints. We are concerned here with mathematical models of physical systems whose predictions of a given unobservable quantity of interest form the basis for critical decisions. The proposed approach therefore addresses two critical issues: (1) that the model be evaluated with respect to its ability to reproduce the data, and (2) that the model be highly challenged by the validation set with respect to predictions of the quantity of interest. The framework also relies on the interaction among the experimentalist and/or modeler, who understands the physical system and the limitations of the model; the decision-maker, who understands and can quantify the cost of model failure; and the computational scientist, who strives to determine whether the model satisfies both the modeler's and the decision-maker's requirements. The framework is general and may be applied to a wide range of problems. It is illustrated here through an example using generated experiments of a nonlinear one-degree-of-freedom oscillator.
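
To make the partitioning idea concrete, the sketch below enumerates every calibration/validation split of a small synthetic data set and ranks the splits by two placeholder criteria: a misfit of the calibrated model on the held-out points (ability to reproduce the data) and a crude proxy for how strongly the validation set challenges an extrapolative prediction. This is not the authors' implementation; the polynomial surrogate, the tolerance value, and the function names (`all_partitions`, `reproduction_error`, `qoi_challenge`) are hypothetical stand-ins chosen only to illustrate the enumerate-then-post-process structure described in the abstract.

```python
"""Illustrative sketch (assumptions noted above), not the paper's method:
enumerate all calibration/validation splits of a small legacy data set,
then post-process them against two placeholder criteria."""
from itertools import combinations
import numpy as np


def all_partitions(n_points, n_validation):
    """Yield (calibration, validation) index tuples for every split of the
    requested size; exhaustive enumeration is feasible for small data sets."""
    indices = set(range(n_points))
    for val in combinations(sorted(indices), n_validation):
        yield tuple(sorted(indices - set(val))), val


def reproduction_error(model, data, cal_idx, val_idx):
    """Placeholder criterion (1): misfit of a model calibrated on the
    calibration subset when evaluated on the validation subset.
    A least-squares polynomial fit stands in for the actual calibration."""
    x, y = data
    coeffs = np.polyfit(x[list(cal_idx)], y[list(cal_idx)], deg=model["degree"])
    y_pred = np.polyval(coeffs, x[list(val_idx)])
    return float(np.mean((y_pred - y[list(val_idx)]) ** 2))


def qoi_challenge(data, val_idx):
    """Placeholder criterion (2): proxy for how much the validation set
    challenges the quantity-of-interest prediction; here, distance of the
    validation inputs from the bulk of the data (extrapolation is harder)."""
    x, _ = data
    return float(np.mean(np.abs(x[list(val_idx)] - np.mean(x))))


if __name__ == "__main__":
    # Synthetic "legacy" data in place of the oscillator experiments.
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 8)
    y = np.sin(2.0 * np.pi * x) + 0.05 * rng.standard_normal(x.size)
    data, model = (x, y), {"degree": 3}

    scored = []
    for cal, val in all_partitions(x.size, n_validation=3):
        scored.append((reproduction_error(model, data, cal, val),
                       qoi_challenge(data, val), cal, val))

    # Post-processing: among splits the model reproduces within a (hypothetical)
    # tolerance, pick the one whose validation set is most challenging.
    tolerance = 0.05
    feasible = [s for s in scored if s[0] <= tolerance] or scored
    best = max(feasible, key=lambda s: s[1])
    print("calibration indices:", best[2])
    print("validation indices: ", best[3])
```

In the paper's setting, the two scoring functions would instead reflect the calibrated physical model's agreement with the observed data and the sensitivity of the unobservable quantity of interest, and the constraints would encode the modeler's and decision-maker's requirements; only the enumerate-and-select workflow is illustrated here.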