Data partition methodology for validation of predictive models

  • Authors:
  • Rebecca E. Morrison, Corey M. Bryant, Gabriel Terejanu, Serge Prudhomme, Kenji Miki

  • Venue:
  • Computers & Mathematics with Applications
  • Year:
  • 2013

Abstract

In many cases, model validation requires that legacy data be partitioned into calibration and validation sets, but how to do so is a nontrivial and open area of research. We present a systematic procedure, adapted from cross-validation, to partition the data in the context of predictive modeling. After considering all possible partitions, we apply post-processing steps to find the optimal partition of the data subject to given constraints. We are concerned here with mathematical models of physical systems whose predictions of a given unobservable quantity of interest form the basis for critical decisions. The proposed approach therefore addresses two critical issues: (1) that the model be evaluated with respect to its ability to reproduce the data, and (2) that the model be highly challenged by the validation set with respect to predictions of the quantity of interest. The framework also relies on the interaction among the experimentalist and/or modeler, who understands the physical system and the limitations of the model; the decision-maker, who understands and can quantify the cost of model failure; and the computational scientist, who strives to determine whether the model satisfies both the modeler's and the decision-maker's requirements. The framework is general and may be applied to a wide range of problems. It is illustrated here through an example using generated experiments of a nonlinear one-degree-of-freedom oscillator.
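
To make the partitioning idea concrete, the sketch below enumerates every calibration/validation split of a small synthetic data set and ranks the splits by two placeholder criteria: a misfit of the calibrated model on the held-out points (ability to reproduce the data) and a crude proxy for how strongly the validation set challenges an extrapolative prediction. This is not the authors' implementation; the polynomial surrogate, the tolerance value, and the function names (`all_partitions`, `reproduction_error`, `qoi_challenge`) are hypothetical stand-ins chosen only to illustrate the enumerate-then-post-process structure described in the abstract.

```python
"""Illustrative sketch (assumptions noted above), not the paper's method:
enumerate all calibration/validation splits of a small legacy data set,
then post-process them against two placeholder criteria."""
from itertools import combinations
import numpy as np


def all_partitions(n_points, n_validation):
    """Yield (calibration, validation) index tuples for every split of the
    requested size; exhaustive enumeration is feasible for small data sets."""
    indices = set(range(n_points))
    for val in combinations(sorted(indices), n_validation):
        yield tuple(sorted(indices - set(val))), val


def reproduction_error(model, data, cal_idx, val_idx):
    """Placeholder criterion (1): misfit of a model calibrated on the
    calibration subset when evaluated on the validation subset.
    A least-squares polynomial fit stands in for the actual calibration."""
    x, y = data
    coeffs = np.polyfit(x[list(cal_idx)], y[list(cal_idx)], deg=model["degree"])
    y_pred = np.polyval(coeffs, x[list(val_idx)])
    return float(np.mean((y_pred - y[list(val_idx)]) ** 2))


def qoi_challenge(data, val_idx):
    """Placeholder criterion (2): proxy for how much the validation set
    challenges the quantity-of-interest prediction; here, distance of the
    validation inputs from the bulk of the data (extrapolation is harder)."""
    x, _ = data
    return float(np.mean(np.abs(x[list(val_idx)] - np.mean(x))))


if __name__ == "__main__":
    # Synthetic "legacy" data in place of the oscillator experiments.
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 8)
    y = np.sin(2.0 * np.pi * x) + 0.05 * rng.standard_normal(x.size)
    data, model = (x, y), {"degree": 3}

    scored = []
    for cal, val in all_partitions(x.size, n_validation=3):
        scored.append((reproduction_error(model, data, cal, val),
                       qoi_challenge(data, val), cal, val))

    # Post-processing: among splits the model reproduces within a (hypothetical)
    # tolerance, pick the one whose validation set is most challenging.
    tolerance = 0.05
    feasible = [s for s in scored if s[0] <= tolerance] or scored
    best = max(feasible, key=lambda s: s[1])
    print("calibration indices:", best[2])
    print("validation indices: ", best[3])
```

In the paper's setting, the two scoring functions would instead reflect the calibrated physical model's agreement with the observed data and the sensitivity of the unobservable quantity of interest, and the constraints would encode the modeler's and decision-maker's requirements; only the enumerate-and-select workflow is illustrated here.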