Learning to Predict Salient Regions from Disjoint and Skewed Training Sets

  • Authors: Larry Shoemaker; Robert E. Banfield; Lawrence O. Hall; Kevin W. Bowyer; W. Philip Kegelmeyer

  • Affiliations (in author order): University of South Florida, USA; University of South Florida, USA; University of South Florida, USA; University of Notre Dame, USA; Sandia National Laboratories, USA

  • Venue: ICTAI '06 Proceedings of the 18th IEEE International Conference on Tools with Artificial Intelligence
  • Year: 2006

Abstract

We present an ensemble learning approach that achieves accurate predictions from arbitrarily partitioned data. The partitions arise from the distributed processing requirements of a large-scale simulation, where the volume of data is such that classifiers can train only on data local to a given partition. Because the partitioning reflects the need for efficient simulation analysis rather than the needs of data mining, the class statistics vary across partitions; indeed, some classes will likely be absent from some partitions. We combine a fast ensemble learning algorithm with majority voting to generate an accurate working model of the simulation. Results from several simulations show that regions of interest are successfully identified despite training-set class imbalances. Accuracy is analyzed both at the level of nodes in the simulation data structure and in terms of higher-level regions of interest. Over 98% of salient regions are found in independent test sets. Hence, this approach will be a significant time saver for simulation users and developers.
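
The idea of training one classifier per partition and combining them by majority vote can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the fast ensemble learner, so a simple nearest-centroid classifier stands in for it here, and the labels "salient" and "unknown", the helper names, and the toy data are all assumptions made for illustration. The second partition deliberately contains no "salient" examples, mirroring the case where a class is absent from a partition.

```python
from collections import Counter


def train_centroids(samples):
    """Fit a nearest-centroid model on ONE partition's local data.

    samples: list of (feature_vector, label) pairs visible to this partition.
    Returns a dict mapping each locally present label to its mean vector.
    (Stand-in learner; an assumption, not the method used in the paper.)
    """
    sums, counts = {}, Counter()
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest to the feature vector."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def majority_vote(models, features):
    """Combine per-partition predictions with an unweighted majority vote.

    On a tie, the label that was voted for first wins.
    """
    votes = Counter(predict(model, features) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy partitions; partition 2 never sees the "salient" class.
partitions = [
    [([0.0, 0.0], "unknown"), ([0.2, 0.1], "unknown"),
     ([1.0, 1.0], "salient"), ([0.9, 1.1], "salient")],
    [([0.1, 0.0], "unknown"), ([0.0, 0.2], "unknown")],
    [([0.1, 0.1], "unknown"), ([1.1, 0.9], "salient")],
]

# Each model is trained only on the data local to its own partition.
models = [train_centroids(p) for p in partitions]

print(majority_vote(models, [1.0, 0.9]))
print(majority_vote(models, [0.05, 0.05]))
```

In this sketch the model trained on the second partition can only ever answer "unknown", yet the vote for a point near the salient examples is still carried by the other two models. That is the intuition behind voting across partitions with skewed or disjoint class statistics, under the assumptions stated above.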