Performance measures, consistency, and power for artificial neural network models

  • Authors:
  • J. M. Twomey; A. E. Smith

  • Affiliations:
  • Department of Industrial Engineering, University of Pittsburgh, 1031 Benedum Hall, Pittsburgh, PA 15261, U.S.A. (both authors)

  • Venue:
  • Mathematical and Computer Modelling: An International Journal
  • Year:
  • 1995


Abstract

Model building in artificial neural networks (ANNs) refers to selecting the "optimal" network architecture, network topology, data representation, training algorithm, training parameters, and termination criteria so that some desired level of performance is achieved. Validation, a critical aspect of any model construction, is based upon some specified ANN performance measure computed on data that were not used in model construction. Beyond validating a trained ANN, this performance measure is often used to argue the superiority of one network architecture, learning algorithm, or neural network application over another. This paper investigates the three most frequently reported performance measures for pattern classification networks: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and percent good classification. First, the inconsistency of the three metrics in selecting the "better" network is examined empirically; an analysis of error histograms is shown to be an effective means of investigating and resolving inconsistent network performance measures. Second, the paper focuses on percent good classification, the most often used measure of performance for classification networks. This measure is satisfactory when no particular importance is attached to any single class; however, when misclassifying one class is more serious than misclassifying others, percent good classification masks the individual class components. This deficiency is resolved through a neural network analogy to the statistical concept of power. It is shown that power as a neural network performance metric is tunable and is a more descriptive measure than percent correct for evaluating and predicting the "goodness" of a network.
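
As an illustrative sketch of the measures the abstract names, the snippet below computes MAE, RMSE, percent good classification, and a per-class detection rate in the spirit of the paper's power analogy: of the patterns that truly belong to a designated class, the fraction the network correctly identifies (i.e., one minus the Type II error rate for that class). The decision threshold and the `class_of_interest` parameter are assumptions introduced here for illustration; the paper gives its own definition and tuning of power.

```python
import numpy as np

def classification_metrics(targets, outputs, threshold=0.5, class_of_interest=1):
    """Return (MAE, RMSE, percent good classification, estimated power)
    for a two-class network whose outputs lie in [0, 1].

    `threshold` and `class_of_interest` are illustrative assumptions,
    not definitions taken from the paper.
    """
    targets = np.asarray(targets, dtype=float)
    outputs = np.asarray(outputs, dtype=float)

    errors = targets - outputs
    mae = np.mean(np.abs(errors))           # Mean Absolute Error
    rmse = np.sqrt(np.mean(errors ** 2))    # Root Mean Squared Error

    # Percent good classification: threshold the continuous outputs,
    # then count the fraction of patterns classified correctly.
    predicted = (outputs >= threshold).astype(int)
    percent_good = 100.0 * np.mean(predicted == targets)

    # Power analogy: among patterns truly in the class of interest,
    # the fraction the network detects (1 - Type II error rate).
    # Raising or lowering `threshold` trades power against false alarms.
    mask = targets == class_of_interest
    power = (np.mean(predicted[mask] == class_of_interest)
             if mask.any() else float("nan"))

    return mae, rmse, percent_good, power

# Example: three patterns of the class of interest, five of the other class.
targets = np.array([1, 1, 1, 0, 0, 0, 0, 0])
outputs = np.array([0.9, 0.4, 0.8, 0.2, 0.1, 0.6, 0.3, 0.2])
mae, rmse, pct, pwr = classification_metrics(targets, outputs)
print(f"MAE={mae:.3f} RMSE={rmse:.3f} percent good={pct:.1f}% power={pwr:.2f}")
```

In this toy run the overall percent good classification is 75%, yet the power for the class of interest is only 2/3, illustrating the abstract's point that an aggregate percent-correct figure can mask how a single, possibly critical, class is handled.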