On using prototype reduction schemes to enhance the computation of volume-based inter-class overlap measures

  • Authors:
  • Sang-Woon Kim; B. John Oommen

  • Affiliations:
  • Department of Computer Science and Engineering, Myongji University, Yongin 449-728, Republic of Korea; School of Computer Science, Carleton University, Ottawa, Canada K1S 5B6, and University of Agder, Grimstad, Norway

  • Venue:
  • Pattern Recognition
  • Year:
  • 2009

Abstract

In most pattern recognition (PR) applications, it is advantageous if the accuracy (or error rate) of the classifier can be evaluated or bounded before it is tested in a real-life setting. It is also well known that if the two class-conditional distributions have a large overlapping volume (almost all the available work on "overlapping of classes" deals with the case of only two classes), the classification accuracy is poor. This is because, when classification accuracy is the criterion for evaluating a PR system, the points within the overlapping volume tend to cause the most misclassifications. Unfortunately, computing the indices that quantify the overlapping volume is expensive. We therefore propose using a prototype reduction scheme (PRS) to compute these indices approximately, but quickly. In this paper, we first demonstrate that this is an extremely expedient proposition. Indeed, we show that by completely discarding the points not selected by the PRS, we obtain a reduced set of sample points from which the measures of the overlapping volume can, in turn, be computed. (We are not aware of any reported scheme that discards "irrelevant" sample (training) points while simultaneously attaining an almost-comparable accuracy.) The resulting values are comparable to those obtained with the original training set (i.e., the one that considers all the data points), even though the computational effort required to obtain the prototypes and the corresponding measures is significantly lower. The proposed method has been rigorously tested on artificial and real-life datasets, and the results obtained are, in our opinion, quite impressive: the computation is sometimes faster by two orders of magnitude.
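The pipeline described above (reduce the training set with a PRS, then evaluate the overlap index on the prototypes only) can be sketched as follows. The abstract does not name the PRS or the overlap index, so this is a minimal sketch under two assumptions: Hart's Condensed Nearest Neighbour (CNN) rule stands in for the PRS, and a per-feature "volume of overlap region" in the style of Ho and Basu's F2 stands in for the volume-based measure. The helpers `make_two_gaussians` and `overlap_of` are hypothetical, for illustration only.

```python
import math
import random

# A labelled sample is ((x1, ..., xd), label) with label 0 or 1.


def volume_overlap(class_a, class_b):
    """Assumed F2-style measure: product over features of the overlap of the
    two classes' value ranges, normalised by their combined range.
    Returns 0.0 if the ranges are disjoint on any feature."""
    ratio = 1.0
    for d in range(len(class_a[0])):
        a = [p[d] for p in class_a]
        b = [p[d] for p in class_b]
        overlap = min(max(a), max(b)) - max(min(a), min(b))
        span = max(max(a), max(b)) - min(min(a), min(b))
        if overlap <= 0:  # classes separable along this feature
            return 0.0
        ratio *= overlap / span
    return ratio


def nearest_label(x, prototypes):
    """Label of the 1-NN of point x among (point, label) prototypes."""
    return min(prototypes, key=lambda pl: math.dist(x, pl[0]))[1]


def condensed_nn(samples):
    """Hart's CNN rule, used here as a stand-in PRS: grow a prototype set
    until 1-NN on it labels every training sample correctly."""
    store = [samples[0]]
    changed = True
    while changed:
        changed = False
        for sample in samples:
            if sample in store:  # already a prototype
                continue
            x, y = sample
            if nearest_label(x, store) != y:
                store.append(sample)
                changed = True
    return store


def overlap_of(samples):
    """Hypothetical helper: split by label and apply volume_overlap."""
    a = [x for x, y in samples if y == 0]
    b = [x for x, y in samples if y == 1]
    return volume_overlap(a, b)


def make_two_gaussians(n, shift, seed=0):
    """Hypothetical synthetic data: two 2-D Gaussian classes (unit variance)
    whose means differ by `shift` in each coordinate."""
    rng = random.Random(seed)
    data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(n)]
    data += [((rng.gauss(shift, 1), rng.gauss(shift, 1)), 1) for _ in range(n)]
    rng.shuffle(data)
    return data


if __name__ == "__main__":
    data = make_two_gaussians(300, shift=1.5)
    reduced = condensed_nn(data)
    print(f"full set:    {len(data)} points, overlap = {overlap_of(data):.3f}")
    print(f"reduced set: {len(reduced)} points, overlap = {overlap_of(reduced):.3f}")
```

The measure above costs O(nd), so on its own it gains little from reduction; the pattern pays off for overlap indices whose cost grows faster than linearly in the number of points, where evaluating them on the much smaller prototype set is what saves time. How closely the two printed values agree depends on the PRS and the measure chosen.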