Choosing data-mining methods for multiple classification: representational and performance measurement implications for decision support

  • Authors:
  • William E. Spangler; Jerrold H. May; Luis G. Vargas

  • Venue:
  • Journal of Management Information Systems - Special section: Data mining
  • Year:
  • 1999

Abstract

Data-mining techniques are generally designed for classification problems in which each observation belongs to one and only one category. We formulate ten data representations that could be used to extend these methods to problems in which an observation may be a full member of several categories at once. We propose an audit matrix methodology for evaluating the performance of three popular data-mining techniques (linear discriminant analysis, neural networks, and decision tree induction) using the representations that each technique can accommodate. We then test the approach empirically on a real surgical data set. For multiple category problems, tree induction gives the lowest rate of false positive predictions and a version of discriminant analysis yields the lowest rate of false negatives, but neural networks give the best overall results for the largest multiple classification cases. All techniques leave substantial room for improvement in overall performance.
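The abstract reports results in terms of false positive and false negative rates when an observation can belong to several categories. The sketch below is one hypothetical way to tally such errors: for every observation and every category, it records whether membership was correctly or incorrectly predicted, then pools the counts into overall rates. The functions `audit_matrix` and `error_rates`, the count keys, and the toy labels "A", "B", "C" are illustrative assumptions, not the authors' exact audit matrix formulation or data.

```python
from typing import Dict, List, Set, Tuple

Counts = Dict[str, int]


def audit_matrix(
    actual: List[Set[str]],
    predicted: List[Set[str]],
    categories: List[str],
) -> Dict[str, Counts]:
    """Hypothetical per-category tally for multi-category classification.

    Each observation may be a full member of several categories, so every
    (observation, category) pair is scored independently.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    matrix = {c: {"tp": 0, "fp": 0, "fn": 0, "tn": 0} for c in categories}
    for act, pred in zip(actual, predicted):
        for c in categories:
            if c in act and c in pred:
                matrix[c]["tp"] += 1  # membership correctly predicted
            elif c in pred:
                matrix[c]["fp"] += 1  # predicted a category the case lacks
            elif c in act:
                matrix[c]["fn"] += 1  # missed a category the case has
            else:
                matrix[c]["tn"] += 1  # non-membership correctly predicted
    return matrix


def error_rates(matrix: Dict[str, Counts]) -> Tuple[float, float]:
    """Pool counts over all categories; return (false positive rate,
    false negative rate). Empty denominators yield 0.0."""
    tp = sum(m["tp"] for m in matrix.values())
    fp = sum(m["fp"] for m in matrix.values())
    fn = sum(m["fn"] for m in matrix.values())
    tn = sum(m["tn"] for m in matrix.values())
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    fn_rate = fn / (fn + tp) if fn + tp else 0.0
    return fp_rate, fn_rate


if __name__ == "__main__":
    cats = ["A", "B", "C"]  # hypothetical category labels
    actual = [{"A"}, {"A", "B"}, {"C"}]
    predicted = [{"A"}, {"B"}, {"A", "C"}]
    m = audit_matrix(actual, predicted, cats)
    print(m["A"])          # {'tp': 1, 'fp': 1, 'fn': 1, 'tn': 0}
    print(error_rates(m))  # (0.2, 0.25)
```

Scoring each (observation, category) pair separately is what lets a single prediction be partly right: the second observation above earns a true positive for "B" and a false negative for "A". Keeping the false positive and false negative rates apart, rather than collapsing them into one accuracy figure, mirrors how the abstract compares tree induction and discriminant analysis.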