Proper Model Selection with Significance Test

  • Authors:
  • Jin Huang; Charles X. Ling; Harry Zhang; Stan Matwin

  • Affiliations:
  • School of Information Tech. and Eng., University of Ottawa, Canada; Department of Computer Science, The University of Western Ontario, Canada; Faculty of Computer Science, University of New Brunswick, Canada; School of Information Tech. and Eng., University of Ottawa, Canada

  • Venue:
  • ECML PKDD '08: Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I
  • Year:
  • 2008

Abstract

Model selection is an important and ubiquitous task in machine learning. To select the model with the best future classification performance as measured by a goal metric, an evaluation metric is often used to choose the best classification model among the competing ones. A common practice is to use the same metric as both goal and evaluation metric. However, several recent studies claim that using an evaluation metric (such as AUC) other than the goal metric (such as accuracy) results in better selection of the correct models. In this paper, we point out a flaw in the experimental design of those studies and propose an improved method to test the claim. Our extensive experiments show convincingly that only the goal metric itself can most reliably select the correct classification models.
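
To make the distinction between goal and evaluation metric concrete, below is a minimal sketch (not from the paper) of the model-selection setting the abstract describes: several candidate classifiers are compared by a cross-validated evaluation metric (accuracy or AUC), and the selected model's future performance is then measured with the goal metric (here accuracy). The dataset, candidate models, and cross-validation settings are illustrative assumptions, not the authors' experimental setup.

```python
# Hedged sketch of model selection with a goal metric vs. an evaluation metric.
# All concrete choices (data, candidates, CV folds) are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "nb": GaussianNB(),
}

goal_metric = "accuracy"                       # metric we ultimately care about
evaluation_metrics = ["accuracy", "roc_auc"]   # metrics used to pick a model

for eval_metric in evaluation_metrics:
    # Select the candidate with the best cross-validated evaluation-metric score.
    scores = {
        name: cross_val_score(model, X_train, y_train, cv=5,
                              scoring=eval_metric).mean()
        for name, model in candidates.items()
    }
    best_name = max(scores, key=scores.get)
    # Measure the selected model's "future" performance with the goal metric.
    best_model = candidates[best_name].fit(X_train, y_train)
    test_acc = accuracy_score(y_test, best_model.predict(X_test))
    print(f"selected by {eval_metric}: {best_name}, "
          f"test {goal_metric} = {test_acc:.3f}")
```

The paper's claim concerns exactly this comparison: whether selecting by an evaluation metric different from the goal metric (e.g., AUC when accuracy is the goal) ever yields a better model under the goal metric.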