The Effect of Instance-Space Partition on Significance

  • Authors:
  • Jeffrey P. Bradford;Carla E. Brodley

  • Affiliations:
  • School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA. jeffrey.bradford@computer.org;School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA. brodley@ecn.purdue.edu

  • Venue:
  • Machine Learning
  • Year:
  • 2001

Abstract

This paper demonstrates experimentally that deciding which induction algorithm is more accurate from the results of a single partition of the instances into cross-validation folds may lead to statistically erroneous conclusions. Comparing two decision-tree induction algorithms and one naive Bayes induction algorithm, we find situations in which one algorithm is judged more accurate at the p = 0.05 level with one partition of the training instances, but the other algorithm is judged more accurate at the p = 0.05 level with an alternate partition. We recommend a new significance procedure that involves performing cross-validation using multiple instance-space partitions. Significance is determined by applying the paired Student t-test separately to the results from each cross-validation partition, averaging the resulting values, and converting this averaged value into a significance value.
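The recommended procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which values are averaged or how the conversion is done, so the sketch assumes the per-partition t statistics are averaged and that the average is converted to a two-sided p-value using a Student t distribution with k - 1 degrees of freedom, where k is the number of folds. All function names are hypothetical, and the accuracies in the usage lines are made-up placeholders.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(acc_a, acc_b):
    """Paired Student t statistic over per-fold accuracy differences."""
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    sd = stdev(diffs)
    if sd == 0.0:
        raise ValueError("all fold differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value, integrating the density on [0, |t|] with Simpson's rule.

    steps must be even.
    """
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def averaged_t_significance(results_per_partition):
    """Average the paired t statistics across partitions, then convert to a p-value.

    results_per_partition: list of (acc_a, acc_b) pairs, one pair per
    cross-validation partition; each element is a list of per-fold accuracies.
    Assumption: the averaged statistic is referred to a t distribution with
    k - 1 degrees of freedom, k being the number of folds.
    """
    t_values = [paired_t_statistic(a, b) for a, b in results_per_partition]
    k = len(results_per_partition[0][0])
    t_avg = mean(t_values)
    return t_avg, two_sided_p(t_avg, k - 1)


# Made-up per-fold accuracies for two algorithms under two partitions
# (illustration only, not data from the paper).
partitions = [
    ([0.80, 0.82, 0.79, 0.85, 0.81], [0.78, 0.83, 0.75, 0.80, 0.80]),
    ([0.83, 0.80, 0.78, 0.84, 0.82], [0.81, 0.82, 0.77, 0.79, 0.80]),
]
t_avg, p_value = averaged_t_significance(partitions)
```

The sign of `t_avg` indicates which algorithm the averaged evidence favors, and `p_value` is compared against the chosen level (for example 0.05). Because each partition contributes its own t statistic, a single partition that happens to favor one algorithm no longer decides the outcome alone.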