Multiple Comparisons in Induction Algorithms

  • Authors:
  • David D. Jensen; Paul R. Cohen

  • Affiliations:
  • Experimental Knowledge Systems Laboratory, Department of Computer Science, University of Massachusetts, Amherst, MA 01003-4610 USA. jensen@cs.umass.edu; Experimental Knowledge Systems Laboratory, Department of Computer Science, University of Massachusetts, Amherst, MA 01003-4610 USA. cohen@cs.umass.edu

  • Venue:
  • Machine Learning
  • Year:
  • 2000

Abstract

A single mechanism is responsible for three pathologies of induction algorithms: attribute selection errors, overfitting, and oversearching. In each pathology, induction algorithms compare multiple items based on scores from an evaluation function and select the item with the maximum score. We call this a multiple comparison procedure (MCP). We analyze the statistical properties of MCPs and show how failure to adjust for these properties leads to the pathologies. We also discuss approaches that can control pathological behavior, including Bonferroni adjustment, randomization testing, and cross-validation.
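
The effect the abstract describes can be illustrated with a small simulation. The sketch below is not taken from the paper: the function names, the coin-flip "items", and the sample sizes are assumptions chosen for illustration. It scores several items that have no real predictive value and shows that the maximum score grows with the number of items compared. It also shows the arithmetic of a Bonferroni adjustment. The family-wise error formula assumes independent comparisons; the Bonferroni bound itself does not need that assumption.

```python
import random


def max_score_of_useless_items(n_items, n_samples, rng):
    """Score n_items coin-flip predictors and return the best observed accuracy.

    Every item has a true accuracy of 0.5, so any score above 0.5 is noise.
    """
    best = 0.0
    for _ in range(n_items):
        correct = sum(rng.random() < 0.5 for _ in range(n_samples))
        best = max(best, correct / n_samples)
    return best


def mean_max_score(n_items, n_samples=50, trials=500, seed=0):
    """Average the maximum score over many repetitions of the comparison."""
    rng = random.Random(seed)
    total = sum(
        max_score_of_useless_items(n_items, n_samples, rng)
        for _ in range(trials)
    )
    return total / trials


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under a Bonferroni adjustment."""
    return alpha / n_comparisons


def familywise_error(alpha_per_comparison, n_comparisons):
    """Chance of at least one false positive, assuming independent comparisons."""
    return 1.0 - (1.0 - alpha_per_comparison) ** n_comparisons


if __name__ == "__main__":
    for n in (1, 5, 20):
        print(n, "items: mean max score =", round(mean_max_score(n), 3))
    print("unadjusted family-wise error:", round(familywise_error(0.05, 20), 3))
    adjusted = bonferroni_alpha(0.05, 20)
    print("adjusted family-wise error:", round(familywise_error(adjusted, 20), 3))
```

With one item the mean score stays near the true value of 0.5, while the maximum over 20 equally useless items is noticeably higher. Treating that maximum as if it were a single score is the kind of unadjusted comparison the abstract identifies as the source of the pathologies.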