The attribute selection problem in decision tree generation

  • Authors:
  • Usama M. Fayyad; Keki B. Irani

  • Affiliations:
  • Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA; AI Laboratory, E.E.C.S. Department, The University of Michigan, Ann Arbor, MI

  • Venue:
  • AAAI'92 Proceedings of the tenth national conference on Artificial intelligence
  • Year:
  • 1992

Abstract

We address the problem of selecting an attribute and some of its values for branching during the top-down generation of decision trees. We study the class of impurity measures, whose members are typically used in the literature for selecting attributes during decision tree generation (e.g., entropy in ID3, GID3*, and CART; Gini index in CART). We argue that this class of measures is not particularly suitable for use in classification learning. We define a new class of measures, called C-SEP, that we argue is better suited to the purpose of class separation. A new measure from C-SEP is formulated and some of its desirable properties are shown. Finally, we demonstrate empirically that the new algorithm, O-BTree, which uses this measure, indeed produces better decision trees than algorithms that use impurity measures.
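
For readers unfamiliar with the impurity measures the abstract refers to, the following is a minimal sketch of the two it names, entropy and the Gini index, applied to scoring a candidate split. It illustrates only the standard textbook definitions, not the C-SEP measure or the O-BTree algorithm, which the abstract does not specify. The function names (`entropy`, `gini`, `weighted_impurity`) and the toy labels are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the class labels in one node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of the class labels in one node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_impurity(partition, measure):
    """Size-weighted average impurity of the blocks produced by a split.

    `partition` is a list of label lists, one per branch (hypothetical
    representation chosen for this sketch). Lower values mean purer branches.
    """
    total = sum(len(block) for block in partition)
    return sum(len(block) / total * measure(block) for block in partition if block)


# Toy example: one attribute separates the classes perfectly, another does not.
perfect_split = [["a", "a"], ["b", "b"]]
useless_split = [["a", "b"], ["a", "b"]]
```

An impurity-based tree builder would prefer `perfect_split` here, because its weighted impurity is lower under either measure than that of `useless_split`.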