Implementation and Evaluation of Decision Trees with Range and Region Splitting

  • Authors:
  • Yasuhiko Morimoto;Takeshi Fukuda;Shinichi Morishita;Takeshi Tokuyama

  • Affiliations:
  • -;-;-;IBM Tokyo Research Laboratory, 1623-14, Shimo-tsuruma, Yamato City, Kanagawa Pref, 242, JAPAN

  • Venue:
  • Constraints
  • Year:
  • 1998

Abstract

We propose an extension of an entropy-based heuristic for constructing a decision tree from a large database with many numeric attributes. When it comes to handling numeric attributes, conventional methods are inefficient if any numeric attributes are strongly correlated. Our approach offers one solution to this problem. For each pair of numeric attributes with strong correlation, we compute a two-dimensional association rule with respect to these attributes and the objective attribute of the decision tree. In particular, we consider a family $\mathcal{R}$ of grid-regions in the plane associated with the pair of attributes. For $R \in \mathcal{R}$, the data can be split into two classes: data inside $R$ and data outside $R$. We compute the region $R_{opt} \in \mathcal{R}$ that minimizes the entropy of the splitting, and add the splitting associated with $R_{opt}$ (for each pair of strongly correlated attributes) to the set of candidate tests in an entropy-based heuristic. We give efficient algorithms for cases in which $\mathcal{R}$ is (1) x-monotone connected regions, (2) based-monotone regions, (3) rectangles, and (4) rectilinear convex regions. The algorithm has been implemented as a subsystem of SONAR (System for Optimized Numeric Association Rules) developed by the authors. We have confirmed that we can compute the optimal region efficiently. Diverse experiments show that our approach can create compact trees whose accuracy is comparable with or better than that of conventional trees. More importantly, we can grasp non-linear correlation among numeric attributes that could not be found without our region splitting.
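To make the idea of an entropy-minimizing region split concrete, the following is a minimal sketch for the rectangle case only. It is an exhaustive search over axis-parallel rectangles of grid cells, not the efficient algorithms the abstract refers to, and it assumes a two-class objective attribute with the data already bucketed into a grid of per-cell counts. The function names (`entropy`, `split_entropy`, `best_rectangle`) and the count-grid input layout are hypothetical and chosen only for illustration.

```python
import math


def entropy(pos, neg):
    """Binary class entropy (in bits) of a set with pos/neg counts."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def split_entropy(in_pos, in_neg, out_pos, out_neg):
    """Size-weighted average entropy of the inside/outside partition."""
    n_in = in_pos + in_neg
    n_out = out_pos + out_neg
    n = n_in + n_out
    if n == 0:
        return 0.0
    return (n_in / n) * entropy(in_pos, in_neg) + (n_out / n) * entropy(out_pos, out_neg)


def _prefix_sums(grid):
    """2-D prefix sums so any rectangle total is an O(1) lookup."""
    rows, cols = len(grid), len(grid[0])
    prefix = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            prefix[i + 1][j + 1] = (
                grid[i][j] + prefix[i][j + 1] + prefix[i + 1][j] - prefix[i][j]
            )
    return prefix


def _box(prefix, r0, r1, c0, c1):
    """Sum over rows r0..r1 and columns c0..c1 (inclusive)."""
    return prefix[r1 + 1][c1 + 1] - prefix[r0][c1 + 1] - prefix[r1 + 1][c0] + prefix[r0][c0]


def best_rectangle(pos_grid, neg_grid):
    """Brute-force search for the rectangle R minimizing the split entropy.

    pos_grid[i][j] and neg_grid[i][j] are assumed to hold the number of
    positive and negative tuples falling in grid cell (i, j).
    Returns (entropy, (r0, r1, c0, c1)) with inclusive cell indices.
    """
    rows, cols = len(pos_grid), len(pos_grid[0])
    pp, pn = _prefix_sums(pos_grid), _prefix_sums(neg_grid)
    total_pos, total_neg = pp[rows][cols], pn[rows][cols]
    best = (float("inf"), None)
    for r0 in range(rows):
        for r1 in range(r0, rows):
            for c0 in range(cols):
                for c1 in range(c0, cols):
                    in_pos = _box(pp, r0, r1, c0, c1)
                    in_neg = _box(pn, r0, r1, c0, c1)
                    h = split_entropy(
                        in_pos, in_neg, total_pos - in_pos, total_neg - in_neg
                    )
                    if h < best[0]:
                        best = (h, (r0, r1, c0, c1))
    return best
```

In a tree builder, the entropy returned for the best region would be compared against the entropies of the ordinary single-attribute tests, and the test with the lowest value would be chosen for the node. The exhaustive search visits every rectangle, so it is only practical for coarse grids.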