Data mining tasks and methods: Classification: decision-tree discovery

  • Authors:
  • Ronny Kohavi; J. Ross Quinlan

  • Affiliations:
  • Senior Director of Data Mining Applications, Blue Martini Software, San Mateo, California; Executive Director, RuleQuest Research Pty Ltd, Sydney, Australia

  • Venue:
  • Handbook of data mining and knowledge discovery
  • Year:
  • 2002

Abstract

We describe the two most commonly used systems for induction of decision trees for classification: C4.5 and CART. We highlight the methods and different decisions made in each system with respect to splitting criteria, pruning, noise handling, and other differentiating features. We describe how rules can be derived from decision trees and point to some differences in the induction of regression trees. We conclude with some pointers to advanced techniques, including ensemble methods, oblique splits, grafting, and coping with large data sets.
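As a rough illustration of how splitting criteria can differ, the sketch below scores one candidate split in two ways. The first is the information gain ratio associated with C4.5. The second is the reduction in Gini impurity associated with CART. It is a simplified sketch under stated assumptions: the function names are hypothetical, class labels are plain Python values, and the refinements the full systems apply (missing-value handling, restrictions on which tests are eligible, CART's search over binary splits) are omitted.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gain_ratio(parent, children):
    """C4.5-style score (simplified): information gain / split information.

    `children` is the partition of `parent` produced by a candidate test.
    """
    n = len(parent)
    weights = [len(ch) / n for ch in children]
    gain = entropy(parent) - sum(w * entropy(ch) for w, ch in zip(weights, children))
    # Split information penalizes tests that fragment the data into many subsets.
    split_info = -sum(w * log2(w) for w in weights if w > 0)
    return gain / split_info if split_info > 0 else 0.0


def gini_reduction(parent, children):
    """CART-style score (simplified): decrease in weighted Gini impurity."""
    n = len(parent)
    return gini(parent) - sum(len(ch) / n * gini(ch) for ch in children)


if __name__ == "__main__":
    parent = ["yes"] * 4 + ["no"] * 4
    children = [["yes"] * 3 + ["no"], ["yes"] + ["no"] * 3]
    print(round(gain_ratio(parent, children), 4))
    print(round(gini_reduction(parent, children), 4))
```

For this balanced two-way split, the gain ratio is about 0.1887 and the Gini reduction is 0.125. Both criteria prefer purer children, but they weight impurity differently and can therefore rank competing tests differently. This is one of the differentiating decisions between the two systems.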