An effective sampling method for decision trees considering comprehensibility and accuracy

  • Authors: Hyontai Sug

  • Affiliations: Division of Computer and Information Engineering, Dongseo University, Busan, Republic of Korea

  • Venue: WSEAS Transactions on Computers

  • Year: 2009

Abstract

Because data mining with decision trees usually targets very large data sets, sampling is needed. Selecting proper samples for a given decision tree algorithm is not easy, however: each algorithm has its own characteristics in generating trees, and it is difficult to select samples that represent the target data set well. As the sample size grows, the generated decision trees also grow, with some improvement in error rates. But we cannot keep using larger and larger samples, because large decision trees are hard to understand and overfitting can occur. This paper suggests a progressive approach to determining a proper sample size that yields good decision trees with respect to both tree size and accuracy. Experiments with two representative decision tree algorithms, CART and C4.5, show very promising results.
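
The abstract does not spell out the procedure, so the following is only a minimal sketch of what a progressive sample-size search of this kind might look like, not the paper's actual method. It assumes a hypothetical callback `evaluate(n)` that trains a tree (for example with CART or C4.5) on a random sample of `n` records and returns its error rate and node count. The function name `choose_sample_size`, the thresholds `min_gain` and `max_nodes`, and the stopping rule itself are all illustrative assumptions.

```python
from typing import Callable, Iterable, Optional, Tuple


def choose_sample_size(
    sizes: Iterable[int],
    evaluate: Callable[[int], Tuple[float, int]],
    min_gain: float = 0.005,
    max_nodes: int = 200,
) -> int:
    """Return the largest sample size that was still worth using.

    `evaluate(n)` is a hypothetical callback: it should train a decision
    tree on a random sample of n records and return
    (error_rate, tree_size_in_nodes).

    Assumed stopping rule (not taken from the paper): try sizes in
    increasing order and stop as soon as the tree exceeds `max_nodes`
    or the error rate improves by less than `min_gain`.
    """
    best_size: Optional[int] = None
    best_error = 0.0
    for n in sizes:
        error, nodes = evaluate(n)
        if best_size is not None:
            # Tree has become too large to read comfortably.
            if nodes > max_nodes:
                break
            # Accuracy gain no longer justifies the larger sample.
            if best_error - error < min_gain:
                break
        # The first size is always accepted as the baseline.
        best_size, best_error = n, error
    if best_size is None:
        raise ValueError("sizes must not be empty")
    return best_size


# Made-up numbers for illustration only; they are not results from the paper.
toy_results = {
    500: (0.20, 40),
    1000: (0.16, 75),
    2000: (0.14, 130),
    4000: (0.138, 260),
}
chosen = choose_sample_size([500, 1000, 2000, 4000], lambda n: toy_results[n])
print(chosen)
```

With these toy values the search stops at the 4000-record sample because its tree exceeds the node cap, so the previous size is kept. In practice `evaluate` would wrap a real tree learner and a held-out test set, and the two thresholds would need tuning to the data set at hand.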