An empirical determination of samples for decision trees

  • Authors:
  • Hyontai Sug

  • Affiliations:
  • Division of Computer and Information Engineering, Dongseo University, Busan, Republic of Korea

  • Venue:
  • AIKED'09 Proceedings of the 8th WSEAS international conference on Artificial intelligence, knowledge engineering and data bases
  • Year:
  • 2009

Abstract

Because there is no established way to determine a proper sample size for data mining tasks, choosing sample sizes for decision trees, which are among the best data mining algorithms, is arbitrary. As the sample size grows, the generated decision trees also grow, with some improvement in error rates. However, we cannot keep using larger and larger samples, because large decision trees are hard to understand and overfitting can occur when the target data set is limited. This paper suggests an objective approach to determining a proper sample size for generating good decision trees with respect to tree size and error rate. Experiments with two representative decision tree algorithms, CART and C4.5, show very promising results.
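The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one plausible stopping rule, not the author's method. It assumes that tree size and held-out error rate have already been measured at several sample sizes, and it picks the smallest sample size after which a larger sample no longer reduces the error by at least a chosen margin. The names `Measurement`, `pick_sample_size`, and `min_gain` are hypothetical and introduced here for illustration.

```python
from typing import NamedTuple, Sequence


class Measurement(NamedTuple):
    """One point on a hypothetical sample-size curve (not from the paper)."""
    sample_size: int    # number of training instances drawn
    tree_size: int      # e.g. number of nodes in the generated tree
    error_rate: float   # error on held-out data, in [0, 1]


def pick_sample_size(curve: Sequence[Measurement],
                     min_gain: float = 0.005) -> Measurement:
    """Return the smallest-sample measurement after which growing the
    sample no longer lowers the error rate by at least `min_gain`.

    Assumed rule for illustration only: stop as soon as the next larger
    sample improves the error by less than `min_gain`. If every step
    still improves enough, the largest sample is returned.
    """
    if not curve:
        raise ValueError("curve must contain at least one measurement")
    ordered = sorted(curve, key=lambda m: m.sample_size)
    for current, nxt in zip(ordered, ordered[1:]):
        if current.error_rate - nxt.error_rate < min_gain:
            return current
    return ordered[-1]
```

Because the returned measurement also carries `tree_size`, a caller can check whether the chosen tree is still small enough to interpret, which is the trade-off the abstract describes. The threshold `min_gain` is an assumed tuning parameter; a stricter or looser value shifts the chosen sample size accordingly.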