An empirical determination of samples for decision trees

Authors:
Hyontai Sug
Affiliations:
Division of Computer and Information Engineering, Dongseo University, Busan, Republic of Korea
Venue:
AIKED'09 Proceedings of the 8th WSEAS international conference on Artificial intelligence, knowledge engineering and data bases
Year:
2009



Abstract

Because no established method exists for determining a proper sample size for data mining tasks, choosing sample sizes for decision trees, one of the best data mining algorithms, is arbitrary. As the sample size grows, the generated decision trees grow as well, with some improvement in error rates. However, we cannot use larger and larger samples, because large decision trees are hard to understand and overfitting can occur when the target data set is limited. This paper suggests an objective approach to determining a proper sample size that yields good decision trees with respect to tree size and error rate. Experiments with two representative decision tree algorithms, CART and C4.5, show very promising results.
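The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one way the trade-off between tree size and error rate might be operationalized. It assumes that trees have already been induced on samples of increasing size and that each run was summarized as a (sample size, tree size, error rate) record. The names `Trial` and `pick_sample_size`, and the threshold `min_gain`, are hypothetical and are not taken from the paper.

```python
from typing import List, NamedTuple, Optional


class Trial(NamedTuple):
    sample_size: int   # number of training instances drawn
    tree_size: int     # number of nodes in the induced tree
    error_rate: float  # error rate on held-out data, in [0, 1]


def pick_sample_size(trials: List[Trial], min_gain: float = 0.005) -> Optional[Trial]:
    """Pick the trial after which a larger sample no longer pays off.

    Hypothetical stopping rule (an assumption, not the paper's method):
    walk the trials in order of sample size and stop at the first one
    whose successor grows the tree while reducing the error rate by
    less than `min_gain`.
    """
    if not trials:
        return None
    ordered = sorted(trials, key=lambda t: t.sample_size)
    for current, following in zip(ordered, ordered[1:]):
        gain = current.error_rate - following.error_rate
        tree_grew = following.tree_size > current.tree_size
        if tree_grew and gain < min_gain:
            return current
    # Every step was still worthwhile, so keep the largest sample tried.
    return ordered[-1]
```

Under this rule a larger sample is accepted only while it buys a noticeable drop in error rate; once the tree keeps growing without that payoff, the smaller sample is preferred because its tree is easier to read. The value of `min_gain` is a free parameter that would have to be tuned for a given data set.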