C4.5: programs for machine learning
Computational learning theory and natural learning systems: Volume IV
PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning
Data Mining and Knowledge Discovery
Discretization: An Enabling Technique
Data Mining and Knowledge Discovery
SLIQ: A Fast Scalable Classifier for Data Mining
EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology
RainForest - A Framework for Fast Decision Tree Construction of Large Datasets
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
SPRINT: A Scalable Parallel Classifier for Data Mining
VLDB '96 Proceedings of the 22nd International Conference on Very Large Data Bases
Introduction to Data Mining (First Edition)
Occam's Razor and a non-syntactic measure of decision tree complexity
AAAI'04 Proceedings of the 19th National Conference on Artificial Intelligence
Because there is no known method for determining a proper sample size for data mining tasks, choosing sample sizes for decision trees, one of the best-performing data mining algorithms, is essentially arbitrary; as the sample grows, the generated decision trees also grow in size, with only some improvement in error rate. But we cannot keep using larger and larger samples, because large decision trees are hard to understand and overfitting can occur when the target data set is limited. This paper suggests an objective approach to determining a proper sample size that yields good decision trees with respect to generated tree size and error rate. Experiments with two representative decision tree algorithms, CART and C4.5, show very promising results.
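The trade-off the abstract describes, tree size growing with sample size while the error improvement tapers off, can be observed empirically. The following is a minimal sketch that assumes scikit-learn's DecisionTreeClassifier (a CART-style learner) and a synthetic data set as stand-ins; it only illustrates the trade-off and is not the paper's procedure, whose criterion for picking the sample size is not stated in this abstract.

```python
# Sketch: measure how tree size and test error vary with training sample size.
# Assumes scikit-learn's DecisionTreeClassifier (CART-style) on synthetic data;
# the paper's actual sample-size selection criterion is not given here.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

rng = np.random.RandomState(0)
for n in [500, 1000, 2000, 4000, 8000, len(X_train)]:
    idx = rng.choice(len(X_train), size=n, replace=False)
    clf = DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx])
    err = 1.0 - clf.score(X_test, y_test)
    # Node count keeps growing with the sample while error gains flatten out,
    # which is the trade-off motivating a principled choice of sample size.
    print(f"sample={n:6d}  tree nodes={clf.tree_.node_count:5d}  test error={err:.3f}")
```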