C4.5: programs for machine learning
Computational learning theory and natural learning systems: Volume IV
Efficient progressive sampling
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning
Data Mining and Knowledge Discovery
Discretization: An Enabling Technique
Data Mining and Knowledge Discovery
SLIQ: A Fast Scalable Classifier for Data Mining
EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology
RainForest - A Framework for Fast Decision Tree Construction of Large Datasets
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
SPRINT: A Scalable Parallel Classifier for Data Mining
VLDB '96 Proceedings of the 22nd International Conference on Very Large Data Bases
Introduction to Data Mining, (First Edition)
WSEAS Transactions on Information Science and Applications
Application of decision trees in problem of air quality modelling in the Czech Republic locality
WSEAS Transactions on Systems
Implementation of classifiers for choosing insurance policy using decision trees: a case study
WSEAS Transactions on Computers
Occam's Razor and a non-syntactic measure of decision tree complexity
AAAI'04 Proceedings of the 19th National Conference on Artificial Intelligence
Because the target domain of data mining with decision trees usually contains large amounts of data, sampling is needed. Selecting proper samples for a given decision tree algorithm is not easy, however: each algorithm has its own characteristics in generating trees, and choosing samples that represent the target data set well is difficult. As the sample size grows, the generated decision trees grow as well, with some improvement in error rates. But we cannot simply use larger and larger samples, because large decision trees are hard to understand and overfitting can occur. This paper suggests a progressive approach to determining a proper sample size for generating good decision trees with respect to tree size and accuracy. Experiments with two representative decision tree algorithms, CART and C4.5, show very promising results.
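The progressive idea sketched in the abstract, growing the sample until accuracy gains flatten out, can be illustrated roughly as follows. This is a minimal sketch, not the paper's exact procedure: `train_and_score` is a hypothetical stand-in for building a decision tree (e.g. with CART or C4.5) and measuring held-out accuracy, and the starting size, geometric growth factor, saturating learning curve, and stopping threshold are all illustrative assumptions.

```python
# Minimal sketch of progressive sampling; all numbers are illustrative.

def train_and_score(sample):
    """Hypothetical stand-in for training a decision tree on `sample`
    and measuring held-out accuracy: a learning curve that saturates
    as the sample grows (NOT a real learner)."""
    n = len(sample)
    return 0.95 - 0.45 / (1 + n / 1000)  # illustrative curve only

def progressive_sample_size(data, start=500, factor=2, eps=0.01):
    """Grow the sample geometrically; stop once the accuracy gain
    from the last increase falls below eps."""
    size, prev_acc = start, 0.0
    while size <= len(data):
        acc = train_and_score(data[:size])
        if acc - prev_acc < eps:        # gains have flattened out
            return size, acc
        prev_acc, size = acc, size * factor
    return len(data), prev_acc          # used the whole data set

size, acc = progressive_sample_size(list(range(100_000)))
```

With this toy curve the schedule stops well short of the full data set, which is the point of the approach: a smaller sample yields a smaller, more interpretable tree at nearly the same accuracy.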